Rosemary Harris from Queen Mary University of London. She's going to speak about insights from non-Markovian random walks. I want to remind the speaker that she has 25 minutes for the talk, plus five minutes for discussion. Please.

Good morning. So hopefully you can see and hear me and see my screen. Okay. Thank you very much for the introduction and for the invitation. It's very nice to be here, albeit virtually. So I'm going to tell you about some statistical physics insights from simple non-Markovian random walks, and in particular some applications to decision modelling, which fit within the complex systems side of this very nice meeting. I appreciate that it's first thing in the morning. It's even earlier here in the UK, because you know we like to be different to the rest of Europe. So we're going to start very, very simple. So simple that we could almost be in this book, Statistical Physics for Babies.

We're going to start by thinking about Markovian random walks, and perhaps the simplest kind of Markovian random walks in one dimension and discrete time. So that means we have the position at time t plus one, which is just the position at time t plus some step length or noise, where the step lengths are independent, identically distributed random variables. And the most familiar example is perhaps the case where the step lengths are plus one with some probability, let's call it p plus, and minus one with probability p minus. Then you just have a random walker on a one-dimensional lattice that goes to the right and left with different probabilities. Even this super simple model can be thought of as a kind of trivial model of decision making, if you imagine somebody repeatedly making a decision with no memory of what they did before and just making the decision randomly, tossing a biased coin. They perhaps have some vague idea that one of the choices, let's say to move to the right, is better than the other, so they're more likely to take that, but they're just making a random decision at every time step.

That of course is not how we make decisions, even early in the morning. And Markovian random walks are not really what we're interested in in statistical physics. So now we're going to try and add some memory. The idea is to add memory to these simple models both to help illuminate some general principles of non-Markovian statistical mechanics, and also to give insight into perhaps slightly more realistic decision modelling. But as soon as I say I'm going to add memory, you probably think, well, how, what kind of memory? There are probably as many different ways to add memory as there are people working in the field. You could think, for example, of having some kind of hidden Markov process with latent variables or internal states. Or you could think of your favourite reset or renewal model with non-geometric times between the renewals, and that of course can be related to run-and-tumble type processes of the type we already heard a bit about yesterday. So there are lots of different ways to think about non-Markovian random walks even in one dimension. I'm going to focus on just one class of processes: a class of processes where the step distributions, or more generally the transition probabilities, depend on the time-averaged velocity, or if you like the current, of the particle up to that point. So they depend on the whole history of the random walk, albeit in quite a simple way.
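As a concrete anchor for this baby-level starting point, here is a minimal sketch of the memoryless walker in Python; the function name and parameter values are illustrative, not from the talk:

```python
import random

def biased_walk(T, p_plus=0.6, seed=None):
    """Memoryless (Markovian) walker: at each time step, move +1 with
    probability p_plus and -1 otherwise, independently of the past."""
    rng = random.Random(seed)
    x = 0
    for _ in range(T):
        x += 1 if rng.random() < p_plus else -1
    return x

# The velocity x/T concentrates on 2*p_plus - 1 (here 0.2) at long times.
print(biased_walk(10_000, p_plus=0.6, seed=1) / 10_000)
```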
And by zooming in on this class of processes, I'm going to try and tell you two things in the time I have this morning. First, and quite briefly, I'll say something about fluctuation mechanisms in systems with long-range memory; that's joint work with Rob Jack in Cambridge. And then for the majority of the talk, I'll focus on this application to decision making with distorted memory, which is based partly on some slightly older work and then on this rather recent preprint.

So on to the first part. This is really a tale of two elephants. The first elephant is quite a famous non-Markovian random walk, the so-called elephant random walk. Elephants never forget. So the elephant random walk, introduced in 2004, is a random walker which remembers its whole history, with positions in discrete space, and the probability of going right or left at the next step depends on the fraction of steps right or left in the past. In particular, we have some memory parameter A, and if the value of the velocity up to the time we're at is VT, the next step is right or left with the probabilities you see here. We'll concentrate on the case where A is positive, and that means we have a kind of positive reinforcement: if the velocity is positive in the past, the random walker is more likely to go to the right in the future, and similarly for negative. So that's the standard elephant random walk. The second elephant that Rob and I considered is a variant of that, really, which we call the Gaussian elephant random walk. This has a position in continuous space, and if the value of the velocity up to the time we're at is VT, then the next step is drawn from a Gaussian distribution, the clue's in the name, with mean A times VT and variance one. So you've perhaps already spotted that these models look very similar: the mean step length is the same in both of them and the variance is the same in both of them, but we'll see there are some important differences.

Let's first of all think about the typical behaviour. Let's consider a kind of conditional expectation that I'll call Delta: the mean displacement for the T plus one time step, assuming we have velocity VT so far. Then you can see relatively straightforwardly that you expect the mean, typical trajectories to be given by the following discrete mapping, relating the velocity at time T plus one to the velocity at time T and this Delta function. And from there you get the fairly obvious fixed-point condition: at a fixed point, the expected displacement at the next step, so Delta, a kind of instantaneous velocity if you like, is the same as the velocity in the past. And as usual with fixed points, you can consider their stability by looking at the slope of this function Delta of V. So you have a kind of cobweb construction which shows you that if the derivative of Delta of V is less than one at the fixed point you have a stable fixed point, and if it's greater than one you have an unstable fixed point. For both the elephant random walk and the Gaussian elephant random walk this Delta function is just A times the velocity, so the fixed point is trivially zero, and if A is less than one then it's stable. So in the long-time limit the velocity converges to zero in both models. Now we want to think about fluctuations, to see velocities away from zero. And generally they involve excursions at the start of the trajectory, because there's less probabilistic cost for an excursion there.
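Since the two elephants share the same conditional mean and variance, a short simulation sketch makes the comparison concrete; this is a minimal illustration with an assumed memory parameter in the strong-memory regime discussed next, not the authors' code:

```python
import numpy as np

def elephant_walk(T, a=0.75, seed=0):
    """Standard elephant random walk: the next step is +1 with
    probability (1 + a*v)/2, where v is the time-averaged velocity
    (position/time) of the whole history so far."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for t in range(T):
        v = x / t if t > 0 else 0.0
        x += 1.0 if rng.random() < 0.5 * (1.0 + a * v) else -1.0
    return x / T  # final velocity V_T

def gaussian_elephant_walk(T, a=0.75, seed=0):
    """Gaussian variant: step drawn from Normal(mean=a*v, variance=1),
    matching the discrete model's conditional mean and variance."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for t in range(T):
        v = x / t if t > 0 else 0.0
        x += rng.normal(a * v, 1.0)
    return x / T

# Typical behaviour: for a < 1 both velocities converge to the stable
# fixed point v* = 0; the difference shows up in the rare fluctuations.
for walk in (elephant_walk, gaussian_elephant_walk):
    v = [walk(1000, seed=s) for s in range(200)]
    print(walk.__name__, np.mean(v), np.var(v))
```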
And I'm just going to focus in now on the case where A is greater than a half, when the memory effect is strong in some sense. I can say something about the other regimes afterwards if you're interested. So A is greater than a half and less than one. The original elephant random walk can be treated as a Pólya urn, it is a kind of Pólya urn, so you can use results of Franchini and find the asymptotic distribution of the velocity, which has this exponential form. For those who are familiar with these things, this is of course just a large deviation principle with speed T, but with a slightly weird rate function that's non-analytic for general values of A. So the memory here has introduced a non-analyticity into the rate function. The Gaussian elephant random walk is much easier to analyse: because it's Gaussian, you basically end up just summing Gaussian displacements, and you can find again the asymptotic distribution of the velocity. But this looks rather different. It's again a large deviation form, and the rate function is well behaved, it's just quadratic, but now you have a different power of T. This is a large deviation principle with a modified speed, which was also found in some earlier work and various things since then.

So what's the difference between these two models? Why do they have such different fluctuation behaviour even though their typical behaviour is the same? Well, this relates actually to something we heard about in the very first talk of the conference; it's sort of connected to this idea of a big jump. In the elephant random walk each step can't be larger than one. So if you want to have a fluctuation above the average, what happens is that you have a macroscopic fraction of the steps at the beginning of the trajectory with step length close to one. So you have lots of steps a bit above the average, if you like. On the other hand, for the Gaussian elephant random walk the step lengths can be arbitrarily long, so the cheapest way to achieve a fluctuation away from the average is to have just a few steps at the beginning that are absolutely huge and then to decay with the natural dynamics of the process. We call this an initial giant leap, but actually it's related to this big-jump idea. And these two generic mechanisms, the long initial excursion and the initial giant leap, are found in other models too, and we were able to give conditions for when you'd see these different kinds of mechanism in different kinds of non-Markovian models, also including many-particle models.

But let me leave that now and return in the last 15 minutes or so to the decision making. So how do we actually make decisions? Well, it's very complicated, but part of it relates to this idea in economics of utility, which goes back to the greatest happiness principle of Jeremy Bentham. Here's Jeremy Bentham; his body is famously at UCL. This is the picture I used to show of him; actually he moved just before the first COVID lockdown and he now sits in this box on the right in the Student Centre at UCL, nicely COVID-secure behind the glass screen. Anyway, apart from leaving his body with these slightly eccentric instructions, he formulated this idea of a greatest happiness principle, which is the idea that human beings make decisions in a way that maximizes their happiness. And that's the sort of forerunner of what we now call utility, which you can think of as a measure of the reward or satisfaction or benefit you get from making a decision.
And the idea is that if you're totally rational, you weigh up the utility for different decisions and you pick the decision with the highest utility. Of course, we're not totally rational. One thing is there's quite a lot of noise in our decision making, and we'll see that later. But the other thing is that the utility we use in our decisions might be quite different from the utility we actually experience when we make those decisions. And part of the reason for that is that we misremember things. Our memory is distorted. And one way that our memory is distorted relates to this psychological heuristic called the peak-end rule. So I'm going to tell you about that, and then we'll see eventually how it connects to the kind of models we were thinking about at the beginning of the talk.

So the peak-end rule is famously the work of Daniel Kahneman, who got the Nobel Prize in economics. What he and his co-workers did in the original experiments, they didn't look at happiness or benefit but they looked at pain during colonoscopies. So they took a bunch of patients, who I'm told had to have colonoscopies anyway, and they didn't have general anaesthetics; they were awake but mildly sedated. And they asked the patients every minute during the procedure how horrible it was, and then they plotted these graphs of pain against time. And then at the end, when it was all over, they went back to the same patients and asked them to retrospectively evaluate it, to give a single number saying how terrible their experience was. And what they found was that they could predict this retrospective evaluation very well just by taking a straight mean of the peak, the worst bit of the experience, and the end. So we might look at these graphs and say, well, this is pain against time, so clearly patient B had a worse time because they've got more integrated pain. But it turns out in real life it's not like that. Really, at least in this case, it's just the peak and the end, the peak-end rule. The length of the experience didn't seem to matter very much; that's called duration neglect. And in fact they tested this, apologies to those who are still eating their breakfast, by keeping the probe inside people longer than was necessary but making the end bit a little bit more gentle. So giving them some extra pain, and people went away happier. I think one of these papers is called 'When more pain is preferred to less'. And since then the peak-end rule has been tested in different scenarios, happier scenarios: how you remember holidays, for example, with the idea being that you remember the best bit on the journey home perhaps; how people evaluate movies; and various other situations. And I would say that in some cases it seems to work well, in some cases it works a bit less well. But what is true and is evident is that as humans we don't remember everything, we don't just remember the most recent things, we don't have some kind of smoothly decaying memory kernel, but actually we remember particular snapshots, particular episodes, and in particular we remember extremes.

So our aim was to investigate how memory of the peak, we'll ignore the end for now, affects future decisions. So here's the decision model which I already hinted at earlier. We have a single agent repeatedly deciding between plus and minus choices, right and left if you like, so a one-dimensional random walk, and we count the number of plus choices up to time t and the number of minus choices up to time t and call those capital X plus and capital X minus.
In fact it's generally easier to consider the fractions, which of course are the velocities right and left. So far, so boring. The new ingredient now is that we model this utility: every time the agent makes a choice, we draw a value of a utility random variable, an experience random variable, from some distribution that might be different for the right choices and for the left choices. And the idea is that the agent doesn't remember all of those values of utility. They just remember the maximum for the right choices, the plus choices, and the maximum for the left choices. And I'll call those things U hat plus and U hat minus. So remember that the hat is the peak, the maximum, and it's that peak that goes into the probabilities for what happens next. So the probabilities of going right and left are now themselves random variables, capital P plus and capital P minus. And I take what economists call a logit form, but perhaps it's familiar to us because they look like Boltzmann distributions, where the T that appears here can be thought of as a kind of social temperature or a level of noise.

So let's just look at that a little bit more closely. Here it is again at the top, and let's see that it makes sense. If capital T is very big, if the noise is very big, then both these probabilities are approximately a half; in other words, there's really no memory in the process, it's just a random walker, somebody making random decisions. On the other hand, if the noise is small and positive, then there's a bias in the direction of whichever is bigger, U hat minus or U hat plus. So you tend to go in the direction where you remember, in this distorted way, having the best experience.

So let's see what happens. First of all, let's look at the case where we have two choices with the same utility distributions, homogeneous choices. Notice that that doesn't mean the distributions of U hat plus and U hat minus are the same at any moment in time, because they will depend on how many steps the agent has made right or left in the past. What happens then in the long-time limit? Well, perhaps you can convince yourself that there are a couple of things that might happen. Perhaps you might say, well, the problem is a priori symmetric, so what I expect to happen in the long-time limit is that U hat plus will be asymptotically the same as U hat minus, both the probabilities will be about a half, and half the steps will be right, half will be left. I'll be in a kind of mixed decision state where the agent is sampling both choices. On the other hand, perhaps it's possible that somehow early on you have a particularly good experience, let's say for a rightward step, and then you're more likely to go right and you're more likely to have an even better experience. So perhaps you can get sucked into a fixed point where asymptotically you only go right, or of course asymptotically you only go left. And that corresponds to a kind of frozen decision state where you're trapped by habits, trapped by your past experience. And indeed we see these things. So on the right I plot simulation results showing the empirical histogram of velocity after a hundred time steps, with an exponential distribution of utilities for both right and left steps. In green you have a high value of the noise, and you see the distribution is peaked around zero, so the symmetric fixed point; and in red you have a low value of the noise, and you see the distribution is peaked around minus one and plus one.
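Here is a minimal Python sketch of this peak-memory decision model; the exponential utility distribution matches the simulations just described, while the parameter values and the choice to initialize both remembered peaks at zero are assumptions of the sketch:

```python
import numpy as np

def peak_memory_agent(steps, T_noise, lam_plus=1.0, lam_minus=1.0, seed=0):
    """Agent repeatedly choosing +/- with logit (Boltzmann-like)
    probabilities built from distorted memory: only the peak (maximum)
    utility experienced so far for each choice is remembered."""
    rng = np.random.default_rng(seed)
    u_hat = {+1: 0.0, -1: 0.0}    # remembered peaks, U-hat-plus/minus
    n = {+1: 0, -1: 0}            # counts of right/left choices
    for _ in range(steps):
        w = {c: np.exp(u_hat[c] / T_noise) for c in (+1, -1)}
        choice = +1 if rng.random() < w[+1] / (w[+1] + w[-1]) else -1
        lam = lam_plus if choice == +1 else lam_minus
        u = rng.exponential(1.0 / lam)          # experienced utility
        u_hat[choice] = max(u_hat[choice], u)   # keep only the peak
        n[choice] += 1
    return (n[+1] - n[-1]) / steps              # velocity after `steps`

# Low noise: |velocity| near 1 (frozen state); high noise: near 0 (mixed).
for T_noise in (0.1, 5.0):
    v = [peak_memory_agent(100, T_noise, seed=s) for s in range(500)]
    print(T_noise, np.mean(np.abs(v)))
```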
So this is the frozen state where you're only going right or only going left. And the question then is, well, how does this picture depend on the details? How does it depend on the level of noise, and in particular on the distribution of the utilities? It's quite hard in general, but there's a sort of shortcut, a simplified model, which turns out to give quite useful information. And the idea is that we approximate the random variables for the maxima, U hat plus and U hat minus, by the so-called characteristic largest value after X plus steps right and X minus steps left. That comes from extreme value theory. You just say, well, if I've made X plus trials of a random variable, what typically will be the largest value of that random variable I see? And you can get those characteristic largest values directly from the CDFs. If you do that, then the probabilities for going right and left are no longer random variables but deterministic functions of the number of steps right and left in the past, or equivalently of the velocity and time. This is now back in the class of a generalized elephant random walker, or a Pólya urn problem. And you can play exactly the same games that I showed you earlier and solve for fixed points and check their stability.

I won't show you the details, but if you know anything about extreme value theory, you perhaps won't be surprised. I hope this will answer the question I see in the chat from Sumida: what decides if the fixed point will be symmetric or asymmetric for large values of time? So you have three generic classes that come from extreme value theory. If your utility distribution has fat tails, like a Pareto distribution for example, then it turns out that the symmetric fixed point is always unstable for long enough times. In other words, what happens is that the agent has a very good experience in one direction, they're more likely to go in that direction, more likely to get an even better value of the utility, and they overwhelmingly end up going one way or the other. So if you have a fat-tailed utility distribution, that's what the simplified model predicts, and simulations looking at the root mean square of the velocity in the real model bear this out: for large values of time, you end up with a variance of one, so trapped in plus or minus one. On the other hand, if your utility distribution is bounded, like a uniform distribution, then the symmetric fixed point is always stable for long enough times. There might be some early metastable behaviour, but eventually the agent will have seen effectively the maximum for right choices and left choices, and then we're back to a symmetric random walk and you end up in the symmetric fixed point, with a variance going to zero for long times. The interesting case, as I hinted at before, is the case of exponential tails: there you really do have a transition from a frozen state for low values of noise to a mixed state for high values of noise, possibly with some logarithmic dependence on top.

So this simplified model suggests that for homogeneous choices, if you have fat-tailed utility distributions, or exponential utilities with low noise, you have a kind of ergodicity breaking, a trapping, where the time-averaged velocity for a specific agent is not the same as the ensemble average. If you're in this red situation, then a specific agent will have time-averaged velocity minus one or plus one, but the ensemble average will be zero.
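For reference, the characteristic largest value u_n after n draws solves F(u_n) = 1 - 1/n, and working this out for the three kinds of tail just described (a standard extreme-value calculation; the particular parametrizations are assumed for illustration) shows where the three classes come from:

```latex
% Characteristic largest value after n i.i.d. draws with CDF F:
%   F(u_n) = 1 - 1/n.
\begin{align*}
  \text{exponential, } F(u) &= 1 - e^{-\lambda u}:
     & u_n &= \frac{\ln n}{\lambda}
     && \text{(slow, unbounded growth: the marginal case)} \\
  \text{Pareto, } F(u) &= 1 - u^{-\alpha},\ u \ge 1:
     & u_n &= n^{1/\alpha}
     && \text{(fat tail: peaks keep improving, trapping wins)} \\
  \text{uniform on } [0,1],\ F(u) &= u:
     & u_n &= 1 - \frac{1}{n}
     && \text{(bounded: peaks saturate, symmetry wins)}
\end{align*}
```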
In this homogeneous case it doesn't matter, because all the choices have the same average value of utility, so it doesn't really matter whether or not you're trapped. It becomes much more serious in the case of heterogeneous choices, which has basically been the PhD work of this guy who's just finishing up now, Evangelos Mitsokapas. And the question then is: how can we escape trapping in the wrong choice? How can we avoid the case where we're deluded by some early experience and end up trapped in a choice which has lower expected utility?

So very briefly, let me show you simulations of the time and ensemble average utility for the case where you have an exponential utility for right steps with parameter two, so mean a half, and for left steps with mean one. So in this case the left step, the minus choice, is the better one. And I plot here the average utility against noise for both the simplified model and the full model, the real model. What you see is that the qualitative features are rather similar. In both cases, as noise goes to zero or as noise goes to infinity with finite time, the average utility has this rather simple form, which evaluates to 0.75 here. The reason for that is that as noise goes to zero, an individual agent makes its first choice with probability a half for each direction and then does the same forever after. So the ensemble average is 0.5 times one over lambda plus, plus 0.5 times one over lambda minus. On the other hand, in the high-noise limit we're back to the case where each agent samples both choices equally, so both for an individual agent and for the ensemble the average is 0.75. But in the middle there appears to be a peak, somewhere around one over lambda plus.

And I know I'm almost out of time, so in the last minute or so let me tell you something about that peak by concentrating on the simplified model. You no longer have fixed points, but you have a kind of dynamical landscape picture which predicts three different regimes. A low-noise regime where you still have this trapping. An intermediate-noise regime, between these two dashed lines, where as time goes to infinity you eventually saturate: you eventually escape the bad trap and end up in the good trap, and you can predict that the way you saturate is one over time with a log correction, and check that with simulations. And then a high-noise regime, to the right of one on the plot here, where again, if you fix the noise and take time to infinity, you eventually saturate at the best possible outcome, but the saturation is much slower, a power law, which again you can predict.

So I hope I've convinced you that systems with long-range memory are interesting, with rich fluctuation behaviour, and also that these reinforcement effects can lead to interesting trapping effects and different classes of behaviour even in a very simple model. And if you have different utility distributions, you have to worry about how to escape the wrong trap, and it turns out that under certain conditions there's an optimal level of noise which maximizes the returns. Lots of extensions and further things to do, but let me stop there, so there's time for one or two questions. Thank you very much.

Rosemary, I think we have time for a quick question. Yeah, who wants to ask a question? Yeah, there is one in the chat. Here it is: what decides if the fixed point will be symmetric or asymmetric for large T? Question mark, I guess. Yeah, so I think I answered that in the talk.
So the crucial thing is basically the tails of the utility distribution, and that gives you these different classes of behaviour. I think the question came in before I explained that, so hopefully it's now clear. If not, do come back to me, Sumida. Thank you. So thank you, Rosemary, for the nice talk. And then I think we can go to the next speaker. Thank you.

And now we are going to the next talk. The speaker is Alexia Auffèves, Institut Néel, France, and she is going to speak about a two-qubit engine fueled by entanglement and local measurements. Please. Thanks for the introduction. You have 25 minutes plus five minutes for questions. Yeah, thanks.

So yeah, thanks for the introduction, the invitation, even if I would obviously prefer to be physically present at this conference. So the name of the conference is Statistical Physics of Complex Systems, and I'm afraid to say that I'm just going to talk about three quantum systems. But at which level should we put complexity? This is an open question, because they are quantum, I guess. So I hope it will still fit in the scope. The results that I'm going to present are the result of a very nice collaboration with American colleagues, Kater Murch and Andrew Jordan, respectively at St. Louis and now at Chapman University. And these are the young guys that participated actively in this work.

The outline of my presentation is the following. First I will present you some new kind of engine that is quite different from the type of nanoengine that you find in, I would say, mainstream stochastic thermodynamics: engines which are powered by quantum measurements. In other words, it's enough to look at the quantum system to turn it into an engine. So I will explain how it works and what the fundamental insights behind this are. Then I will present you the core of this talk, which is a two-qubit engine. And I will use this two-qubit engine to try to describe a little bit more deeply what I just briefly mentioned so far, which is the measurement fuel. So where does the energy provided by the measurement come from? The fact that I will be able to describe fluxes of energy within an entangled system will allow us to understand this better. So that's the menu.

So first, the whole talk and the whole proposal are based on this idea that quantum measurement can play a specific role in stochastic thermodynamics. Usually, when you think about measurement, you think of it in terms of information extraction. But the fact is that in quantum physics, measurement has another facet, which is the fact that it irreducibly perturbs a system. When you look at a system in quantum physics, you perturb it. And if you draw the consequences of this from the energy point of view, quantum measurement can provide energy to a quantum system. So these are the measurement-induced quantum fluctuations I'm talking about. And this perturbation is also at the origin of specifically quantum stochastic trajectories, where the stochasticity of the trajectory doesn't come from the presence of any thermal bath; again, only the fact that I'm looking at the system generates randomness in the trajectory of the system. So the bottom line of this is that it's possible to build a perfectly consistent thermodynamic framework based on the randomness induced by quantum measurement only. No thermal bath in the picture. And this is the game that we played a few years ago with my group and collaborators. Sorry, I want to change slide. So what is this framework about?
So as you can see here, the parties that are involved are extremely simple. I mean, this scenery, you could find it in an undergraduate lecture on quantum mechanics. You have a quantum system that is on the one hand driven by some entity with a time-dependent Hamiltonian. And on the other hand, you measure your system projectively at discrete times. So you apply the measurement postulate. So this is Schrödinger, and this is the measurement postulate, to say it in a brute way. And with this, actually, you can build a stochastic quantum trajectory. Because if you have your system in a pure initial state, and if you have access to all the measurement outcomes of your measuring device, then, obviously, as you're the controller, you also know the Hamiltonian that you apply, so you can reconstruct at any time which state your system is in. So this is a sketch of the typical quantum trajectory that you can build that way. Here, you have the Hamiltonian evolution; you know how to build this. And here, these are the stochastic quantum jumps. But you know from which point to jump, because you have access to the measurement outcomes here. So this is a stochastic quantum trajectory.

And now we are in a thermodynamics talk. So starting from this, you need to define internal energy. And the internal energy, I simply define it as the expectation value of my Hamiltonian along this trajectory. Very intuitive. Now, what is the first law in this transposition of standard stochastic thermodynamics? For the first law, you need to define work and heat. So just like in stochastic thermodynamics, I'm going to define work as the energetic quantity that is exchanged during the continuous evolution; it can be seen as the deterministic energy exchange with the controlling device. While heat, and I'm putting quotes here because there is no thermal bath, quantum heat is exchanged during the quantum jumps that are induced by quantum measurement. So this is a stochastic energy exchange which has no classical equivalent. It is really intrinsically due to measurement back-action, to the fact that when you measure, you do something to your system in quantum physics. With this, the first law is guaranteed by construction. And you may expect the second law, but I won't talk about the second law here because I have no time for that, and the first law is way enough for what I want to present to you.

Before I go further in the proposal, I want to take a simple example. My example is a qubit, so a two-level system, which has two states, 0 and 1. And I'm going to do the simplest thermodynamic transformation I can think about, which is preparation and measurement. So I'm preparing my qubit in the ground state, and I'm measuring sigma x. So for sigma x, the basis is a bit tilted. As you can see, there are two stochastic trajectories: either I jump from 0 to plus, or I jump from 0 to minus. And the fact is that since I start from the ground state, each state I can end up in will have more energy, which means that the measurement has provided energy to my qubit. So meet the quantum heat. Actually, this is the simplest example you can think about. And if you think about it from a mathematical point of view, it's obvious, in the sense that I'm measuring an observable which does not commute with the free Hamiltonian of my system. And since they don't commute, there is nothing that prevents the mean energy of my system from changing.
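A minimal numerical sketch of this preparation-and-measurement example (assuming hbar = 1 and an arbitrary level splitting omega) makes the average energy gain explicit:

```python
import numpy as np

# Qubit with free Hamiltonian H = (omega/2) * sigma_z, prepared in its
# ground state and measured projectively in the sigma_x basis.
omega = 1.0
H = 0.5 * omega * np.array([[1, 0], [0, -1]], dtype=complex)
ground = np.array([0, 1], dtype=complex)                # energy -omega/2

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)     # sigma_x = +1
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)   # sigma_x = -1

E_before = np.real(ground.conj() @ H @ ground)
# Born rule: each outcome here has probability |<pm|ground>|^2 = 1/2,
# and both post-measurement states have mean energy 0.
E_after = sum(abs(s.conj() @ ground) ** 2 * np.real(s.conj() @ H @ s)
              for s in (plus, minus))

print(E_before, E_after)                         # -0.5, 0.0
print("mean quantum heat:", E_after - E_before)  # +omega/2
```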
So quantum heat is transferred on average during this step. And now I want to use this property to build a quantum engine. Just to give you the general strategy: this is the usual sketch that you have in mind when you think about an engine. It's classical thermodynamics, or I would say thermal-bath thermodynamics. You have thermal baths, hot sources, cold sources, that induce thermal fluctuations. And you can rectify these thermal fluctuations and possibly extract work from the protocol. What I want to do now is basically to replace the hot source that provides energy by a measuring device, and then extract work from quantum fluctuations. So this is actually the essence of what I call a measurement-powered engine. And there have been a few since the first proposal we made in 2017, always based on the same idea: you have a quantum system, you measure it projectively, you extract some work. That's magical.

So the position of the work I want to present today, after this long introduction, is the following. So far, all the measurement-driven engines we had seen in the literature were based on the situation of a single quantum system undergoing a measurement by a classical device, which means that the measurement channel was not modeled at all. So in the work that I'm going to present you now, we tried to answer two questions. The first one is: how can we extend this idea of a measurement-powered engine to multipartite entangled working substances? So typically, the two qubits that are in the title of my talk. And the second question is: is measurement energy, which I described to you as heat so far, rather heat or work? Because you can also put yourself in the point of view of the guy who is measuring, and from this point of view, well, you spend work to do the measurement. So how can we understand this kind of double perspective on measurement, and can we get a unified point of view on this? That's the kind of question we have tackled, which brings me to the second part of the talk, which is the principle of the two-qubit engine.

And the principle is very intuitive. You have two qubits, a red one and a blue one, and this is to say that the red qubit A has less energy than the blue qubit B. At time zero, you excite the first qubit. So you put an excitation here. And then the two guys are coupled, so in time they get entangled, such that the quantum of energy that is here basically gets delocalized between the red qubit and the blue qubit. And then you perform a measurement, using a classical device for now. So you ask the qubit, say the blue qubit, you ask it if it is excited or not. And from this, there are two options. Either the blue qubit is not excited, which means the excitation is in the red qubit, and we are back to the initial state, basically. Or your measurement, and this happens with a finite probability, reveals that the excitation is in the blue qubit. And in that case, if you look at what happens on average, you gain energy, because you have a finite probability to get the excitation in the qubit that has more energy. So how do we explain that? Actually, this is typically the same effect I described before, in the sense that we can write down the Hamiltonian for the two qubits: there is a local term with the eigenenergies, the free energies of each qubit, and there is a coupling term, which describes the exchange of the excitation between the qubits.
And when I measure, I actually measure something that commutes with the local term, but the observable that I'm measuring does not commute with the total Hamiltonian of my two qubits. And because of this non-commutation property, I get this input of quantum heat. So we can do the energetic analysis of this, from a classical point of view first. Actually, there are two conservation rules that really allow us to grasp the essence of the mechanism. With this Hamiltonian, the first conservation is the conservation of the number of excitations. I start with one quantum, I end up with one quantum; there is no other option. And because of this conservation of the number of excitations, I can look at the local Hamiltonian term, and what I see is that it oscillates, which is normal. Because if I start from the red qubit here, then I know that my excitation has a probability to go to the blue qubit, and if I simply add up the two local energy terms, my local energy increases. But then I also have the conservation of the total energy of the two qubits, which involves also the coupling term. And the coupling term, it oscillates in the opposite sense, to compensate for the increase of the local energy term. So this coupling term can really be seen as some sort of binding energy that is here to ensure both the conservation of excitations and the conservation of energy.

So what happens when I do my measurement? When I do my measurement, I'm actually erasing the coherences between the two states. So I'm erasing the entanglement, basically. I end up in a mixed state: either my excitation is in the red qubit or in the blue qubit. Doing so, I immediately change the binding energy: I erase it, it goes to zero. And therefore the total energy of my two qubits increases by the same amount. So this is what I call here the measurement energy, or the quantum heat. So we could use it, and that's what we have proposed in the paper: we could use it to build a Maxwell's demon engine, just by playing with this entangling operation and the measurement, using the information extracted by the measurement to feed back on one of the two qubits and extract the work, and finally erasing the memory of the demon. I won't present this part of the game because I want to keep time, actually, to open the black box and to analyze where the measurement energy comes from.

Excuse me, chair, because I didn't notice at which time I started. How much time do I have left? You still have 10 minutes. 10 minutes, perfect. So yeah, the final part of the talk is devoted to modeling the origin of the measurement fuel. So far, this measurement fuel, as I told you, I treated it like heat, because it corresponds to energetic fluctuations that stem from instantaneous projective measurements. So from the two-qubit system side, there is an input of energy that goes together with an input of entropy. But now I want to go beyond this and actually stop looking at the measurement only through the measurement postulate, which does not allow me to model the measurement channel, and start providing a microscopic analysis of this measurement channel. And this is based on what in quantum physics we know as the von Neumann pre-measurement model. So basically, we are going to model the measurement as an entanglement of the system with a quantum meter. And you see the sketch of this pre-measurement here.
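Before opening the black box with the meter model, here is a minimal sketch of this energetic bookkeeping in the single-excitation subspace; the parameter values are illustrative assumptions (hbar = 1), not those of the paper:

```python
import numpy as np
from scipy.linalg import expm

# Single-excitation subspace spanned by |1_A 0_B> and |0_A 1_B>.
wA, wB, g = 1.0, 1.5, 0.1     # red qubit A below blue qubit B
H = np.array([[wA, g],
              [g, wB]], dtype=complex)   # local energies + exchange term

psi0 = np.array([1, 0], dtype=complex)   # excitation starts in qubit A
psi = expm(-1j * H * 3.0) @ psi0         # excitation gets delocalized

p10, p01 = abs(psi[0]) ** 2, abs(psi[1]) ** 2   # ask B: excited or not?
E_before = np.real(psi.conj() @ H @ psi)  # conserved, equals wA
E_after = p10 * wA + p01 * wB             # coherences (binding energy) erased

print("P(excitation found in B):", p01)
print("measurement energy:", E_after - E_before)  # = p01 * (wB - wA) >= 0
```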
So what we are going to do is to couple the blue qubit to a quantum meter here, which is another qubit that starts from a well-defined state. And we are going to entangle this blue qubit with the quantum meter to model the measurement. OK, so in the ideal situation for this pre-measurement, the blue qubit and the red qubit are uncoupled, and what happens is that we switch on the coupling between the blue qubit and the quantum meter. The qubit-meter coupling has this expression here, where you can see that we have a dynamics of the quantum meter that basically depends on the presence of an excitation in the blue qubit. So we can start from this state where the two qubits are entangled, we couple them to the meter prepared in some zero state, which is the initial state, and then after some time we build a tripartite entangled state, a GHZ state, where the quantum meter state depends on the state of the blue qubit, as you can see here. And this corresponds to a perfect, non-destructive pre-measurement. And the time at which this state appears, we are going to note t_N.

Now, we are not in this ideal situation in the proposal for the two-qubit engine, because A and B, the blue and the red qubits, are coupled all the time. And that's where actually the magic comes from. So the evolution of the two qubits and the quantum meter, we can describe it by using a perturbative approach, very simply, where at zeroth order the two qubits act as if they were not coupled, and at first order we have a small term here that comes from the fact that actually the two qubits are coupled. And what we are going to do is to analyze the energy flow along this pre-measurement process. Our thermodynamic system is the blue qubit, the red qubit, and the quantum meter. And what happens is that between the beginning and the end of the measurement, the tripartite system is totally isolated. We are really describing a unitary transformation of these three guys with a time-independent Hamiltonian. What happens at the initial and at the final time of the process is that I have an external agent that is going to switch on and off the measurement channel, which actually corresponds to some work input. So we won't take this into account in the modeling.

So here's what's happening. Here you remember what's happening if we have the classical measurement picture in mind, where the measurement is instantaneous: we have a jump of energy that corresponds to this input of quantum heat. Now, with the modeling with the quantum meter, we are going to be able to zoom in on this measurement and look at the energy flows within this small window here. What we must keep in mind during this process is that the local Hamiltonian term is constant during the measurement, because the measurement is very fast on the scale of the Rabi oscillation. I have little time now, so I'm going to accelerate; I'm aware of the fact that time flies. So what we saw before is that the coupling term between the two qubits gets erased because of the measurement, and this corresponds actually to the input of the measurement energy. Now, with this modeling with the quantum meter, we can see that this term exactly corresponds to the buildup of correlations between the two qubits and the quantum meter.
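As a compact way to write the ideal pre-measurement just described (notation assumed, with M labelling the meter qubit): the meter flips conditionally on the blue qubit B being excited, which turns the entangled two-qubit state into the tripartite GHZ-type state:

```latex
\begin{align*}
  |0\rangle_B\,|0\rangle_M &\;\longrightarrow\; |0\rangle_B\,|0\rangle_M,
  \qquad
  |1\rangle_B\,|0\rangle_M \;\longrightarrow\; |1\rangle_B\,|1\rangle_M,\\[4pt]
  \bigl(\alpha\,|1_A 0_B\rangle + \beta\,|0_A 1_B\rangle\bigr)\,|0\rangle_M
  &\;\longrightarrow\;
  \alpha\,|1_A 0_B\rangle|0\rangle_M + \beta\,|0_A 1_B\rangle|1\rangle_M,
\end{align*}
% so the meter ends up recording whether the excitation sits in B:
% a perfect, non-destructive pre-measurement.
```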
So the measurement energy, to jump to the conclusion: with this modeling, we see that it corresponds to the transfer of correlations from the two qubits to the qubit-meter system. And what happens is that while the measurement channel is switched off, these terms remain constant, and the input of energy of the external operator goes directly into the correlations between the meter and the blue qubit. So the measurement energy also corresponds to a work that is provided by the agent to the global tripartite system.

So to conclude, if it wasn't clear: is this measurement energy work or heat? As always in thermodynamics, it depends on the thermodynamic system that you consider. If you consider the total system of the two qubits and the meter, what you have to model the measurement is a reversible, entropy-preserving energy input. And that's work. While if you just consider the local system of the two qubits that you are measuring, then the measurement corresponds to some irreversible energy input that leads to some entropy increase, and that can be seen as heat. So now you could also try to go one step further and ask: if I have a very global vision of the physical world I'm trying to describe, is there a most fundamental point of view, or does this question even have meaning? And here, after years of headaches, my answer right now to this question is that it really depends on your favorite interpretation of quantum mechanics and of the measurement postulate. If you think of measurement as the creation of a massive entanglement by unitary transformation, which means that your brain is more or less Everett-plugged, like in Everett's interpretation, then you naturally think of measurement as a reversible, entropy-preserving process, and for you, in that case, measurement energy will be fundamentally work. But if on the other hand you have the von Neumann legacy in mind and you are more or less Copenhagen-plugged, then you have this idea that there is always a classical measurement that ends the measurement channel. And this classical measurement, which in this view is fundamental, you cannot avoid the cut, the Heisenberg cut. Then measurement is irreversible, it's not entropy-preserving. So the measurement energy from this global point of view becomes fundamentally heat, or very similar to heat.

So here are the conclusions and outlook. I've presented you a corpus of work where we play with this double meaning of the measurement in quantum physics. There is information extraction on the one hand, but there is also back-action on the other hand, and this back-action provides energy. And this energetic imprint of the measurement postulate, you can see it from very different sides. From the system side, this looks like quantum heat, and this quantum heat can be converted into work in measurement-powered engines. If you look at it from the operator side, there is a work cost, and in the specific case I presented you, this work cost is there to erase quantum coherences and quantum correlations. And if you pick a global point of view, then its nature as heat or work depends on your favorite interpretation, and I would say it's an open problem and there is no point fighting over that. And one important point that opens onto applications is that this framework I have presented you, which basically leads you to rebuild thermodynamics on quantum measurement only, is valid in the absence of thermal baths.
And this is well adapted to analyze, for example, the energy cost of quantum technologies, where the fluctuations are not necessarily induced by thermal sources of noise. And I will stop talking now. Thank you very much.

Please stop sharing. We have space for some questions. Please. Hi, Alexia, I'm Andrea Gambassi. So my question is about what happens if, instead of two qubits, you start considering a statistical system or a larger system. Is the machine going to work anyhow? I mean, maybe some of the concepts can be put more in a statistical framework. Has this been done? No. No, no, this is really... So what you have in mind is to extend the concept of measurement-powered engines to systems that could be, for instance, massively entangled. Is that what you have in mind? Yes, yes. Some, for example... I mean, the simplest case could be a spin chain. Yeah. A spin chain, and then you ask exactly the same questions that you were asking. Because in the past, we started studying something... I mean, I didn't think much about whether to call it heat or work. Eventually, I think I studied something similar after a quench and called it work. Maybe it was too superficial. But so the idea could be to try to see this concept when your system is not a simple two-qubit system, but a chain. Yeah, that would be very interesting. I would say that to stick to the original concept, which is still exploiting the measurement back-action as an energy source, what you need to keep is pure states, because you want the measurement back-action. But apart from that, so you worked on that already. I mean, you pinpointed the measurement energy. No, truly, maybe I'll send you a message and we can discuss. OK. It's a pleasure. OK. Thank you.

Are there other questions? We have a question here. Yeah. Hi, Alexia. I'm Pedro from the University of São Paulo. In your two-qubit engine, you have one step where you have to couple the qubits and also one step where you have to measure. And if you do it really fast, then they don't have time to produce a lot of coherence, and also you can enter the Zeno effect regime, because you are measuring too fast. And I would like to know if you looked at these two things, and also whether this is a limitation for physical applications and physical implementations of the engine. If the Zeno effect would be a limitation, you mean? Yeah, because if you build the engine, you can't go too fast. So I don't know if, in the lab... Because the Zeno effect, on the contrary, is a good thing. OK, so maybe the Zeno effect is a good thing, but maybe the two-qubit engine is not the best system to see it. So I'm going to give a fast answer, and maybe if you want, we can exchange more. The fact is that the Zeno effect allows you to provide energy to your system, and at the same time not to have to pay the price that your measurement is stochastic, because you measure so fast that you basically freeze your system in a certain state. There isn't any stochasticity anymore in the measurement. And if you build a Maxwell's demon, for instance, then you don't have to erase the memory anymore. So you can switch from a regular Maxwell's demon engine to basically a transducer of quantum heat to work, without passing by the step where you need to process information. These are engines that work without information, basically. But we have studied this in the two-qubit engine. It's also in the first proposal of measurement-powered engines in 2017. That was actually the magic. Okay, thank you. Hi, Alexia.
This is Gonzalo Manzano. I'm also very interested in the properties of these energetic fluctuations due to the measurement back-action. So I was wondering whether these energy fluctuations are also similar to kind of stochastic sources of work, like, for instance, electric currents, or stochastic forces that are non-conservative forces applied to particles and so on. Because to me, this also recalls this kind of sources, which, even if they are stochastic, the energy that they exchange is more similar to work. This is also, for instance, the case for thermal baths at infinite temperature, right? They have stochasticity, nevertheless, but they are considered to be like work because they don't contribute to the entropy production. Yeah, so the debate on heat and work, and this is also an answer to the first member of the audience who asked me a question: the more it goes on, the more artificial I think it is. So I insisted a bit on that, just to put in the effort to try to close the debate, because it actually depends on the point of view. It also depends on your interpretation. It depends on your background, what you think of the measurement postulate. It depends on what you allow to be called heat or not, because I know some experts in the community who wouldn't want to dub something heat if there is no thermal bath around. So I think at the end of the day, you need to be consistent, and also involve more experimenters in the game, and fortunately they are more and more present to do experiments. So now, regarding the stochastic work source, I must confess my ignorance. I don't exactly know what you have in mind, so I can't make an accurate answer or any interesting comments. I'm interested in knowing more. Sorry, I can be more precise. So let's thank you, Alexia, for the very interesting talk. Thank you.

And we can move to the last speaker of this morning session, Hugo Touchette, Stellenbosch University in South Africa. Hi Hugo, how are you? Hi, ciao. And he is going to speak about machine learning of large deviations. Thank you, Roberto. I assume that you see the slides and you can hear me. Yes, we do. Yes, thank you. Thank you to the organizers for organizing this conference and making concessions to have online presentations. Of course, we would like to be there in person, and then hopefully we'll be able to do this next year. For now, I'll give the presentation online from South Africa.

I want to present some work that I've done recently with Jiawei Yan and Grant Rotskoff from Stanford on combining some tools from machine learning to estimate large deviation functions. So this is the reference there, which I'll be describing. It's been submitted recently. I'll start with the context. So we consider a Markov process which is meant to represent a non-equilibrium system driven into a steady state. It could be one particle, one Brownian particle, or it could be a system of interacting particles, but it's driven out of equilibrium by external forces or reservoirs in a steady state. The cartoon picture that I'll be using for the presentation is the following. We can simulate or observe trajectories of the system, like this trajectory X of t over time. And what we do is that we monitor the system over some time interval, zero to capital T. And as we do this, we measure some observable. So we track some observable, which I'm going to call A T. So the observable itself is a function of the whole trajectory.
It could be something like the integrated work that is done by the external force on the system. It could be an integrated current. It could be the entropy production. Usually these quantities are time-integrated, and they relate to currents or to the state of the system itself. So this is up to the model or the system that you're interested in. What we're interested in is to calculate the probability distribution of the observable in the long-time limit. And we know in many cases that that distribution has a large deviation form, meaning that when capital T, the observation time, goes to infinity, that distribution peaks exponentially around some value, which is the typical value of the observable, which at the same time is the expectation of the observable. But the main point here is that you have this exponential concentration, which is described by what we call a large deviation principle or a large deviation approximation. So that funny sign here just means asymptotically: when time goes to infinity, this is the dominant scaling of the distribution. And once you know this, then what you're interested to calculate is the exponent of that scaling, which we call the rate function. So that's this part here in red. That rate function will basically give you all the information that you want to have about the distribution; it characterizes the distribution to dominant order in the observation time.

So we want to calculate that rate function. This is not an easy problem, and often what we do is that we look at the generating function instead. So we take the expectation of the exponential of time, times some real parameter that we call lambda, times the observable. So this is a generating function, a Laplace transform. And if you know that the distribution scales exponentially, the generating function also scales exponentially with time, with some exponent that we call the scaled cumulant generating function. It's a cumulant function because you would take the log of the expectation to extract that function. So if you can calculate that function psi of lambda, then, and I'll show this in a later slide, by Legendre transform you can get the rate function. And again, we're after the rate function. So these two functions, the rate function and the scaled cumulant generating function, will give you the likelihood of fluctuations for the observable.

What we're also interested in is what I call the prediction problem, which is to try to understand how fluctuations arise in time. So this is the cartoon picture. I'm interested in this type of fluctuation here in detail. It's got low probability, so it's obviously a fluctuation; it's not something that I can observe many times if I monitor the system. What I want to do is track down all the trajectories that led to that value of the observable. So here I'm doing this by just coloring those trajectories. And what we want to understand is whether those trajectories can be described by a stochastic process on its own. And this is what we call the fluctuation process. Mathematically, this corresponds to conditioning: you want to take all the trajectories and then condition those trajectories on seeing a particular value of the observable. So it's a conditioning in terms of stochastic processes. And then you can show that that conditioned process is also a stochastic process in the long-time limit; it's a Markov process.
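Written out, the three objects just introduced are, in standard large-deviation notation (which I assume matches the slides):

```latex
\begin{align*}
  P(A_T = a) &\asymp e^{-T I(a)}
     && \text{large deviation principle; } I = \text{rate function} \\
  \psi(\lambda) &= \lim_{T\to\infty}\frac{1}{T}
     \ln E\!\left[e^{T\lambda A_T}\right]
     && \text{scaled cumulant generating function} \\
  I(a) &= \sup_{\lambda}\,\{\lambda a - \psi(\lambda)\}
     && \text{Legendre transform (next slide)}
\end{align*}
```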
And then you can write down what that Markov process is, and I'll refer to this as the fluctuation process. Okay, so this is the overall context: we're after calculating the distribution of some observable for a non-equilibrium system.

Now, how do we get this? As I mentioned before, the rate function can be obtained as the Legendre transform of that function psi of lambda. So the focus of the talk will be on that function psi of lambda. For Markov processes, what's interesting is that that function, the scaled cumulant generating function, is actually the dominant eigenvalue of some linear operator that we call the tilted generator or the tilted operator. So essentially, calculating a rate function for a Markov process boils down to a spectral problem: you want to solve a spectral problem to extract the dominant eigenvalue as a function of that parameter lambda. Then you'll get this function, and via the Legendre transform you get the rate function. Moreover, when you solve this spectral problem, you get the fluctuation process which I've just described. In the case of a diffusion, for instance, the fluctuation process is a diffusion in which the drift is modified by the gradient of the log of the eigenfunction. So the scaled cumulant generating function will be the dominant eigenvalue; if you select the eigenfunction related to this dominant eigenvalue, then with this eigenfunction you can compute a modified drift, which we see here, which will give you the modified drift of the fluctuation process describing those red trajectories that I have here. So those red trajectories give rise to the fluctuation here. In a sense, that fluctuation process is a modified process, an effective process, that has those values of the observable as its typical value. So it describes how those fluctuations are created. And the little lambda parameter that you have there is a Lagrange parameter. It's like a temperature: you fix it by the derivative of the rate function. So this is what we call the duality; this comes from the Legendre transform. So lambda is like an inverse temperature. If you want a large fluctuation, you change that lambda to reach that value, according to the derivative of the rate function.

Okay, so this is something that's been known for quite some time. We have that effective process that creates or explains how the fluctuation is created. And at the same time, this is good for simulation. It's a good process for simulations because it makes the fluctuation typical. This is how you change the drift, essentially, to make the fluctuation typical, and then it describes how that fluctuation happens with that modified drift. So this is a process that is useful for simulation, but the problem is that we need to solve the large deviation problem to get that fluctuation process. So there's no free lunch here. Whether you solve the spectral problem or whether you know the fluctuation process, it's the same thing.

Okay, so we want to calculate the rate function, we want to construct the fluctuation process, and there are many methods that you can use to do this. There are only a few systems that you can treat analytically, so many numerical methods have been developed over the years to solve that problem. Often not specifically for the large deviation problem, actually, but for rare event simulation. But once you have methods for rare event simulation, you can apply those specifically to the large deviation problem.
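As a toy illustration of this spectral route, here is a sketch for a discrete-time two-state Markov chain rather than the diffusions on the slides; the chain, the observable f, and all values are illustrative assumptions:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])      # transition matrix of the chain
f = np.array([1.0, -1.0])       # A_T = (1/T) * sum_t f(x_t)

def scgf(lam):
    """psi(lam) = log of the dominant (Perron) eigenvalue of the
    tilted matrix P_tilde[x, y] = P[x, y] * exp(lam * f[y])."""
    P_tilde = P * np.exp(lam * f)[None, :]
    return np.log(np.max(np.real(np.linalg.eigvals(P_tilde))))

# Rate function by numerical Legendre transform of psi:
lams = np.linspace(-2.0, 2.0, 401)
psis = np.array([scgf(l) for l in lams])

def rate(a):
    return np.max(lams * a - psis)

print(scgf(0.0))   # 0, by normalization of P
print(rate(0.5))   # exponential cost of seeing A_T ~ 0.5
```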
So here I'm showing a kind of landscape of the different methods. I'm not going to go over all of this, but you have different families. You have sampling methods like importance sampling; you have splitting or cloning. Then you can tackle the spectral problem numerically and directly; that's the first approach: you might want to solve the spectral problem numerically. And at the same time, we know that there is an optimization representation of these functions, so you can try to solve that optimization or control problem instead, which I'm going to describe on the next slide. The problem with this is that, besides having to solve a spectral problem that is not easy to solve, often, especially if you deal with many-particle systems, you have to represent functions like the eigenfunctions in a very high-dimensional space. That's the bottleneck here. For the sampling or the cloning methods, the problem is that you have to simulate many trajectories, many clones, and this is memory- and time-intensive. So there are different methods available; some are better adapted to certain problems than others, and they face different difficulties, but it all boils down to solving a hard problem. And again, there's no free lunch: if you can solve the problem using cloning, it means that you've effectively solved the spectral problem or an optimization problem; on the other hand, if you solve the spectral problem, you could use the solution to do cloning or importance sampling. So they're all related. The approach that I'll present is based on an optimization representation of the large deviation functions, which was developed in the 70s and 80s; I've worked on this with Raphaël Chetrite more recently, and Rob Jack and Peter Sollich also worked on it at the same time. The idea here is based on importance sampling. So this is the original process, and the original process generates some probability measure over the different trajectories. What you do is change that process to generate a new distribution of trajectories. Think of the blue trajectories as the typical trajectories of your system, the ones that give rise to typical values of the observable; we want to change the process, to bias it or tilt it in the direction of a fluctuation. To do this, you modify the process, meaning you modify the distribution of trajectories. And then the control representation, or optimization representation, says that the scaled cumulant generating function and the rate function are the optima of some cost function. The cost function is quite interesting: it is a kind of log-ratio of the two path distributions, the one generated by the original process in blue and the one generated by the modified process in red. If you take the ratio of those path distributions, you get a random variable, which is the cost: the random cost associated with a given trajectory. If you take its expectation, this is the relative entropy between the modified process in red and the original process in blue. So it's a kind of control cost. It's additive in time; you can show this. And if you optimize this in the long-time limit, as shown here, you get the scaled cumulant generating function; you see you have the lambda parameter there.
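Schematically, the optimization representation just described can be written as follows (a standard paraphrase; the talk frames the same thing as a cost minimization, with the opposite sign convention):

\[
\psi(\lambda) = \sup_{u} \lim_{T\to\infty} \left\{ \lambda\, E_u[A_T] - \frac{1}{T}\, D_{\mathrm{KL}}\!\left(P_u^{[0,T]} \,\middle\|\, P^{[0,T]}\right) \right\},
\]

where the supremum runs over modified (controlled) processes u, and the Kullback-Leibler term is the path-space relative entropy: the expectation of the log-ratio of trajectory distributions, which is additive in time.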
If you constrain the minimization, if you minimize the cost subject to reaching a given fluctuation, you get the rate function. So in a sense, what we've done here is to represent the large deviation functions, the rate function and the scaled cumulant generating function, as optimization problems. On one side, you have an unconstrained optimization problem; on the other side, for the rate function, you have a constrained minimization problem; and they are related by Lagrange multipliers. This is the same as the Legendre transform that connects the rate function to the scaled cumulant generating function. Now, we know what the optimizer is. We know what the optimal change of process is: it's the effective process, the fluctuation process I mentioned before. So you can now view the fluctuation process as a controlled process: you modify the process in a controlled way, and it's optimal in the sense that it minimizes the cost that gives you the large deviation functions. So this is one way to represent the whole of large deviation theory, and the fluctuation process, as a control problem. Now, for the purpose of simulation, I'll just mention this: we can measure the cost, we can calculate the cost over a trajectory, and for a diffusion the cost is just the squared distance between the original drift of the diffusion and the modified drift. This is interesting because we now have a quadratic cost, which is simpler to deal with for diffusions. You can also define the cost for jump processes or Markov chains; it's just slightly more complicated, but we know these formulas, so we have explicit expressions for the cost. Okay, so now I'm going to describe some previous approaches that have been followed to solve this problem. Again, we have two approaches: either we solve a spectral problem, or we solve an optimization problem, a control problem; it's the same thing, and they are related. Again, there's no free lunch: if you know the solution to the control problem, you know the solution of the spectral problem, and vice versa. The big difficulty here is to represent the eigenfunction. So a natural approach is to approximate the eigenfunction in some basis of functions: you approximate it using some function that has a bunch of parameters, and this is the architecture that you use to approximate the eigenfunction. There are many possible choices. In the past, people have used basis functions, like a Fourier basis, to represent the eigenfunction, or Lagrange polynomials, or Legendre polynomials, and various other things; a lot of work on this has been done by David Limmer's group at Berkeley. I'm putting some references here. For other systems, especially jump processes defined on lattices, or interacting particle systems, you can also use matrix product states or tensor networks to represent the eigenfunction. So this is a different representation of the approximation of the eigenfunction; it's a different architecture, involving different parameters. Then, more recently, it's quite natural to use neural networks, because you want to represent a very high-dimensional function, so it just makes sense to use neural networks for this. It's been done in many contexts, and now people are starting to use the tools developed, for instance, in machine vision to represent eigenfunctions. I'm putting some references on this.
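To pin down the quadratic cost mentioned a moment ago: for diffusions with noise amplitude epsilon as above, it has the explicit Girsanov form (a sketch with assumed conventions)

\[
\frac{1}{T}\, D_{\mathrm{KL}}\!\left(P_u \,\middle\|\, P\right) = \frac{1}{4\varepsilon T}\, E_u\!\int_0^T \big\| u(X_t) - b(X_t) \big\|^2\, dt,
\]

and the rate function is the constrained version: I(a) is the minimum of this cost rate over controls u that realize E_u[A_T] tending to a. For jump processes, the quadratic form is replaced by a relative-entropy-type cost on the rates.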
And once you know that you can represent these functions via a control problem, it also makes sense to use ideas from reinforcement learning to try to solve the control problem. This has been done very recently by the group of Juan P. Garrahan in Nottingham, and more recently the two groups have actually started working together on this idea of using reinforcement learning. The approach I'm going to describe, the one we followed in the paper I mentioned before, is slightly more direct. What we do is a gradient minimization of the cost function associated with the control problem. So we take the control problem as the line of attack, and we try to optimize the cost function directly by simulating trajectories. There's no reinforcement learning; we just do a gradient optimization of the cost, and the key ingredient is that we use a neural network to represent the control force. So there's an optimization of the cost function, with feedback onto the network representing the control force, and it's all based on neural networks. I'm going to describe this now. This is what I call stochastic minimization. Again, we have the cost function which I've mentioned before, and what we want to do is optimize it in an unconstrained way. So I'm going to solve this problem: I want to optimize the expectation of lambda times the observable, minus the cost function, using this estimator for the cost, in the long-time limit. So what we do is, first, we initialize the neural network representing the force. Initially we don't know anything about the optimal control force, so we either initialize the parameters in some random way, or we initialize them in such a way that the network represents the original drift. Then we simulate M trajectories, an ensemble of trajectories which we call a batch, following the terminology used in machine learning. So I'm simulating M short-time trajectories; we'll deal with the long-time limit later, but essentially I just need an ensemble to calculate a kind of mean cost. I need many trajectories because, if you calculate the cost over only one trajectory, the gradient is too noisy; you need M trajectories just to make the expectation, the average, better behaved. So you compute the cost for those M trajectories as a mean over the batch, integrated over the observation time t. Then you compute the gradient, because you want to update the neural network, and you do this with the normal gradient techniques for neural networks. You can do this by standard backpropagation, and we'll see examples of this in a minute; you can use off-the-shelf software to do backpropagation to calculate the gradient. Or, if the system is quite big, and we'll see an example of this too, if the system has many particles, then backpropagation is not really efficient; the complexity is too high. In that case you can use an adjoint method to compute the gradient, which is very efficient because it only uses the trajectories that you've already simulated. Once you have the gradient, you update the neural network: the parameters are updated according to the usual stochastic gradient formula, where gamma is the learning rate, a hyperparameter of the scheme.
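To make the four training steps concrete, here is a minimal, self-contained sketch in Python/PyTorch. This is not the speaker's released code: the drift (the quartic model mentioned below), the observable, the initial condition, and most hyperparameters are illustrative assumptions; only the batch-simulate / cost-estimate / gradient-update structure follows the algorithm described above.

```python
import torch

# Illustrative parameters (assumptions, except where the talk quotes values:
# roughly 200 trajectories per batch and t = 10 for the first example).
eps = 0.5                      # noise amplitude epsilon (assumed)
dt, T, M = 0.01, 10.0, 200     # time step, horizon, batch size
nsteps = int(T / dt)

def b(x):
    # Original drift: quartic potential V(x) = x**4 / 4, so b(x) = -x**3.
    return -x**3

# Control force u_theta(x): two hidden layers of 50 nodes, as quoted in the talk.
u = torch.nn.Sequential(
    torch.nn.Linear(1, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 1),
)
opt = torch.optim.Adam(u.parameters(), lr=1e-3)

def training_step(lam):
    """Steps 1-4: simulate M controlled trajectories, estimate
    lam * A_T - cost, and take one gradient step on it."""
    x = torch.zeros(M, 1)          # initial condition (assumed)
    obs = torch.zeros(M, 1)        # time-integrated observable A_T
    cost = torch.zeros(M, 1)       # control cost (1/4eps) * int |u - b|^2 dt
    for _ in range(nsteps):
        drift = u(x)               # controlled drift from the network
        cost = cost + (drift - b(x))**2 / (4 * eps) * dt
        obs = obs + x**2 * dt      # A_T = int x(t)^2 dt, an assumed observable
        x = x + drift * dt + (2 * eps * dt)**0.5 * torch.randn(M, 1)
    objective = (lam * obs - cost).mean() / T   # batch estimate of psi(lam)
    opt.zero_grad()
    (-objective).backward()        # direct backpropagation through the solver
    opt.step()
    return objective.item()
```

The backward() call is the "direct backpropagation" route; for many-particle systems one would swap it for an adjoint computation of the gradient, as the speaker explains later.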
Now, simulating these M trajectories and computing the gradient is one step of the training. So you repeat steps one to four many, many times: I simulate M trajectories, I update the network, I simulate another M, I update the network, and so on, until you see convergence of the cost function. This follows normal training methods for neural networks. Once you've reached convergence, you have one point: the scaled cumulant generating function for one particular lambda. Then, of course, you repeat this for different lambdas to construct the large deviation function, and from this you can use the Legendre transform to get the rate function. There are some extra steps you can add to this. For instance, instead of starting at some fixed lambda, you can start with a lambda close to 0 and then step the lambdas gradually, using the converged network for a small lambda as the initial condition for the next lambda. With this you can cover a whole window of lambda values, each time using the last converged network as the initial condition; a minimal code sketch of this lambda-stepping appears after this paragraph. In machine learning this is called transfer learning, because you're using the learned network to inform another simulation at different parameters, and it usually speeds up the convergence quite a lot. Also, if you're dealing with a dynamical phase transition, you can use replica exchange: you simulate two systems at different lambdas and then swap them periodically, and that also accelerates the convergence. So this is the algorithm. It's based on a neural network; I'm not putting in the details of the network we're using, since it's really off-the-shelf software for representing that function. You just have to specify the number of layers and the number of nodes per layer, and then you basically press play. You just have to be careful to track convergence, to make sure that you're going somewhere. I'm going to present two examples now: one simple, and one more involved, with active Brownian particles. I'll start with the simple diffusion. This is just a test to see whether the algorithm works well in the low-noise limit; this system was studied in 2016 by Nemoto, Bouchet, and co-workers. It's a simple stochastic differential equation, you see it here: a one-dimensional system with a quartic potential, which gives rise to this minus x cubed drift, and the observable we're tracking is this one here. The reason for choosing this observable is that it's not a linear observable and not quite a quadratic one; it's a mix of the two. And this is non-trivial because it's a quartic model, so it doesn't have an exact solution; it has one in the low-noise limit, but otherwise it's not just a trivial Ornstein-Uhlenbeck process. For this we did the simulation; it's very simple: we're using two layers for the neural network, with 50 nodes per layer, and for each training step we simulate roughly 200 trajectories over a small time t, just t equals 10.
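Here is the lambda-stepping ("transfer learning") loop referred to above, as a minimal sketch reusing the hypothetical training_step from the previous block; the window of lambda values and the number of training steps per lambda are assumptions.

```python
# Sweep a window of lambda values, reusing the trained network each time.
lambdas = [0.05 * k for k in range(21)]   # illustrative window of lambdas
scgf = []
for lam in lambdas:
    for _ in range(500):                  # train to convergence (assumed count)
        psi = training_step(lam)
    scgf.append(psi)                      # converged estimate of psi(lambda)
# Rate function by numerical Legendre transform:
# I(a) = max over k of [lambdas[k] * a - scgf[k]].
```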
So here we can actually compute the limiting function, which mathematically arises in the long-time limit, by taking a small time, and you can track convergence by looking at time. But what we've seen, and this is also something I knew from previous simulations with other techniques, is that you don't have to go to long times to get the large deviation limit. For this system we use direct backpropagation, so there's no fancy algorithm for calculating the gradient, and you have the results here. We get very smooth results for the scaled cumulant generating function, and we can decrease epsilon, the noise amplitude, to approach the known analytical result, which is the solid curve here. What's interesting about this system is that there's a dynamical phase transition: there's a non-analytic point at lambda equals 0; you see the second derivative here, it jumps, so there's a second-order phase transition. Here I'm showing the training convergence of the scaled cumulant generating function for different lambdas. You see, we just initialize the network more or less randomly and then follow the training steps, and we see convergence of the scaled cumulant generating function. You can also plot the cost function, and it converges with these plateaus. There's a kind of shift here which is related to the network itself; there is some dynamics, but that's the dynamics of the neural network, because the horizontal axis here is the training step. I'm not going to discuss the dynamical phase transition so much; I really want to focus on the numerical algorithm as such. And so here you have the result. One good thing about the algorithm is that there's no slowing down as you take smaller and smaller noise: the algorithm stays the same, you don't change the simulation parameters you're using, and the quality of the results stays the same as epsilon goes to 0. This is not the case for cloning: cloning has the problem of a critical slowing down of the simulation as epsilon goes to 0, so you have to increase the population size as you decrease the noise. There's no such problem here.
For the second application, I'll describe something more physical, something we've seen yesterday in some of the talks: a system of N interacting Brownian particles that interact through a potential, which I'll take here to be a Weeks-Chandler-Andersen (WCA) potential, a bit like a Lennard-Jones potential, and that are driven by a diffusive alignment, which makes them active. So you have a bunch of active Brownian particles. This is the set of coupled stochastic differential equations: they're coupled through the potential, but also through the angles. Each particle is self-propelled with some speed v along a direction set by an angle phi, and that angle is itself diffusive: it follows a Brownian motion. So that's the system. You now have N interacting particles, and the control force is a force on all the particles, so it's quite a high-dimensional object; this is why it now makes sense to represent it with a neural network. The observable we're tracking is the entropy production. This is known to be a nice observable to track because it also shows a dynamical phase transition, related to particle bunching and collective behavior of the particles at the level of fluctuations; that was also described yesterday in some of the talks. Here I'm showing the results for 40 particles, 80 particles, and 200 particles. This is the derivative of the scaled cumulant generating function: we know that the derivative of psi of lambda gives you the typical value of the entropy production at that lambda, so it gives you the fluctuation value as you tweak the inverse-temperature-like parameter lambda; it's actually upside down here. So we have the typical behavior here, and then we have the rate functions for the different particle numbers. The one I want to focus on is this one here. You see that you have different likelihoods for the entropy production, and we can relate those to different configurations: the very unlikely fluctuations of the entropy production are related to particle bunching. You see this here; this is a snapshot of one trajectory of the whole system. And here you have the normal motion that you would see if you don't bias the system: the typical behavior of the particles, which shows no bunching. Again, I'm not going to discuss the physics of this; that will be the subject of another talk. I just want to show how the algorithm works here. One thing I should mention is that for this particular system we don't use standard backpropagation for calculating the gradient in the training, because the system is too big: with 40, 80, or 200 particles, the size of the backpropagation computation grows with the number of layers in the neural network, and it just explodes. So, to calculate the gradient here, we use the other method I mentioned, the adjoint method. In the adjoint method, all you need is the trajectory you simulated and its time reverse, so you only need to keep in store one trajectory at a time, or in this case the M batches. So it has very low complexity. That's about it for this part.
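For completeness, the equations of motion of this second example have the standard active-Brownian-particle form; with conventional symbols (not necessarily those of the slides),

\[
\dot{\mathbf r}_i = -\nabla_i \sum_{j \neq i} U(|\mathbf r_i - \mathbf r_j|) + v\,\mathbf e(\phi_i) + \sqrt{2 D_t}\, \boldsymbol\xi_i(t), \qquad \dot\phi_i = \sqrt{2 D_r}\, \eta_i(t),
\]

with the WCA pair potential

\[
U(r) = 4 u_0 \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right] + u_0 \quad \text{for } r \le 2^{1/6}\sigma, \qquad U(r) = 0 \text{ otherwise,}
\]

i.e. the purely repulsive part of a Lennard-Jones potential.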
So I'll go to the conclusion. What I've discussed has some advantages over the other methods that have been devised so far. One is that it's low complexity. We have the code available, and you can see that most of the code is actually built from off-the-shelf packages for neural networks, especially if you're only using standard backpropagation and not the adjoint method. So it's low complexity; you're not reinventing the wheel here. You can use PyTorch and other packages to build the neural network specific to your system; there's a PyTorch library for stochastic differential equations, torchsde. It's also low complexity because you don't need to store many trajectories, as in the cloning methods: as I mentioned, you only need to store one trajectory as you simulate, and then you can erase it and simulate another one. I focused mostly on stochastic differential equations, but you can apply this to Markov chains and jump processes more or less without changes. So it's very easy to code; it's kind of black-box. You only need to tweak a few parameters, but it's off-the-shelf and black-box, and it works the same whether you consider a discrete system or a continuous system, which is another advantage, for instance over matrix product states, since those work only for discrete systems. Now, there are various questions you can ask and various problems you have to deal with, so it's not all rosy here. One, which is common to all these methods, and in fact a problem you face whenever you use a neural network, is whether you've reached the true global minimum, whether you've calculated the optimal control force and not just something sub-optimal. We're working on this now; specifically for the large deviation problem, you can devise tests to tell you whether you're in the global-minimum regime. For instance, if I go back to the training diagram where you see convergence: the fact that you see convergence doesn't mean that you've converged to the global minimum. So we're working on devising tests to really make sure you've reached the global minimum. Another, more obvious, problem is to look at the physics of the learned control force. For the active Brownian particle system especially, my collaborators are working on characterizing the learned control force, the optimal control force that generates the fluctuations: trying to see whether that force is a local interaction potential, or whether it also changes the active part of the drift. And another concern is to apply this to many particles. I've shown simulations for 200 particles; the idea is to push this to many more particles, and this seems possible. It seems possible to reach a thousand or even more particles, because the method is low complexity: again, all you ever need to store is one trajectory as you go along. So with this I'll conclude. This is the reference again if you want the details; there's also supplementary material with all the numerical details of the neural network. If you're interested in the code, this is the GitHub address; the code was developed mostly by the postdoc working on this. And if you want more information about large deviation theory itself, I can direct you to my website. Thank you. I think we're running a bit late, but there is space for a short question. Other questions, please? Hello, my name is Pedro, from Sao Paulo, and I would like to know if you see any clues, for example in the stochastic differential equation, that tell
you: ah, now I need to use a cloning algorithm, or now I need to use this machine learning algorithm? And to what extent did you compare this new algorithm to previous ones? Okay, so there are not that many comparisons yet; that's something we have to work on. One point of comparison was actually this one, the first simple diffusion: this system has been studied using cloning, so we can compare. The advantage of our method is just that we're not simulating many clones, and, as I mentioned, there's no slowing down with the noise parameter, so there seems to be some advantage here. Now, for the first question, how can you decide: this is a very difficult question, and I don't think there's a clear-cut answer. It's the same question you would ask yourself whenever you want to solve a spectral problem or an optimization problem: there's not one method that will be the best. It depends a lot on the system you study, on the dimension of the optimization problem you want to solve, and on the particular nature of that optimization problem, for instance whether it's a discrete optimization problem. What's going to happen in the future is that we'll have this ecology of different methods that we can combine and use in different contexts: you can combine cloning with this method, you can combine importance sampling with this method, and so on. So there will be some advantage in keeping all these methods and combining them. Okay, thank you very much. So let's thank Hugo again for the talk, and now we can go for the coffee break; we reconvene at 11:10, ten past eleven, also for the participants online. So bye, Hugo, for now. Thank you very much. So, welcome back. It is our great pleasure to have with us a speaker giving his talk physically, in presence; thank you very much for coming in spite of the difficulties. And, well, he doesn't need much of an introduction, so please: you have 25 minutes, plus five minutes for questions. Thank you, Andrea. Thank you very much for the invitation, and thanks also to Sebastian, Edgar, and the other organizers for arranging this in difficult times; it's really appreciated, and it's my first trip since the pandemic. So I'd like to introduce, and to highlight, a few aspects around this notion of the thermodynamic uncertainty relation, and I hope to convince you that it is arguably one of the most exciting developments that have come out of stochastic thermodynamics. I'll start by recalling the principles of stochastic thermodynamics, using molecular motors as a paradigm, and then I'll come to the core topic. So, the basic idea. This is supposed to move. Where is the problem? Okay, good. The basic idea is shown on this slide. As I guess most of you know by now, the idea of stochastic thermodynamics is to try to check whether, and how, the rules that were developed 200 years ago to understand machines like this one can be applied to much smaller machines on the nanoscale, like this rotary motor, the F1-ATPase. Okay. So the basic modeling concept in this field is Markovian dynamics, and I'd like to illustrate it using a real-life experiment by Matthias Rief's group. They were looking at a small peptide, a small protein, which can fold in different ways, and they were able to distinguish six different mesostates of that molecule; the corresponding network is shown up here.
And by applying a pre-tension to this molecule and just looking at time traces, they saw these clearly distinguishable six states, and they were able to measure the transition rates in this network experimentally, as a function of the applied force. For any given force, it's an equilibrium problem. And I like to use this as what is, in my opinion, a marvelous illustration of Markovian dynamics. So, once we know we have Markovian dynamics: the concept of the mesostates arises from a separation of time scales. We're assuming that the transitions among the very many microstates contributing to one mesostate, fluctuations of water molecules or side chains, are fast, and then on the meso level, the capital I, J level, we have this Markov description. The key point is that, as I have just shown you, the rates can be measured experimentally from such time traces. But the ratio of these rates then also defines the free energy difference of any two of these mesostates, and by doing this at different temperatures, you can also experimentally get the internal energy and the intrinsic entropy of each of these states. So my main point here is that these quantities are operationally accessible in such experiments. Now, there is a second paradigm for the dynamics, which is Langevin dynamics. And I guess most of you know by now that this Langevin dynamics, in the overdamped case, can easily be endowed with a thermodynamic interpretation, where concepts like work, internal energy, and dissipated heat can be defined for, or applied to, such a dynamics. Then, for instance, the total entropy production is defined path-wise, or trajectory-wise, essentially as the log ratio between the probability of observing a trajectory and the probability of observing its time-reversed counterpart; and that turns out to have two contributions: one is the heat, and the other one is this kind of stochastic entropy. And then it's easy to derive exact relations like this integral fluctuation theorem, or the detailed one up here. You may have seen this slide before, but I want to make one point. If you now measure this distribution experimentally, which we did some 10 years ago with Clemens Bechinger, then in Stuttgart, and you plot the data in this logarithmic fashion, you get a slope, a slope of one, which is the fluctuation theorem. What did you learn? Not a terrible lot: you just learned that your experimental system complies with this Langevin description. The interesting aspects are systems where it doesn't work so easily. A nice example of that is this rotary F1-ATPase. These are data from a Japanese group, also 10 years ago. They were looking at the distribution of the angle of this rotary motion, which is made visible by a micron-sized probe particle attached to the nano-sized molecular motor. Here you see traces of this angle as a function of time; you can clearly observe 120-degree steps. Now, if you plot the angle again in this logarithmic fashion, and if you assume that this dynamics was given by simple rotary Brownian motion, you would predict that there is again a slope, and the slope should be independent of time; that would be the fluctuation theorem for the simple dynamics.
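The relations invoked on these slides are the standard ones of stochastic thermodynamics; in textbook form,

\[
\frac{k_{ij}}{k_{ji}} = e^{-(F_j - F_i)/k_B T}, \qquad
\Delta S_{\mathrm{tot}}[x(\cdot)] = k_B \ln \frac{P[x(\cdot)]}{\tilde P[\tilde x(\cdot)]}, \qquad
\left\langle e^{-\Delta S_{\mathrm{tot}}/k_B} \right\rangle = 1,
\]

where the last identity is the integral fluctuation theorem; its detailed version, p(Delta S)/p(-Delta S) = exp(Delta S / k_B), is what produces the slope of one in the logarithmic plot.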
However, it turned out in experiments that the slope depends on the time over which you accumulate the data, which tells you that this angle is not really equivalent to entropy production: there is a hidden degree of freedom which you do not see, and which distorts the simple relationship. To see that, let me recall a model which we developed for this system. First, you can map this rotary motion onto a linear motion, and then you have to appreciate that the nano-sized motor jumps in 120-degree steps, but there is an elastic linker between the probe particle and the motor. So when the motor jumps by 120 degrees, essentially that spring is loaded, and then the particle may follow, or may even pull the motor back. Of course, modeling this requires thermodynamic consistency: again, the ratio of the rates for the motor jumps has to involve the corresponding free energy difference. If you do that, you indeed get data quite close to the experimental ones (this is experiment, this is our theory), and you can measure, or calculate, the slope as a function of time, and you find precisely what was seen in the experiment. So what we learn from this introduction is, first, that there are exact theorems like the fluctuation theorem, which are identities. If you look at these identities in a given system, they are either obeyed or violated; if they are violated, you basically know that you have not captured the relevant degrees of freedom. You can come up with specific models, as we did in this case, but this requires a model. It would be nice to have relations that are universally valid, such that, using them, you can extract properties of systems in a model-independent way, and that's exactly what the thermodynamic uncertainty relation achieves, as I'll show you now. I want to motivate it with my own motivation when I started thinking about it, some six years ago, I guess. I asked myself a very simple question: suppose you have a clock, or a watch, in a finite-temperature environment. Clearly there will be fluctuations; clearly such a clock will not be infinitely precise. The question is: is there a relation between the energy required to drive the watch, i.e. the battery, and the precision? I.e., is a more precise clock more costly in its operation, not in its initial manufacturing but in its permanent operation? And I'll give you the answer: yes, there is such a relation. What we know by now is that if you operate a clock at room temperature, and you want it to be precise with, let's say, an error of one second per day, you have to pay at least on the order of 10^10 kBT per day. Now, this figure sounds completely crazy, and it's not obvious at all; perhaps not completely crazy, but certainly not obvious. I'll show you how you get to such a result, and how universal it is. So again, you start of course with a simple model. My model for a clock is just an asymmetric random walk: each step of, let's say, the second hand of such a watch will require some process, some energy, for instance ATP hydrolysis. So the simple clock is just the asymmetric random walk, and you know that after one minute, on average, this will have made 60 steps; but that comes with a variance, and this variance is given by the sum of the rates. So the uncertainty of the position is defined as the variance divided by the output squared, and that leads to this combination of diffusion coefficient and current J.
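In formulas, for an asymmetric random walk with rates k+ and k- (a standard calculation consistent with the numbers quoted),

\[
\langle X_t \rangle = (k_+ - k_-)\, t, \qquad \mathrm{Var}(X_t) = (k_+ + k_-)\, t, \qquad
\epsilon^2 \equiv \frac{\mathrm{Var}(X_t)}{\langle X_t \rangle^2} = \frac{2D}{J^2\, t},
\]

with current J = k+ - k- and diffusion coefficient D = (k+ + k-)/2.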
Now, the cost: we know the rules for such processes, so the log ratio between forward and backward rates is given by the entropy production, and the entropy production is the delta mu of this chemical step. If you combine these relations, you find that the product between cost and uncertainty is given by this hyperbolic function, and that is larger than two kBT. That's a trivial calculation; you just have to combine these two results here. I wouldn't be talking about this if it were true only for this asymmetric random walk. But it turned out, and this is work with Andre Barato, when we looked at many networks in analytical case studies, we conjectured it; we were not able to deliver the proof, which was then given a year later by the MIT group. The statement is that this relation between cost and uncertainty is true for any process modeled as a stationary Markov process at finite temperature. And now, if you want, for instance, a precision of 1%, you have to pay at least 20,000 kBT, and from this you get the number I quoted for the watch. So, slightly more generally, you can show this for any current, which is basically given by a sum over all links, where with each link ij you can associate a time-antisymmetric weight d. For any such current, the mean and the variance, the variance being embodied in this diffusion constant, are constrained by the entropy production; or, turning it the other way around, the stationary entropy production is larger than twice the mean squared divided by the variance. So you see: if you want a current with a very small variance, you have to pay a lot in terms of dissipation. And because this is mostly a theory audience, I want to show you the simplest proof. This was not the original one; this is based on work by Dechant and Sasa, who were looking at the generating function for a general observable q, defined, as in Hugo Touchette's talk this morning, as a functional of the trajectory. The idea is to bound this generating function through an auxiliary process with the same network but different rates, k dagger. So this is an identity; then you essentially use Jensen, and you find that this generating function can be bounded by this expression, which involves the Kullback-Leibler distance between the two distributions, where P dagger is the auxiliary one. Now you have to choose this auxiliary process in a smart way: you perturb the original one slightly, via this parameter z. You get a cancellation of the linear terms, and for the quadratic ones you have this relation; then, by a variational ansatz, making sure that the new stationary distribution is the same as the old one, you actually get this relation for the variance of any such observable. In particular, if the observable is a current, you get this relation, and it now holds even at finite time t. The original was in the t-to-infinity limit, but we expected, and this was separately proven by Horowitz and Gingrich, that even at finite time there is this relation. Okay, so now we have this; what can we do with it? Let me give one example, on molecular motors, which I find particularly revealing of the power of this relation.
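Combining the two ingredients gives the bound quoted. The entropic cost over time t is Delta S_tot = k_B t (k+ - k-) ln(k+/k-), so the product with the uncertainty is

\[
\Delta S_{\mathrm{tot}}\; \epsilon^2 = k_B\, \frac{k_+ + k_-}{k_+ - k_-}\, \ln\frac{k_+}{k_-} \;\ge\; 2 k_B
\]

(equivalently, at least 2 kBT in energy units), and the general thermodynamic uncertainty relation for any current J reads

\[
\Delta S_{\mathrm{tot}} \;\ge\; \frac{2 k_B\, \langle J \rangle^2}{\mathrm{Var}(J)}.
\]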
We're using here data from 20 years ago, from Block's group, where they were looking at the motion of a kinesin molecule against a specified force, realized by this kind of feedback-controlled laser trap. They were measuring the velocity, which corresponds to the mean current, and the dispersion of this velocity, which I'll call the diffusion constant corresponding to this current. Experimentally, this is typically quoted as a ratio called the randomness. So here you see the randomness as a function of the ATP concentration which drives the motor, and the randomness as a function of the external load at fixed ATP concentration. These are data which have been around for 20 years. Now we apply our uncertainty relation to these data, and how do we do this? Well, we look at the efficiency. The efficiency is output divided by input. The output is the mechanical work: that's the velocity against the externally imposed force. The input is not known, because you do not know how many ATP molecules the motor needs without making a model, and we don't want to make a model. But we know that the difference between input and output is what's dissipated, so we can replace the unknown input by the output plus the entropy production, and for the entropy production we have this inequality in terms of the velocity and the dispersion. So you see: on the right-hand side there are only experimentally measurable quantities, and on the left-hand side there is the thermodynamic efficiency. These are the data I've just shown you, but now I'm overlaying the bound as these colored curves. For instance, let's take this point here, on the 45% line. The statement is that at this load, this molecular motor is at most 45% efficient in translating chemical energy into mechanical motion. And this is a model-free statement. I'm not saying that the motor is 45% efficient; I'm saying it is not more efficient than 45%. It could be less: 10, 5, 1.5. But it's the first time that you can make this kind of statement without assuming an underlying model. Okay, let me show you another example. This relates to a controversy, or debate, about whether or not it's possible to reach Carnot efficiency at finite power. Pre-2011, I would have thought: that's impossible; it requires quasi-static operation, i.e. infinite cycle time, i.e. no power output. But then there was this beautiful paper by Casati, Benenti, and Saito, who pointed out that in systems with a magnetic field, where the Onsager symmetry of the off-diagonal elements of the Onsager matrix is broken, if you just use standard linear irreversible thermodynamics, you find that it might be possible, or it's not excluded, to have Carnot efficiency at finite power. I found this surprising; I thought there must be some additional effect which would prevent this, and then there were quite a number of papers. Using the uncertainty relation, we could understand these various case studies in the following way. In the steady-state case, power is a current, so we apply the uncertainty relation to this work current, which is the power. The entropy production can be related to the exchanged heat, and if you play around a little you find that the power is bounded in terms of the distance of the true efficiency from Carnot efficiency. So if you just look at the round bracket here, you would say that power has to vanish, universally, at least linearly as you approach Carnot efficiency.
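Stepping back to the kinesin analysis for a moment, the model-free efficiency bound used there can be sketched as follows (standard TUR reasoning, with all symbols measurable: F the external load, v the velocity, D the dispersion). Since input = output + dissipation, and the uncertainty relation bounds the dissipation rate by sigma >= k_B v^2 / D,

\[
\eta = \frac{F v}{F v + T\sigma} \;\le\; \frac{1}{1 + v\, k_B T/(F D)}.
\]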
However, in that power bound there is this amplitude D_W, the fluctuations of the power, the corresponding diffusion coefficient. And you see that if these fluctuations blow up, there is in principle the chance to reach a finite limit, and that explains many of these case studies which had been invented. So there is a trade-off not just between power and efficiency, as we thought previously, but between power, efficiency, and constancy. And again, this is a very general relation, based only on assuming a Markovian description. So the next question, of course, is: let's stay Markovian, but let's leave this realm of non-equilibrium steady states. The first class is periodic driving, and we knew from work with Andre Barato that in general the thermodynamic uncertainty relation then fails: if you drive the system periodically, you can get a higher precision at a lower cost. For time-symmetric protocols, there was a nice bound by Proesmans and the late Christian Van den Broeck, and then there were quite a number of what I would call technical bounds, which are mathematically correct but useless, since you don't have access to the quantities involved. With my grad students, we found a relation, which I will describe shortly in a more general way, that I think is operationally accessible. In terms of systems, you might be interested in relaxation towards equilibrium, or towards a non-equilibrium steady state, and there are also relations valid for those situations. So let me describe our result with Timur Koyuk, which generalizes and comprises many of these previous relations. General time-dependent driving: i.e., we allow rates which are time-dependent. They depend on an experimental parameter lambda, and we assume that this parameter can be changed experimentally faster or slower; and we observe the system for a total time t. The generalization of the TUR is the following. J is the mean current, now observed for a fixed control speed v and a fixed time, calligraphic T. This is the corresponding diffusion coefficient; Sigma is the corresponding entropy production rate, now averaged over this time-dependent process. And this is the old relation, except for this delta J term. So there is an additional term, which asks: how does the mean current change with the observation time, and how does it change with this experimental speed parameter? Again, all of this is experimentally accessible. So let me show you two illustrations of this more general relation. The paradigm is typically a driven trapped particle: we start with the particle in a harmonic trap, and at time t equal to zero we start moving the trap with velocity v. Then we can look, for instance, at the finite-time mean velocity, which of course is just the distance covered divided by the time, or we can look at the finite-time applied power, which is essentially force times speed, or, more precisely, the derivative with respect to lambda. Now we measure these experimentally accessible quantities, and for the rest of the talk I'm plotting the uncertainty relation with everything put on one side; I call this the quality factor. So this is a quantity Q which is bounded by one. And you see now data.
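Schematically (the precise combination of derivatives is in the Koyuk-Seifert papers; this expression is only meant to fix the structure of what is plotted next), the quality factor has the form

\[
Q(t) \equiv \frac{2 k_B\, \hat J(t)^2}{D_J(t)\, \Sigma(t)} \;\le\; 1,
\]

where J, D_J and Sigma are the measured mean current, its diffusion coefficient and the entropy production rate, and the modified current \hat J combines J with the response terms t times the time derivative of J and v times the speed derivative of J mentioned above.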
The solid lines, as a function of the observation time, show you the quality factor of this generalized TUR for different stiffnesses of the spring, based on measuring the velocity; if you measure the power instead, these are the dashed data. And again, you can evaluate these quantities experimentally, for instance without knowing the stiffness of the trap: you just measure the velocity. What you find is that, depending on which regime you are in, you would recover something between 40 and 80% of the entropy production; again, in a model-free way, without needing to know the stiffness. And perhaps as an even more convincing example, let me come back to the experiment I introduced earlier. In these experiments, people often also apply a force ramp, so the force becomes time-dependent. I'm almost done. If you now just ask what the state of the system is at the final time (I call this little a), or you ask how much time the system spent in a certain state, let's say the unfolded state, you again have this relation, which you can evaluate just from your experimental data. And here are these quality factors. You see that, depending on the observable you use, you're again able to recover something like, at the maximum, 80% of the entropy production in this process, without assuming a model, just by observations. And you can observe coarse-grained variables; you don't have to observe the mesostates individually. Okay. So, on one slide, I think it's the last one, let me briefly mention what we know about situations beyond the Markovian regime. We know that for underdamped dynamics it fails at finite time; we believe that it's true in the infinite-time limit, but as far as I know there is no proof yet for underdamped dynamics. We know that we can break it with a magnetic field. We have applied it recently to stochastic field theories like the KPZ equation. I should mention that there are a number of more general relations, which typically become very poor if you go to the infinite-time limit. And of course there is now really strong activity looking at this in quantum dynamics; these are four or five of the very early papers, and I don't think a complete picture has been reached yet. Thank you very much. Thank you for a very nice talk. There is space for some questions, please. You mentioned the possibility of reaching the Carnot limit at finite power for systems with a magnetic field. Is there some intuitive physical reason, apart from the computation? Is there some intuitive argument, or is it just manipulation? I mean, it's a kind of simple manipulation. What Casati and coworkers did is just based on these matrices of linear irreversible thermodynamics, and then you find that when the ratio between these off-diagonal elements... No, no, this is understood, but is there some intuition why it is possible with a magnetic field and not without? No, I don't have an intuition. But we know that even with the magnetic field, for instance, and this is work with Kay Brandner, if you look at an N-terminal situation and you use this Landauer-Büttiker scattering approach, you can show that you would again need infinitely many probe terminals; we have found bounds as a function of the number of terminals. So, to come up with a clean realization, I think, is still impossible. You can reach it in certain limits.
Rosario Fazio did work with Michele Campisi, looking at a phase transition in the working substance, which is exactly the kind of situation where the fluctuations blow up. But I have no simple intuition about it. There is space for a very quick question, if there is anyone. My question is about the non-Markovianity. In this list, the non-Markovianity is under the item of stochastic field theory, right? No; even the underdamped dynamics is already non-Markovian if you look at it from the perspective of just the position. The diffusion matrix is singular, and therefore this proof doesn't work. Okay, okay, thank you. Actually, the stochastic field theory I should not have mentioned under the non-Markovian case. So let's thank the speaker again for the very nice talk, and we can go on to the following talk, which is fully online. Okay. Our next speaker is Neri Merhav, from the Technion, and he's going to speak about work extraction, entropy production, and information in finite-state engines. I want to remind the speaker that he has 25 minutes for his talk and five for discussion. Thank you very much. Can you hear me? Okay. So, we can hear. Okay. So, I'm not a physicist; I'm actually in an electrical engineering department, and my field is information theory. So I come to the problem area of information thermodynamics from the side of information theory, and I want to discuss some relationships between entropy production and work extraction in a model of a finite-state engine, which is a conventional model in electrical engineering and in computer science. Now, when we talk about information thermodynamics: there has been a series of works around the subject, a renaissance of the early works by Maxwell and Szilard that I'll mention in a few minutes. But the common theme of all of them is that the presence of information in a physical system apparently, seemingly, violates the second law, because there is an extra term associated with the entropy change of the informational ingredient. I would say that research in this problem area can be roughly categorized into two main categories. One of them is about systems with measurement and feedback control. I've listed here only a few works on this from the last decade, a non-exhaustive list, which also doesn't include all the works from the last few years. But the point is that we continuously measure the system and apply feedback, and the information is in the measurement. The second category is systems with so-called information reservoirs, in addition to the heat reservoir and the system itself. There we are talking about either a memory device or a digital tape, where a sequence of bits or symbols sequentially interacts with the system; I'll focus on that later on. Going back to the good old Maxwell demon from about 150 years ago: we all know that we take a volume of gas and put a barrier in the middle, and we open the door from left to right whenever we encounter a speedy particle, and we open it in the other direction whenever we see a relatively slow particle. At the end of the process, we have all the speedy particles on one side and all the slow ones on the other side, and we have created a temperature difference, thereby seemingly violating the second law.
That was the thought experiment of the well-known Maxwell demon. A few decades later, Leo Szilard came up with his engine with a single particle: you put in a barrier, you measure in which one of the two parts of the system the particle appears, and then you can extract work by letting this particle push the barrier; you complete the cycle by removing the barrier, and it looks like you're gaining work without investing any energy. There was, of course, an extension of the Szilard engine to a situation where the barrier is not necessarily in the middle, and the measurement Y is not necessarily clean; it might be noisy. And then the well-known result by Sagawa and Ueda from 2011 is that the maximum work you can extract is proportional to the mutual information associated with the channel from the real position X to the measurement Y. Several years ago, together with a colleague and a student, we used this relationship in the context of gambling. In general, a system with measurement feedback is a closed loop: we have the dynamical system; we measure, we look at the state x_t at time t; there is a noisy channel by which we take the measurement and get y_t; and on the basis of y_t and its past, we create the control and feed it back to the system. What the two Japanese physicists, Sagawa and Ueda, have shown us is that in the well-known inequality relating work extraction to the change in free energy, there is an additional term proportional to the mutual information, just as I showed before. In fact, it is more exact to say that in general it's not quite the mutual information, but the so-called directed information, a related notion, well known in information theory, that also takes causality into account. The interesting point here is that we see the directed information, denoted here with an arrow, in the context of feedback, and it's well known in information theory that the capacity of a channel with feedback is, in general, given by the directed information, and not the mutual information. So there is an interesting relationship here between information theory and physics. Then, about a decade ago, there was a paper by Mandal and Jarzynski which looked at a system of the second category that I mentioned. There is an information reservoir, which is a tape of bits that interact serially with the system, in this case a wheel that lifts a mass. The mechanism is that during each interaction period there is a Markov jump process, where the initial state is exactly the content of the current bit, and there are transitions back and forth between state zero and state one, as we see here in the diagram: in state zero, it can only make a half-cycle counterclockwise; in state one, it can only move clockwise. So only the blue arrow can lift the mass, while the red arrow causes the mass to descend. The overall work done by the system depends only on the parity of the number of transitions. And they showed that, overall, the maximum work you can extract from the system is proportional to the difference between the entropies of the outgoing and incoming bits. I forgot to mention that the final state at the end of each interaction period is recorded on the outgoing tape, so eventually a zero can turn into a one and vice versa, and the entropy of the outgoing sequence, in red, might be different from that of the incoming sequence.
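The two information bounds mentioned above can be stated compactly (standard forms from this literature). For a single measurement Y of the position X,

\[
W_{\mathrm{ext}} \le -\Delta F + k_B T\, I(X;Y),
\]

and for repeated measurement and feedback, the mutual information is replaced by the directed information from the system sequence to the measurement sequence,

\[
W_{\mathrm{ext}} \le -\Delta F + k_B T\, I(X^n \to Y^n),
\]

which is the causal notion that also gives the capacity of channels with feedback.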
And the difference of these entropies, of the outgoing minus the incoming bits, is an upper bound on the work that you can extract; it can also be put in the form of a second law: the total entropy change, including the informational ingredient, must be non-negative. In the later works by Deffner and Jarzynski, there is also a version with a memory device instead of the tape, and they show an extended second law where the change of entropy of the device, plus the change of entropy of the heat reservoir, plus the change of entropy of the information reservoir, cannot be negative. So, so much for history. I want to talk about a slightly more general model that I developed a couple of years ago, on the basis of some earlier work by Boyd, Mandal, and Crutchfield, and in the same spirit: we have a certain ratchet that is fed by an input tape; there is also, like before, an output tape of bits being written; and there is an internal state variable, which I denote by s of n, which is updated by the mechanism of the system. It interacts with a heat reservoir and also with the machine; we have state transitions, and the energy level depends on the height of the mass. We have here two levels of random processes. One is the higher-level process, in discrete time: the process defined by the incoming bits x of n, the outgoing bits y of n, and the state variable s of n, where the discrete time counts multiples of a single interaction interval, as before. This is the higher-level process. The lower-level process runs within each interaction interval: it is a Markov jump process with two components, psi of t and sigma of t, and the initial state in each interval is (x_n, s_n), the current input and state. The final output and next state record the final values of the pair psi and sigma. The process within each interval is a Markov jump process with state transitions according to the energy levels, assuming detailed balance. So, for this model, Boyd, Mandal, and Crutchfield have argued that the asymptotic normalized work, the work divided by n, where n is the number of cycles, is proportional to the difference of entropy rates, namely the normalized entropy rate of the output process minus the entropy rate of the input process, assuming of course that both input and output are stationary. (Here I'm denoting the entropies by H, which is information-theoretic notation; I forgot to mention that.) Personally, I have several reservations regarding this paper. I'm not going to get into them deeply here; they all appear in my paper. But let me just say that I've re-derived the results for exactly the same model, with the following things that I can say about the derivation: first of all, the approach is very simple and rigorous, with rather mild assumptions; rather than an asymptotic result, the results that I have are exact for every number of cycles, capital N; the bounds that I'm getting are relatively simple to calculate and potentially tight, as I'll discuss; and, in contrast to the earlier result, the state variable also plays a role, as you will see shortly. The basic idea comes from the well-known inequality for Markov processes that the Kullback-Leibler divergence between the instantaneous state distribution, P of tau, and the equilibrium distribution is monotonically decreasing in time. Cover and Thomas is just one reference for that; there are plenty, of course.
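Explicitly, the inequality being used is (standard form, as in Cover and Thomas)

\[
D\!\left(P_\tau \,\middle\|\, P_{\mathrm{eq}}\right) \;\le\; D\!\left(P_0 \,\middle\|\, P_{\mathrm{eq}}\right) \quad \text{for all } \tau \ge 0, \qquad
D(P \,\|\, Q) = \sum_s P(s) \ln \frac{P(s)}{Q(s)}.
\]

With the canonical choice P_eq(s) proportional to exp(-E(s)/k_B T), the divergence equals beta [F(P) - F_eq] with F(P) = <E>_P - k_B T H(P), so monotonicity says the nonequilibrium free energy cannot increase over a cycle; this is what yields the work bound described next.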
So at every positive time tau, including the final time of each cycle, the divergence is smaller than it was at the beginning of the cycle, where it is P_0. If the equilibrium distribution is the canonical one and we plug it into this inequality, we obtain right away an inequality concerning the energy change, or the work that you can extract, in terms of a difference of entropies. But here it is not the same as before: these are entropies that involve not only the input and output but also the state of the system, as you can see, and that is a big difference. This was for one cycle; summing everything over all N cycles, we obtain this form, which can also be presented as the conditional entropy production given the state, namely the entropy production beyond the past of the system, where whatever the system remembers from the past is encoded in the state, plus a term which is the entropy production of the state itself. If the number of states is finite, the second term is relatively small, because it is bounded by a constant independent of N, while the first term grows roughly in proportion to N. So we have this relationship, and there are a few comments to make about it. First of all, it is tight, in the sense that I can construct a system which meets the bound arbitrarily closely, in the spirit of a quasi-static process: I go in small steps, so if the equilibrium distribution is sufficiently close to the initial one, the ratio between the actual work I can extract and the entropy production in that cycle is very close to one, and by going in small enough steps I can get as close as I wish to the bound. The details are in the paper; I will not go into them now. A few comments about special inputs. First, for a memoryless input, where the conditioning on the state is immaterial, the inequality takes this form: the sum of differences of unconditional entropies, plus the entropy production of the state, minus the mutual information between the state and the output. So if the input is memoryless but the machine has memory, and hence a non-trivial state, we lose something from the possible dependencies between the state and the output. Asymptotically, neglecting the entropy production of the state, which is finite, it is best to process memoryless inputs by memoryless machines, which is quite intuitively appealing. For a Markov input, the pair (x_n, s_n) is jointly Markov, and if it has a stationary distribution, we can write down the joint distribution of all four variables, where s is the current state and s' is the next state, and from this joint distribution we can actually compute everything, both the work and the entropy production. This is a lot easier than calculating the difference between entropy rates, as in the result by Boyd, Mandal and Crutchfield that I mentioned before: for a Markov input, the output of the system is a hidden Markov process, and its entropy rate does not even have a closed-form expression, only bounds.
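To summarize the structure of the bound discussed above, here is a schematic in my own notation, not the exact statement from the paper: with x_n, y_n, s_n the input, output and state at cycle n, the right-hand side splits into a conditional term of order N and a state term of order one,

```latex
\beta W_N \;\le\;
\underbrace{\sum_{n=1}^{N}\Big[H(y_n \mid s_{n+1}) - H(x_n \mid s_n)\Big]}_{\text{conditional entropy production given the state: grows like } N}
\;+\;
\underbrace{H(s_{N+1}) - H(s_1)}_{\text{entropy production of the state: } \le \log|\mathcal{S}|}
```

The state term telescopes across cycles, which is why it stays bounded by a constant whenever the number of states is finite.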
A word about conditional entropy bounds: it turns out that if I define any function of the past, which includes the past inputs, the past outputs, and the past states, then I can write an inequality of entropy production in terms of conditional entropies. But it is quite disappointing that the best choice of this function f_m is simply none, namely conditioning on nothing, which leads us back to the basic result. However, if we look at it not as an upper bound on the work but as a lower bound on the entropy production, then we get a nice lower bound on the output entropy in terms of the input entropy and the work. So whenever we can easily compute the right-hand side, we have a nice lower bound. Finally (I think I am almost out of time, right?), the information inequality that I started with, the monotonicity of the divergence for a Markov process, holds not only for the usual Kullback-Leibler divergence but also for f-divergences with any convex function Q, defined by D_Q(P || pi) = sum_x pi(x) Q(P(x)/pi(x)). So here is another set of inequalities, which I write somewhat differently. I look at a quantity which can be thought of as the negative free energy in units of kT: this is the negative free energy at the beginning of the cycle, and this is at the end of the cycle. The negative free energy is increasing, which means that the free energy is decreasing; this is one way to present the earlier result that I talked about. But now, replacing the logarithmic function in the divergence by a general convex function Q, I get generalized inequalities relating the free energy before and after the cycle. For certain specific choices of the convex function Q, I get certain interesting inequalities; two of them are associated with power functions of a parameter z, where in one case z is in the interval between 0 and 1, and in the other case outside it, and then the direction of the inequality is reversed, which is kind of interesting. So let me conclude. We have seen simple relationships between work and differences in entropy, namely generalized second laws. The bounds are potentially tight, in the sense that we can approach them if we are willing to make an effort. The memory of the past of the system is encoded in the state, and we also have a term associated with the entropy production of the state. These are relatively easy to calculate, and there are examples in the paper. As a last bonus, we also have a lower bound on the entropy rate of a hidden Markov process. That's about it. Thank you very much. Thank you for the nice talk. There is space for some questions. Are there no questions from the audience? So thank you again, and we can go to the next speaker. Yes, I'm around. Okay, fine. So we now have the final talk of this morning, which is given by Estelle Inack from the Perimeter Institute in Canada. She is going to speak about variational neural annealing. Please, you have 25 minutes, plus five minutes for discussion. So let me share my screen. Can you see my screen? Yes, we see it. Okay, fine. Okay, once again I cannot see myself. Thank you very much for the opportunity to present my work. I would have loved to be there in person; I did my diploma at ICTP and my PhD at SISSA, so it would have been a pleasure to be there, but unfortunately I can't. But I want to thank the organizers for the opportunity to present my work.
I am going to talk about variational neural annealing, which is work that we put on arXiv at the beginning of this year and which was recently accepted in Nature Machine Intelligence. Typically, when we talk about annealing, what we have in mind is solving optimization problems, and optimization problems occur both in science and in many places in industry. Here I give just a couple of examples. The so-called traveling salesman problem, which is the problem of finding the shortest route for a traveler going through a number of cities. Here is a problem that is typical in chemistry: you have atoms interacting through some potential, and you look for the configuration that minimizes the energy. This one is the protein folding problem, the problem of finding the ground state of a protein. Or a portfolio optimization problem: how should you invest, say, the stocks you have in your portfolio. And there are many more of these. Why are these problems interesting, especially for us as physicists? Because they can be cast into a form that we understand, the form of a classical spin Hamiltonian, where the interactions between the spins encode the optimization problem, and solving the problem is equivalent to finding the ground state of this Hamiltonian. However, from the study of disordered systems in classical statistical physics, we know that there are some problems that are notoriously very hard to solve; for a disordered spin glass, for example, finding the ground state is a hard problem. So usually, when we want to solve an optimization problem, we relax our initial aim of finding the exact solution and look for a good approximate solution instead. That is where heuristics come into play. Now, it is useful to visualize this kind of problem in terms of an energy landscape over the space of configurations. For hard problems, the landscape has a large number of local minima or saddle points, and you need a heuristic, an efficient search algorithm, to find, if not the deepest valley, then at least a metastable valley close to it. The famous method used to search for these approximate solutions is the so-called simulated annealing method, which uses thermally activated processes to overcome the barriers in the search for the deepest valley. It is inspired by the old metallurgical technique whereby, to make a material more robust, you heat it up to a high temperature, giving the atoms enough kinetic energy to explore different configurations, and then you cool it down slowly so that the atoms arrange themselves in a configuration that minimizes the energy. This kind of concept has actually been implemented on dedicated hardware.
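As a concrete illustration of the heuristic just described, here is a minimal Metropolis-style simulated annealing sketch on a random Ising cost function. This is my own toy example, not the speaker's code; the couplings, cooling schedule and sizes are all arbitrary choices.

```python
import numpy as np

# Toy simulated annealing for an Ising cost function E(s) = -1/2 s^T J s.
# Thermal fluctuations let the search hop over barriers; the slow cooling
# schedule freezes it into a (hopefully deep) valley of the landscape.

rng = np.random.default_rng(1)
N = 50
J = rng.normal(size=(N, N))
J = np.triu(J, 1)
J = J + J.T                                   # symmetric couplings, zero diagonal

def energy(s):
    return -0.5 * s @ J @ s

s = rng.choice([-1, 1], size=N)
for T in np.geomspace(5.0, 0.01, 20000):      # slowly decreasing temperature
    i = rng.integers(N)
    dE = 2 * s[i] * (J[i] @ s)                # energy cost of flipping spin i
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        s[i] = -s[i]                          # Metropolis acceptance step
print("final energy:", energy(s))
```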
Another heuristic is inspired by quantum mechanics: instead of using thermally activated processes to overcome barriers, you can simply tunnel through them using quantum effects, and that gave rise to another heuristic method called quantum annealing, which has also been realized on dedicated hardware. For the D-Wave machine, for example, which is a quantum annealer, various techniques have been used to benchmark, first of all, how quantum it is, and whether it does provide some sort of quantum advantage on some optimization problems. Most of the techniques that have been used to benchmark it are actually Monte Carlo methods, and there is a whole zoo of Monte Carlo methods, both classical and quantum, simulating respectively classical annealing and quantum annealing. But the purpose of this slide is to remind us that Monte Carlo methods were originally designed to simulate equilibrium properties of classical or quantum systems, not for annealing in the first place. So we took inspiration from very recent methods that were built in machine learning and that have also been used to simulate equilibrium properties of both classical and quantum systems, using neural networks. There is the seminal work by Carleo and Troyer, where they used the restricted Boltzmann machine, a specific kind of neural network, to probe the properties of quantum many-body systems. These methods were based on variational principles to estimate equilibrium properties, with neural networks used as variational ansatzes. So we asked ourselves whether we could repurpose these methods to emulate the dynamics of both classical and quantum annealing. Next, I am going to explain the bedrock of our method, which is the so-called variational Monte Carlo (VMC) method. VMC is a quantum Monte Carlo method that is used to simulate properties of quantum systems at zero temperature. It works by making a good guess, an ansatz, for the ground-state wave function, and minimizing the so-called variational energy, which is this quantity here. Irrespective of the ansatz that you choose to represent the ground-state wave function of the Hamiltonian, this quantity is always an upper bound on the exact ground-state energy. It is computed using a statistical average, where the spins are sampled according to the absolute value squared of the wave function, so there is no sign problem intrinsic to VMC. And how do you update the parameters of your ansatz? Simply by following the gradient. This is the exact formula for the derivative of the variational energy with respect to the variational parameters, and you just replace the quantum expectation values by the statistical ones. You can choose whatever flavor of optimizer to update your parameters: stochastic gradient descent is the vanilla solution, but we used Adam because it worked better for us and was easy to use in TensorFlow.
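For reference, these are the standard VMC quantities being described, written in the usual notation (my rendering, not a copy of the slide):

```latex
% Variational energy: an upper bound on the ground-state energy E_0,
% estimated by sampling spins s from |psi_theta|^2 (no sign problem)
E(\theta) = \frac{\langle\psi_\theta|H|\psi_\theta\rangle}{\langle\psi_\theta|\psi_\theta\rangle}
          = \mathbb{E}_{s\sim|\psi_\theta|^2}\big[E_{\mathrm{loc}}(s)\big] \;\ge\; E_0,
\qquad
E_{\mathrm{loc}}(s) = \sum_{s'} H_{ss'}\,\frac{\psi_\theta(s')}{\psi_\theta(s)} .

% Gradient estimator: quantum expectations replaced by sample averages
\partial_\theta E = 2\,\mathrm{Re}\;\mathbb{E}_{s\sim|\psi_\theta|^2}
  \Big[\big(E_{\mathrm{loc}}(s)-E(\theta)\big)\,\partial_\theta\ln\psi_\theta^{*}(s)\Big] .
```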
Next, which kind of ansatz do we use? We use neural networks, of which there are many different kinds; we actually chose neural language models. These are models used in natural language processing, powerful models that are able to capture the correlations between words in natural language: given a word, they are able to predict what the next one is likely to be. There are many applications, one of which I believe most of us are familiar with: autocompletion. I am sure that when writing your email in Gmail you have seen it suggesting how to complete your sentence. That is because there are powerful neural networks behind it that have learned the correlations between words in English or French or whatever language you are using. Let me give a high-level explanation of how this is done before moving on to our neural network. Say you are writing an email, and you have a name written in the subject line. You give it to an RNN cell and it predicts what the first word of your email is going to be. In this sense, it has learned a conditional probability distribution from which to sample the first word. It then uses that first word as an input to sample the next word, and so on and so forth, so it learns a whole set of conditional probability distributions; and if you multiply them together, by the chain rule of probability, you have basically learned the probability of sampling the whole first sentence of your email given an input. So how does this relate to solving optimization problems? Here we want to sample a spin configuration, which hopefully will be the solution of an optimization problem. We can use the same kind of sampling, autoregressive sampling, to literally sample each spin of our configuration one at a time and, using again the chain rule, obtain the joint probability of sampling a given spin configuration; to interpret it as a wave function, we essentially just take the square root of it. This is how we can use these powerful neural networks from natural language processing to encode the state of a quantum system. The h here is called the hidden state; it encodes information about the previous spins, and I will give more detail a little later. An important point about this autoregressive sampling is the way the samples are generated: they are generated exactly, with no autocorrelation time, in contrast to building a Markov chain, for example, where you always have to worry about the correlation time. For us this is very interesting for solving disordered systems such as spin glasses, where we can imagine that the equilibration time of a Markov chain can be very long; we avoid this by using this kind of ansatz. That is one of the reasons why we did not use something like a restricted Boltzmann machine or a convolutional network. Another reason is that, by construction, the wave function, or probability, is normalized to unity, and as we will see, this is very useful for computing some quantities that would otherwise not be possible.
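Here is a minimal sketch of that autoregressive sampling idea, with a generic RNN-style conditional. The weights are random placeholders and the cell is deliberately simple; it is my illustration, not the tensorized architecture of the paper.

```python
import numpy as np

# Autoregressive sampling of N spins: p(s) = prod_n p(s_n | s_1 ... s_{n-1}).
# The hidden state h carries the history; each spin is drawn exactly from its
# conditional, so there is no Markov-chain autocorrelation time, and the
# joint probability of the sample comes out normalized for free.

rng = np.random.default_rng(2)
N, d = 10, 8                        # number of spins, hidden dimension
W_h = rng.normal(scale=0.5, size=(d, d))
W_x = rng.normal(scale=0.5, size=(d,))
w_out = rng.normal(scale=0.5, size=(d,))

def sample():
    h = np.zeros(d)
    x = 0.0                         # dummy input before the first spin
    spins, logp = [], 0.0
    for _ in range(N):
        h = np.tanh(W_h @ h + W_x * x)                 # recurrence over history
        p_up = 1.0 / (1.0 + np.exp(-(w_out @ h)))      # P(s_n = +1 | past)
        s = 1 if rng.random() < p_up else -1
        logp += np.log(p_up if s == 1 else 1.0 - p_up)
        spins.append(s)
        x = float(s)                # feed the sampled spin to the next step
    return np.array(spins), logp

s, logp = sample()
print(s, logp)   # a wave-function amplitude would be exp(logp / 2), i.e. sqrt(p)
```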
Next, let me tell you about the RNN cell that we use in our work: the so-called tensorized RNN cell, which has this recurrent form, and this is the architecture that we use. Here T is a tensor, a variational parameter that we need to optimize, to train, and the bias is also a variable that we need to train; compare this with the traditional recurrence relation of the vanilla RNN. The reason we chose this form is a little bit inspired by tensor networks: here the local Hilbert space interacts with the hidden state in a multiplicative way, whereas in the vanilla RNN you just have a concatenation of the two, and they interact only through some nonlinear activation function. We saw that this representation is more compact and has higher representational power than the vanilla RNN. As a check, we compared with simulations on the quantum Ising chain, looking at the variance of the local energy: the lower it is, the more accurate your results are. We compared the vanilla and the tensorized RNN at the critical point and saw that the tensorized one gives better results; we also benchmarked against more sophisticated versions of RNNs, like GRUs, for those who are familiar with them, and the tensorized RNNs still held up well. Next, for random Ising chains, or more generally for the disordered, locally interacting problems that we look at, we saw that using so-called no weight sharing, where at each lattice site at which we generate a spin we use different weights and biases, is much more accurate than sharing weights across lattice sites, which is what is typically done in machine learning. So we used no weight sharing for these disordered systems. For long-range interacting systems, like the so-called Sherrington-Kirkpatrick spin glass, we used a more sophisticated network architecture, dilated RNNs, which use skip connections; these were introduced in machine learning to deal with the vanishing gradient problem, and we saw that they accurately capture the long-range correlations that we have in our problem. All right, next, and I need to go faster, the variational quantum annealing algorithm. What we do is start the system in a Hamiltonian whose ground state is easy to prepare, say a paramagnet: we initialize the RNN with random weights and biases, and to prepare the system we just do some gradient descent steps, so we land on the initial ground state. Next we do an annealing step, that is, we reduce the transverse field, and that makes us shoot out of the instantaneous ground state, so we need to fall back onto the instantaneous ground state; for that we perform some gradient descent steps. Then we are again on the instantaneous ground state, so we change the Hamiltonian again, fall back again onto the instantaneous ground state, and so on and so forth. We reduce the quantum fluctuations so that at the end, when all of them have been removed, we expect to find ourselves in the ground state of the problem. We also came up with an analogue of the adiabatic theorem for this procedure: roughly speaking, the rate at which we can anneal while remaining adiabatic is controlled by the minimal gap squared. For variational classical annealing, the cost function is different: it is a variational free energy, the expectation value of the target Hamiltonian in our variational state, minus a time-dependent temperature, decreased linearly, times this von Neumann entropy. The von Neumann entropy is estimated with the RNN, and if the RNN were not normalized, it would not have been possible to estimate it efficiently; that is one reason why we chose RNNs.
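In formulas, the VCA objective just described is, as far as I can reconstruct it from the talk and the variational neural annealing paper, a variational free energy whose temperature is annealed to zero; the entropy term is computable sample by sample exactly because the autoregressive ansatz is normalized:

```latex
% Variational classical annealing: minimize a free energy at temperature T(t),
% with T(t) decreased (e.g. linearly) to zero over the annealing schedule
F_\theta(t) = \mathbb{E}_{s\sim p_\theta}\big[E_{\mathrm{target}}(s)\big]
            \;-\; T(t)\, S(p_\theta),
\qquad
S(p_\theta) = -\,\mathbb{E}_{s\sim p_\theta}\big[\ln p_\theta(s)\big].
```

At T = 0 this reduces to plain variational energy minimization, which is the no-annealing baseline compared against below.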
Next, some results on the random Ising chain. This is the Hamiltonian, and we look at this quantity, the residual energy, which is the difference between the expectation value of the energy at the end of the annealing and the exact ground-state energy, for different system sizes. At the end of the annealing, we generate a million samples with the network, and we average over 25 disorder realizations. Plotting this against the number of annealing steps, we see that at short annealing times, which are really more like quenches than annealing, variational quantum annealing is superior, but at long annealing times, where we are more adiabatic, variational classical annealing wins. This is in contrast to what was observed in an earlier paper where annealing was simulated, though, wait, not by Monte Carlo: there, both classical and quantum annealing were simulated, using a master equation for classical annealing and an exact integration of the real-time dynamics for quantum annealing. For those simulations, quantum annealing was superior to classical annealing, which is the opposite of what we find here. Also, for them, both simulated annealing and coherent quantum annealing real-time dynamics showed a logarithmic scaling of the residual energy with the annealing time on this problem, whereas we get a power-law scaling, so we do see an advantage of this kind of neural annealing procedure. Next, results on the two-dimensional Edwards-Anderson spin glass. Here we used a different RNN, one that encodes a kind of snake-like path through the two-dimensional architecture of the spin glass, and we see that here again VCA is superior to VQA: compare the blue data to the green data here. Another interesting point: we wanted to check whether annealing is actually relevant here, so we also directly optimized the target Hamiltonian with no annealing. That is this red data, plotted against the number of gradient descent steps: as we train the neural network, the residual energy decreases, but it remains orders of magnitude away from what we obtain with the annealing procedure. So we do see that annealing is important here. Since VCA was superior to VQA, we then benchmarked it against the traditional heuristics, simulated annealing (SA) and simulated quantum annealing (SQA). We saw that at long annealing times VCA can be, as here, about three orders of magnitude better than both SA and SQA on 1600 spins. Next, the Sherrington-Kirkpatrick model, and also here we see that VCA is consistently superior to SA and SQA. As a check, we looked at the last data points and drew a histogram of the solutions, with 25 instances here, and we see that VCA also finds the best solutions more often than SQA and SA, which mostly get stuck in local minima. And when we look at the probability of success for each disorder instance, which is the ratio of the number of sampled spin configurations that actually reach the ground state to the total number of configurations sampled, we see that, except for one instance, VCA finds all the solutions, in contrast to the other methods. So with this I conclude. You have seen that annealing a variational ansatz over these complex landscapes works: VCA is more efficient than the variational emulation of quantum annealing, and it is more efficient than traditional simulated annealing and simulated quantum annealing, so we advocate that it could be a good candidate for solving hard optimization problems. We have also seen that the power of the neural network matters for VCA and VQA, and that the results tend to depend on the kind of problem we are solving and on the architecture we use. With this, I would like to thank my collaborators, especially Mohamed, who designed all the networks, and I would be happy to take any questions if you have them. Thank you very much, Estelle. Special thanks to this last speaker also, because she so kindly accepted to give a seminar at 6am local time in Canada, which is quite an early time. Are there questions? Hi Estelle, I'm Andrea. Hi. I have a question; probably you mentioned it, but I'm not sure I got it. You are saying that VCA is more efficient than other methods.
Because, I mean, computationally speaking, how is it doing? You can determine the ground-state energy better, or whatever, but in terms of cost? Oh, that's a good question. In terms of computational cost, it is actually cost-effective: compared to variational quantum annealing it runs faster, although it has a bit of overhead as well. Compared to simulated annealing: simulated annealing runs very fast in practice, how do I say, maybe 100 times faster, but the point is that even if you run it forever, for example here, if you keep running it, you are not going to get there. So, the computational cost of each run: is it the same, or does one require much more computation than the other? So, SA runs very fast, for sure, but the scaling is what matters, right? Even if you run it for a long time, you are not going to get anywhere; you will still be stuck in a local minimum, for example here. Right. And variational classical annealing and variational quantum annealing we run on GPUs, because you have to use neural networks, and we want all the built-in advantages of being able to do, for example, backpropagation in an efficient way, whereas simulated annealing and simulated quantum annealing we run on just a single laptop. Okay, thank you. Are there other questions? If not, we thank again Estelle for this nice talk, and all the speakers of this morning session. The people attending in person are probably supposed to have the group photo now. Okay.