Thumbs up, so I think we can start. Good afternoon everybody, and welcome to the first afternoon session of this conference. My name is Valentina and I will have the pleasure to chair this session, which is a bit of a special session, so let's also thank the organizers for hosting it. It is called the quantum session and, as you will see, it will have perhaps a bit more of a physics flavor with respect to the other talks of the conference. We are going to have three talks today. The first one is more about machine learning, as you see: an introduction to reinforcement learning and applications to problems of quantum control. Then we will have two talks after this one, which have to do more with high-dimensional statistics in some sense. There we will hear a lot about concepts of free probability, and how this structure emerges when you want to compute higher-order correlations between matrix elements of objects which emerge from applications to quantum problems. We will see examples which come both from out-of-equilibrium quantum physics and from problems of equilibrium quantum physics and thermalization.

I didn't ask the speakers' permission, but I think they will be happy to be interrupted during the talks. So if there are questions or comments, please feel free to ask; I think the more interactive this is, the better it is for everybody. Having said this, we can introduce our first speaker of today, who is Marin Bukov, research group leader at the Max Planck Institute for the Physics of Complex Systems in Dresden. As I said, he will give us an introduction to reinforcement learning and applications to problems of quantum control.

Thanks for the nice introduction. I hope you can all hear me well. Okay. Anytime you have a question, just raise your hand or just shout out loud; it's better to have the questions answered than me clicking through a bunch of slides. What I decided to tell you about today is what we do in my group, and how we use reinforcement learning for quantum control. As I was preparing these slides, I realized that many of you will probably not be familiar with reinforcement learning, nor with quantum control, and so I decided to turn the whole presentation into more of a fun introduction: a couple of definitions, then how we do things, and then towards the end the examples.

Okay. So, as the title suggests, I need to explain to you what reinforcement learning is and what quantum control is. We will start with an intuitive definition of reinforcement learning, how one could think about it if one wanted to. Then we'll go a little more formal and I'll try to set up the mathematical framework behind reinforcement learning. We'll discuss one set of algorithms, the policy gradient algorithms, because they're very easy and, I think, intuitive to understand for physicists. And in the last part of the talk we'll talk about applications of reinforcement learning to qubit control, which is the simplest quantum system you can imagine.
Okay. As I'm sure you're familiar with, there are three main branches of modern machine learning: on the one hand there's supervised learning, on the other hand there's unsupervised learning, and then there's reinforcement learning, and all three branches are essentially thought of as being on equal footing. Let me just recap for you what we mean by those.

Let me start with supervised learning. Supervised learning can be defined intuitively as learning from examples, or learning from labeled data, and there are two main branches of supervised learning: classification tasks, or classification problems, and regression problems. Maybe the most familiar example, which I'm sure everyone here has heard of, is the MNIST digit classification problem: you're given a set of handwritten digits, and the idea is to construct a machine learning model which learns to classify these images. That's an example of a supervised learning problem. In many-body physics, supervised learning is actually used to classify phases of matter, or to determine critical points, based on examples. If you go back to your statistical mechanics class and recall what the Ising model is, you may remember that in two dimensions the Ising model has a critical point as a function of temperature; temperature is what's on the x-axis here. If the temperature is low, the states are magnetized (this is supposed to be an Ising configuration, and everything is basically magnetized), whereas when the temperature is very high, the system is disordered. What people have done using supervised learning is to teach a neural network to classify these images based on training data taken deep in one phase and deep in the other phase, and then try to extrapolate the position of the critical point, as shown here. So that's one example of how we use supervised learning in many-body physics.

Now let's go to unsupervised learning. Unsupervised learning is about learning the structure of unlabeled data, or in other words, learning the distribution of the data. One specific example that I want to highlight here from quantum physics are the so-called neural quantum states. The idea behind neural quantum states is very simple. Recall from your quantum mechanics one lecture the so-called variational principle: you parameterize the wave function using some unknown parameters theta (for instance, it can be a Gaussian wave function or an exponential wave function, whatever), and then, given a Hamiltonian, you compute the energy as a function of these parameters theta, and you minimize this energy to determine the optimal value of the parameters. That's the basics behind the variational approach.
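Just to write that down concretely (this is the standard textbook statement, in my notation; the ansatz itself is whatever you choose):

```latex
% Variational principle: for any ansatz |psi_theta>, the variational
% energy upper-bounds the true ground-state energy E_0.
E(\theta) \;=\; \frac{\langle \psi_\theta | \hat{H} | \psi_\theta \rangle}
                     {\langle \psi_\theta | \psi_\theta \rangle}
\;\ge\; E_0 ,
\qquad
\theta^{*} \;=\; \underset{\theta}{\arg\min}\; E(\theta)
```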
Now, what does this have to do with machine learning? Well, the idea is that you can use a more complicated ansatz for the wave function: instead of using just a simple Gaussian or another simple function you can come up with, you can use an actual neural network. So now the weights and the biases of this neural network are the variational parameters, and what you do (sorry, this is not really displayed properly), eventually, is you minimize the energy, and in doing so you find the optimal weights and biases of your neural network. That's how you do the optimization. And of course, this can be done not just to find ground states; it can also be done to time-evolve states. In particular, when you have complicated many-body systems with many degrees of freedom, having such an expressive ansatz is actually helpful. And why is this unsupervised learning? Well, because the way it works is that you're actually trying to learn the wave function, whose square is a proper probability distribution. So you're trying to learn a distribution, and in the learning process you actually sample from this distribution.

Okay, so all of this is very nice, but it's actually not what I want to talk about today. Instead, I want to talk about reinforcement learning, and reinforcement learning is different from supervised and unsupervised learning because reinforcement learning is learning from experience. Now, if you've never heard about reinforcement learning before and I had to explain it to you in just one sentence, I would say that reinforcement learning is the mathematically formal way to teach your dog to sit. How do you teach your dog to sit? You say "sit" fifty times, and maybe by chance, once or twice out of these fifty times, the dog actually sits. But when it does so, you give it a treat: you reinforce a certain behavior. This is actually the main concept, the main idea behind reinforcement learning, and I'll show you in a couple of slides how to make it work in a more scientific way; the rest of it is basically formal mathematics.

So let me go a little more into examples of reinforcement learning and what can be done with it, in particular what has been done in the last couple of years that attracted the attention of other people, and also myself. I want to show an example of a video game and an agent that was trained by Google DeepMind; apparently the reference is cut off, it's here at the bottom. What Google did is they took this game of Breakout, and then they tried to teach a reinforcement learning agent to play the game as well as it can. How many of you are familiar with this game?
Okay, the majority of you, so maybe I should very briefly explain for those of you who haven't seen it. You have a paddle, you have a ball, and then you have a brick wall here. The idea is that with the paddle you catch the ball; the ball can bounce off the paddle, it can also bounce off the walls, and whenever the ball hits a brick, it destroys the brick and you get a point. So the idea is: you destroy all the bricks, you get all the points, you maximize the score, and then you win the game, or you pass to the next level. However, it can also happen that the ball falls below this horizon here, and when it falls below the horizon it's game over, and then you have to start over again.

What Google DeepMind did was teach a reinforcement learning agent to play this game, except they didn't really tell it anything about what it's doing. The agent has no information about what a ball is, what a paddle is, what a brick is; none of these concepts were given. The agent was only given the pixels from the screen, and then it tries, through this trial and error approach, to become better and better. Let me show you the movie of this training process. As you can see, in the beginning the agent is fairly dumb: essentially, it always misses. But if you pay attention, you'll see that every now and then it actually hits the ball twice in a row, and this means there's some correlation that the learning algorithm picks up. If you now train it for about two hours, you'll see that it already plays like an expert; at this point it basically never misses. At this level, you would say you've achieved everything you want: you just wait long enough, and it's going to go to the next level. But if you actually keep training it for a little bit longer, say for two more hours, then you'll find a fairly interesting solution, one that those of you who actually played the game might know, but a solution that is a little bit non-trivial. It's going to punch a hole through the wall, as you see, and then it's going to get the maximum amount of score in the shortest amount of time. (I'm losing my pointer, for whatever reason.)

Okay, so that's the basic idea behind reinforcement learning. Now, as you see this, and you're maybe a quantum physicist, what you ask yourself is: can you actually make use of this to solve some physics problem? This is what I want to be talking about today. But before we get there: just the following year, there was another breakthrough. Some of you might say, well, games are kind of fun, but they're not that scientific in some sense. It turns out that you can actually use reinforcement learning to find very competitive strategies in very complicated board games. This is the game of Go, where Google DeepMind's agent ultimately defeated the human world champion. If you don't know what Go is, you can just think of chess: it's a board game with exponentially many configurations, and there's a strategy that's not so easy to find, especially when you're playing against a human. But still, you can say: okay, still games, right?
We want to do science. If you fast forward a couple of years, this is a work from last year where these guys managed to train a reinforcement learning agent to control the dynamics of a plasma inside a tokamak. This is already going more in the direction of a physical system. The interesting thing here is that you have a very complex physical system, and you don't know exactly how it behaves: you can maybe try to model it by writing down some equations of motion for the plasma dynamics, but overall we know this is an unsolved problem. They actually trained an agent to hold the plasma for as long as it can. I want to say right away that this has been done in a simulator, so it has not yet been deployed in an actual tokamak. But who knows, maybe in the next ten years they're actually going to be able to do that.

But this is a session on quantum physics, and we want to go quantum, and what I want to get to towards the end of the talk is applications of reinforcement learning to quantum technologies. Just in case I don't get there, I want to list them here. The most intuitive application you can think of is quantum control: reinforcement learning is trying to manipulate, trying to find strategies, ultimately trying to control systems, and if we have quantum systems, the question is how do we control them. But it can also be used for quantum error correction, which is at the core of reliable quantum computing. This is something we don't have yet, but it's a very desirable feature: if you got it one day, you would actually have a proper quantum computer that you can exploit in the same way that you have classical computers. So quantum error correction is an important unsolved problem, and people are using reinforcement learning for it. One can also use it for quantum gate design. Part of quantum research these days is how to design gates, the operations on a quantum computer; usually what you have to do is apply some pulses, and you can use reinforcement learning to find the optimal functional form of these pulses. And last but not least, you can also use it to design quantum circuits, so basically to design how to interact with such a quantum device.

Okay, any questions so far? No? Then let's move on. When should you consider using reinforcement learning? Let's say that you have a problem sitting in front of you on your desk, you're thinking about it, and you ask: should I use reinforcement learning here or not? I just want to mention the three main advantages that reinforcement learning can give you. The first one is that it is model-free. What this means is that it doesn't require the concept of a model, the same way that the agent was playing without knowing what a ball is and what a brick is, et cetera. It's capable, through this trial and error, of extracting the relevant information for solving the task. That doesn't mean it will learn what the ball is in the process; maybe it doesn't have to learn what the ball is.
Maybe it can just solve the task without that; in any case, it doesn't require any concept of a model, so it's model-free. Second, it's adaptive. What this means is that you can train your agent on one system, but afterwards there's the agent that's left over from this process, and you can apply it to another system and see how well it's going to do. If the two systems are not completely different, there's a good chance it will actually know how to do something. And this is also quite intuitive: if you know how to drive a car, you can also learn how to drive a truck fairly easily, but you won't be able to fly a plane. And last but not least, it's autonomous. This is particularly appealing if you're an experimentalist, because then you can basically run the reinforcement learning on your experiment and go for a coffee; you don't have to worry about it.

Okay, so with this I'm coming to the end of the intro, and now I want to go a little more into the mathematical details of how reinforcement learning works. If something becomes unclear now, you should stop me and ask. Overall, in reinforcement learning we have an agent that learns how to solve a task by interaction with its environment. What the agent does is it takes an action out of some available actions; think of the position of the paddle, which can move left, move right, or stay. Upon this action, the environment will change its state. In this case, the state of the environment is essentially the pixels of the screen (if you want to be a little pedantic, you have to take the pixels of the screen at two time slices, but this is a detail we can discuss offline if you're interested). So the state will change, the pixels will move somehow, and then there's a reward that's fed back to the agent. The agent can observe the state, but it also gets a reward, and the reward is whether you destroyed the brick or not, whether you got a point or you didn't. Based on this information, the agent is now going to choose another action, and so this repeats over and over until the game comes to an end. The game can come to an end if you win, or if the ball falls below the horizon; at that stage the game restarts, and you start training over and over again. This is the so-called reinforcement learning feedback loop; it's a feedback-loop optimization framework.

Okay, so a little more rigorously: in order to cast a problem into the reinforcement learning framework, we need to define an action space, a state space, and a reward space. The action space is the set of available actions; in this case, move left, move right, or stay. Then there's the state space; this is the set of pixelized images of the screen. And then there's the reward, and the reward is plus one if you destroy a brick, and zero if you don't.
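In pseudocode, this feedback loop is just the following pattern (a sketch: `env` and `agent` here are placeholders for whatever environment and policy you have, not a specific library):

```python
# The reinforcement learning feedback loop, schematically.
def run_episode(env, agent):
    state = env.reset()                         # e.g. the initial screen pixels
    total_return, done = 0.0, False
    while not done:
        action = agent.act(state)               # sample from the policy pi(a|s)
        state, reward, done = env.step(action)  # the environment reacts
        agent.observe(state, reward)            # feed the reward back to the agent
        total_return += reward
    return total_return                         # sum of rewards along the episode
```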
Reinforcement learning at its core is a Markov decision process, so first I want to recall for you what a Markov process is. I'm sure you've seen a Markov chain somewhere in your undergraduate courses: you have states at given time steps, so there are time steps t, t+1, t+2, and states s_t, s_{t+1}, s_{t+2}, and a Markov chain is a chain where you transition with a probability p from state s_t to state s_{t+1}. And here is the important thing: the Markov property means that this transition probability depends only on the previous state, and not on the history. Now, reinforcement learning is a bit more than a Markov process; it's a Markov decision process, and what this means is that the transition probability will depend not just on the previous state but also on the action chosen. Yes, please?

So the question is whether you can do this without a reward. This is actually a very important problem in reinforcement learning: formally, in this case, you cannot just apply reinforcement learning without a reward; it just won't work. But there is something called meta-learning, which people are looking into; it's just very difficult. There you are essentially trying to learn what there is to be learned; that's why it's called meta-learning. To be honest, I don't know what the state of the art in that field is right now, but I know it's being considered. It's very difficult, though; you need to know at least what you want to do. Any other questions? Yes? Okay, thanks.

So, one thing you should know about this transition probability is that it contains basically the laws of physics. It is written in stone: you can't change the laws of physics. What you can change, though, is how you select the actions. So there's another probability here, denoted by pi, and that's the probability to take an action a when in state s. This probability is also known as the policy, or the strategy if you find that more intuitive, and it is the actual object of reinforcement learning: reinforcement learning is about learning the policy, the probability pi to take an action a from the state s_t. And of course, once you transition to the state s_{t+1}, there is also the reward r_{t+1} that is handed out.

All right. The objective in reinforcement learning is to find the best such policy. You want to become better and better at playing the game, you want to be taking better and better actions, so you want to find the policy which maximizes the total expected return: the sum of all rewards you collected along a trajectory, averaged over the many trajectories that you played. That's your reinforcement learning objective. Okay, so now we know what reinforcement learning is; let's see how to construct an algorithm, and I'll show you something that hopefully is quite intuitive. So again, here's the objective: we want to evaluate this expectation value. But of course, this is an expectation value over these trajectories, and there are infinitely many, or exponentially many, trajectories, so you cannot compute this expectation value exactly. Instead, what you do is sample trajectories and approximate the expectation value.
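Let me write this out in symbols (my notation; R(tau) is the total reward collected along a trajectory tau, and the second line is the gradient we'll need in a moment):

```latex
% Objective: total expected return, estimated from N sampled trajectories
J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\big[\, R(\tau) \,\big]
          \;\approx\; \frac{1}{N} \sum_{i=1}^{N} R(\tau_i),
\qquad
R(\tau) \;=\; \sum_{t} r_t

% Policy gradient (score-function) identity, and the ascent step
\nabla_\theta J(\theta)
  \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\Big[\, R(\tau) \sum_{t}
        \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big],
\qquad
\theta \;\leftarrow\; \theta + \alpha\, \nabla_\theta J(\theta)
```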
So that's the first thing we can do. Then, in order to improve the policy, the first thing we can think of is to parameterize it using some variational parameters theta; that's similar to what I showed you a couple of slides back. And now that we've parameterized the policy, in order to improve the total expected return, all we need to do is figure out the gradients with respect to these parameters theta. So I have the policy, I parameterize it by some parameters theta, and then I want to compute the gradient; the expression here on the right shows how you can estimate this gradient from sampled trajectories. And the last step is very intuitive: you just do gradient ascent. In this case I'm maximizing the return; that's why I'm not going down the hill but up the hill in this parameter space, the theta space. So it's very intuitive: you're basically climbing up the hill with this algorithm. This is known as the policy gradient algorithm, and this is just the most basic version of it; there are quite a bit more sophisticated versions which, if you're interested, you can come ask me about. But that's essentially the simplest thing you can write down, and a more sophisticated version of this algorithm was used for the plasma control problem.

Okay. But, as I said again, we want to discuss applications to quantum physics, and now I want to show you how one can use this policy gradient algorithm to control a single qubit. For those of you who are not doing research in quantum technologies, or quantum physics in general: a qubit is just a two-level system, the simplest possible quantum system, which I hope many if not all of you have seen in quantum mechanics one. So you have a two-level system, the Hilbert space has two states, zero and one, and what I'm showing you here is essentially a representation of this Hilbert space, the so-called Bloch sphere: any point on the Bloch sphere corresponds to one quantum state, and controlling quantum states of a single qubit essentially means moving arrows on the sphere. Now, how can you move these arrows on the sphere? Well, you have to rotate them, and rotations in quantum mechanics are given by these gates: exponentials of Pauli matrices with some fixed angle or time step delta t. You can imagine doing rotations about the x, y, and z axes, and in this case I'm going to keep the rotation angle delta t fixed, but small. And what you want now from this problem (sorry, I forgot to say) is an agent that will initialize your qubit: let's say we want the qubit initialized in the state zero, that's the blue arrow.
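For reference, these gates written out (conventions differ; whether the factor of 1/2 is there, and the sign, are a choice):

```latex
% Rotation gates about the alpha axis by a fixed small angle \delta t
R_\alpha(\delta t) \;=\; e^{-\,i\, \delta t\, \sigma_\alpha / 2}
  \;=\; \cos\!\tfrac{\delta t}{2}\,\mathbb{1}
        \;-\; i \sin\!\tfrac{\delta t}{2}\,\sigma_\alpha ,
\qquad \alpha \in \{x, y, z\}
```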
So the game is: you start up your quantum device and you find it in some state pointing along any of the directions given by the black arrows, and you want your agent to look at the state and then apply these gates such that in the end you get to the target state. You want to rotate, in a few steps (these are the steps of the game), such that in the end your state points up.

Now, to do that, we have to frame the problem in reinforcement learning terms, and the first thing we want to define is the states. As I already mentioned, any state of a two-level system can be parameterized by a vector on the sphere, and we know that a vector on the sphere in spherical coordinates has two angles, theta and phi; in terms of the two-dimensional complex Hilbert space, this parameterization reads |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>. What this means is that the reinforcement learning state space here, this Bloch sphere, is the space of tuples of angles (theta, phi) on the sphere.

Next we want to define the actions. The reinforcement learning actions are given by these gates, the three rotations about the x, y, and z axes, but I also want to give it one more action, the identity: basically, do nothing. The reason for this is that if the agent happens to get to the target state already, I want it to be able to stay there and not keep drifting away. Okay, and every action is applied by acting with the unitary on the corresponding quantum state psi; out of this you get a new state psi prime, and using the prescription up here you can compute the corresponding s prime, the new angles theta prime and phi prime.

Finally, we want the reward. So what do we want to do? We want to bring the state to the north pole. So all you need to do in this case is measure angles between states: you fix a reference state, the target state at the north pole, and any other state will have some angle with that state; what you want is to minimize that angle. And minimizing this angle is the same thing as maximizing the squared overlap with the target state: at every given step, your time-evolved physical qubit state psi_t will have an overlap with your target state, and you want to maximize this overlap. That's what I mean when I say maximizing the reward. And we have to choose an algorithm; in this case, we're going to use the policy gradient algorithm that I just defined. Okay, so the reward here is given at every step, but of course you can consider situations where it's given only at the end; this was the case in the game of chess, where you can only give a reward once you know whether you've won or not.

But there's a problem here, and the problem is that the state space can be, and in this case actually is, continuous: there are continuously many different arrows on that sphere, and if you want to do learning, you have to figure out how to control from any of them. So you cannot just discretize, make a table of your sphere, and learn everything only for the states in the table.
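To make the setup concrete before we go on, here is a minimal sketch of this game as an environment; the conventions (the rotation angle, the factor of 1/2 in the gates, a reward at every step) are my illustrative choices, not necessarily the ones on the slides:

```python
# Minimal sketch of the single-qubit "game": state = (theta, phi) on the
# Bloch sphere, actions = {rotate about x, y, z by a small fixed angle,
# do nothing}, reward = fidelity with the target state |0> (north pole).
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def rotation(sigma, dt):
    """Closed form of exp(-i dt sigma / 2) for a Pauli matrix sigma."""
    return np.cos(dt / 2) * I2 - 1j * np.sin(dt / 2) * sigma

DT = 0.1  # fixed, small rotation angle
ACTIONS = [rotation(SX, DT), rotation(SY, DT), rotation(SZ, DT), I2]

def angles_to_psi(theta, phi):
    """|psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

def psi_to_angles(psi):
    # strip the global phase so psi[0] is real and non-negative
    psi = psi * np.exp(-1j * np.angle(psi[0]))
    theta = 2 * np.arccos(np.clip(psi[0].real, -1.0, 1.0))
    phi = np.angle(psi[1]) if abs(psi[1]) > 1e-12 else 0.0
    return theta, phi

def step(state, action):
    """Apply one gate; return the new state s' and the reward."""
    psi = ACTIONS[action] @ angles_to_psi(*state)
    target = np.array([1.0, 0.0])            # |0>, the north pole
    reward = abs(target.conj() @ psi) ** 2   # fidelity with the target
    return psi_to_angles(psi), reward
```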
You actually also want to be able to act from states in between. And the question boils down to the fact that the state space either has exponentially many configurations or is continuous, and what you want is to estimate the values of states you haven't encountered yet. You guys know machine learning, so you know the way to do this: you define a variational approximation. So again we come back to a variational approximation, this time for the policy. Remember, the policy is the strategy with which I choose actions; there are these variational parameters theta, and very often in machine learning these parameters theta are just the weights and biases of some neural network. And here comes in a lot of the intuition you might have about the problem: whether you want to use a fully connected neural network, a convolutional neural network, maybe a graph neural network, a transformer; in principle anything can be put in there, but not everything will work equally well. Problems from physics, as you know, have symmetries, and then you'd better make sure your neural network architecture obeys these symmetries. So in the end, what you have for your agent, for your policy, is a neural network into which you plug two numbers, the reinforcement learning state, that is, the values of the angles theta and phi, and what you get out of it, through a softmax layer if you wish, is a probability for taking each of the four available actions. So what you're learning is essentially a probability distribution.

[Audience question.] Yes, so this is a toy problem; I'm not saying this is a difficult problem to solve, maybe I should have mentioned this. Usually problems are more complicated than this: they might have noise in them, you might consider multiple qubits that interact somehow with one another, or with an environment. This is just for illustration. Any other questions?

[Audience question about how the agent knows the quantum state.] So there are different ways. That's actually a very good question, because generically quantum states are not measurable: you cannot measure a quantum state. But if you have a single qubit, you can measure the Pauli x, Pauli y, and Pauli z operators, and if you know those, then you actually know your state. Still, it would be more appropriate to design a framework that doesn't really know about your state; we can talk about it afterwards, there are ways to do this, I just don't want to go into too much detail. But this is a very important point. A similar point, by the way, is whether you can get the reward at intermediate steps, because getting a reward at an intermediate step means you have to collapse your state, and once you collapse your state, you basically lose it and have to start over again. So usually in these problems we give the reward at the end.
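Schematically (and this is a sketch, not the actual notebook code), the policy network and the policy gradient update might look like this in JAX; the layer sizes, the tanh nonlinearity, and the learning rate are arbitrary choices of mine:

```python
# Policy network (theta, phi) -> probabilities over 4 actions, plus a
# REINFORCE-style update for one sampled trajectory.
import jax
import jax.numpy as jnp

def init_params(key, hidden=32, n_actions=4):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "W2": jax.random.normal(k2, (hidden, n_actions)) * 0.1,
        "b2": jnp.zeros(n_actions),
    }

def policy(params, s):
    """s = (theta, phi), or a batch of them -> action probabilities."""
    h = jnp.tanh(s @ params["W1"] + params["b1"])
    return jax.nn.softmax(h @ params["W2"] + params["b2"])

def pseudo_loss(params, states, actions, G):
    """Minus the policy gradient objective for one trajectory:
    its gradient is the REINFORCE gradient estimate (G = total return)."""
    probs = policy(params, states)                        # shape (T, 4)
    logp = jnp.log(probs[jnp.arange(len(actions)), actions])
    return -jnp.sum(logp) * G

@jax.jit
def update(params, states, actions, G, lr=1e-2):
    """One gradient-ascent step on the return (descent on pseudo_loss)."""
    grads = jax.grad(pseudo_loss)(params, states, actions, G)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```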
All right. And then the last thing, as I said: you have this neural network, and then you just use your favorite autodiff package to compute the gradients and update the parameters using the policy gradient update that we derived earlier. And then that's it; you basically keep repeating this over and over again. There is also a Jupyter notebook that I prepared some time ago; if you guys are interested, you can just check it out on GitHub. It's code that uses JAX, where I've basically coded up this policy gradient algorithm for the specific problem that I just showed you.

Okay. So now, in the last ten... five minutes? Ten minutes, okay, that's just about right. I can talk a little bit about other applications of reinforcement learning to quantum physics. Let me show you a related but slightly different version of a quantum control problem; the idea is basically very simple. What you have is a single qubit again, which in this case you always initialize in the green state, and what you want is to transfer the population into the red state, the so-called target state. You want to transfer the population by time evolution under the Hamiltonian H(t), which has a constant energy splitting Z and an external field h, say a magnetic field. So what you want to do here is find the optimal external field h: as you vary the field, your system is going to evolve according to this Hamiltonian, and you want to evolve the system in such a way that the population is transferred from the green arrow to the red arrow. What you measure, again, is the squared overlap, so the fidelity of being in the target state.

Okay, so in principle you can use any time-dependent function h(t), but for simplicity we decided to look at a smaller space of functions: piecewise constant protocols. These protocols take only two values. I'm sure there's something wrong here with the image, but basically on the x-axis there's a time axis, which you discretize into fixed time steps delta t, and at every time step, on the y-axis, the blue curve is an example of such a piecewise constant protocol h; in this example the protocol can take only the two values plus four or minus four. And then at every step your agent has to decide whether to take the positive or the negative value. If you think about it for a second, with N such steps there are 2^N, so exponentially many, of these sequences, so it's not so easy to actually find the right one.

But what I want to show you now is a movie of the learning process of this agent. What you see here is how the system evolves with an agent that doesn't know anything yet; these zigzaggy blue curves come about because the control field is piecewise constant. It turns out that this restriction is not that crucial: the space of piecewise constant protocols still contains optimal ones.
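As an aside: evaluating one candidate protocol just means integrating the Schrödinger equation piecewise. Here is a sketch, where the sign conventions in the Hamiltonian, the initial and target states, and all parameter values are stand-ins rather than those of the actual study:

```python
# Evaluate one piecewise-constant ("bang-bang") protocol for a qubit.
# Assumed form: H(t) = -Z*sigma_z - h(t)*sigma_x with h(t) in {-4, +4}.
import numpy as np
from scipy.linalg import expm

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def fidelity(protocol, psi0, psi_target, dt=0.05, Z=1.0):
    """Evolve psi0 step by step and return |<target|psi(T)>|^2."""
    psi = psi0.copy()
    for h in protocol:                      # h = +4 or -4 on each step
        H = -Z * SZ - h * SX
        psi = expm(-1j * H * dt) @ psi
    return abs(psi_target.conj() @ psi) ** 2

# There are 2^N such sequences for N steps, far too many to enumerate;
# here we just score one random protocol with stand-in states.
rng = np.random.default_rng(0)
protocol = rng.choice([-4.0, 4.0], size=20)
psi0 = np.array([1.0, 0.0], dtype=complex)      # stand-in initial state
psi_t = np.array([0.0, 1.0], dtype=complex)     # stand-in target state
print(fidelity(protocol, psi0, psi_t))
```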
There's something called Pontryagin's maximum principle, which essentially guarantees that piecewise constant protocols of this kind can be optimal. But the important parameter here is in the lower left corner: the number of training episodes. You see, after about five training episodes it reaches a reward of about 0.90, 0.91, and then it progressively becomes better; it improves over time, and you'll see in a second that it requires quite a number of training episodes. Of course, the training happens much faster than the movie I'm showing here. I think after about 14,000 or 15,000 episodes it's actually going to find the optimal solution, and here, already after 10,000 episodes, let's say, it finds something very interesting: it learns to bring the state to the equator. That's the optimal way, by the way: it learns to bring the state to the equator, keep it there for as long as is optimal, and then eventually depart in the direction of the target state. And this is actually interesting, because the equator is the only geodesic about the z axis on the sphere, and as we know, geodesics are curves of shortest distance: if you want to go from one point on the sphere to another point on the sphere, you'd better find the shortest path. What is interesting, though, is that with this definition of piecewise constant fields, you cannot turn off the x field; you cannot turn off the x field and say, let me just drift along the geodesic. So what it has to learn to do, and this is what it does, is these bangs up and down, which on average cancel the field, so that it can keep the state on the equator. That's the interesting thing it learns here.

Okay, I want to give a few other examples of applications of reinforcement learning, this time in less detail. This is an example by a group in Erlangen, where they consider a system of three qubits, shown here in blue, exposed to a noisy environment, and then there's a fourth, ancillary qubit, in red, which is a qubit that they can measure. So this goes back to the question of what you do with measurements. What their agent was doing is measuring the state of this ancilla and then, based on a similar policy-gradient-type algorithm, learning how to control the rest of the blue qubits in such a way that their state is protected from decoherence.
So that's the idea, and what you see here on the right is essentially such a sequence. It turns out that these sequences are essentially error correction codes: here the agent, without any prior knowledge, comes up with a very simple but genuine error correcting code. And there are other, more sophisticated examples: if someone has heard the buzzwords of the toric code or the surface code, one can play these games there as well.

Another application that I want to show is the design of quantum gates. As I mentioned, quantum gates, which usually look something like these rectangles, are the basic building blocks of quantum circuits; this is how you operate quantum computing devices. And a particularly interesting gate here is the so-called cross-resonance gate, because this is a gate that brings in quantum entanglement, and it's quantum entanglement that would one day, hopefully, give a quantum computer its advantage over a classical one. What you do in theory is model your qubits and come up with theoretical electromagnetic pulses, which are then applied to the physical qubits in the lab in order to physically realize the cross-resonance gate. And what these guys have done is use reinforcement learning in order to improve on the theoretical pulses, and indeed, the agent gave them these weird zigzaggy pulses; it was not clear what the zigzaggy pulses were doing. But if you actually compare the error rate, the so-called infidelity (the smaller this number, the better the gate), the theoretical pulse on current IBM devices has an infidelity of about 10^-2, and you get almost an order of magnitude better using reinforcement learning. If you were able to gain just one more order of magnitude, down to 10^-4, you would reach the fault-tolerance threshold of these devices.

And the last example is the design of variational quantum circuits. This is now what you do with these gates: you basically stack them next to each other in the form of a circuit, and you can use reinforcement learning in order to optimize these circuits variationally in two ways. In the first way, you try to find the continuous value of each gate parameter, the gate angle, or time if you wish, while keeping the structure of the circuit fixed (as you can see, it kind of repeats). But what you can also do is try to find the optimal structure together with the angles, so you have a combination of a discrete and a continuous optimization problem: basically, how to order these gates in an optimal way, because nobody tells you that this particular sequence has to be the optimal one; there might be better ones. And you can use reinforcement learning, Monte Carlo tree search, and all these other types of machine learning algorithms that you guys are familiar with in order to do this.
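To give a flavor of the first variant, fixed structure with continuous angles, here is a toy sketch; the circuit, the target state, and the optimizer are illustrative choices of mine, not taken from the works shown:

```python
# Toy "fix the circuit structure, optimize the continuous angles":
# two RY rotations followed by a CNOT, tuned to prepare a Bell state.
import numpy as np
from scipy.optimize import minimize

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def ry(theta):
    """Single-qubit rotation about y: exp(-i theta sigma_y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def circuit(angles):
    """Fixed structure: RY(a0) on qubit 0 and RY(a1) on qubit 1, then CNOT, on |00>."""
    psi0 = np.zeros(4, dtype=complex); psi0[0] = 1.0
    return CNOT @ np.kron(ry(angles[0]), ry(angles[1])) @ psi0

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def infidelity(angles):
    return 1.0 - abs(bell.conj() @ circuit(angles)) ** 2

result = minimize(infidelity, x0=[0.1, 0.1], method="Nelder-Mead")
print(result.x, 1.0 - result.fun)   # optimum near angles = [pi/2, 0]
```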
All right, with this I'd like to come to the last slide. For those of you who are interested in learning a little bit more about how all of this works: on deep learning, there's a very nice book about neural networks by Michael Nielsen, by now perhaps somewhat outdated, but still interesting from the perspective of a physicist. The standard textbook on reinforcement learning is Sutton and Barto. Then I would also recommend the set of online lectures by Sergey Levine at UC Berkeley, and I also gave a lecture course in Sofia a couple of years ago; you can find the URL here. All right, with that, I'd like to thank you for your attention.

[Audience question about the structure of the solution landscape.] Yes, okay. As was already pointed out, this simple problem that I showed is too simple: in that case, if you tune the parameter regime, there's a single solution, and then you'd better find that solution. But what we actually did in that study is vary, for instance, the duration of the control protocol, the duration of the sequence, and what you observe there, quite interestingly, is that you can actually have a series of phase transitions as you vary this physical parameter. What happens is that the optimization landscape reorganizes: it can actually change its structure completely, and you can go from a case where you have a single solution to a case where you have many solutions that are almost the same but not quite, something like a rugged landscape. For the single qubit it's not quite a glass, but if you actually put a few more qubits together and let them talk to each other, then it becomes as complicated as a spin glass, and you can actually show that the algorithm will go to a local minimum with no guarantee of reaching the global minimum; there's no magic here.

[Audience question about quantum reinforcement learning.] Yes, people have explored what is called quantum reinforcement learning. There, the agents actually run on the quantum devices, and they're supposed to run on the quantum devices because these devices are actually still quite noisy. The idea is that any computational advantage the quantum computer can offer you, those agents would have as well. And there are peculiarities in how you interact with these systems, measurements, extracting information from them, et cetera, which you have to take into account in order to revise the classical reinforcement learning algorithms.