I am Benoît Vermersch, so it's my pleasure to chair this session of the conference, and we will start right away with Marin Bukov from the Max Planck Institute in Dresden. It's first a 45-minute talk, and then we'll have two 30-minute talks. And this talk is about disentangling multi-qubit states using deep reinforcement learning. Thanks a lot for the nice intro, Benoît. And also thanks to the organizers for having me here. So this is a story that's been ongoing for quite some time, and I think some of you may or may not have seen a previous version of this. But recently the story started growing, and it's basically very close to completion, which is kind of a nice sign. So as the title suggests, I'm going to be telling you about disentangling multi-qubit states using deep reinforcement learning. And here is essentially the menu for today. We'll first go briefly over a short introduction and motivation: I basically want to introduce what the disentangling problem is and how you can think about it. Then we'll briefly talk about reinforcement learning and how to set up the reinforcement learning framework for the disentangling problem, and this is where you'll probably see some of the concepts that you have also seen last week from Florian Marquardt. And then we'll discuss the results. So we'll talk about disentangling few-qubit states, so two-, three-, and four-qubit states, and then multi-qubit random states, in particular five and six qubits. And then, time permitting, we'll talk about some of the applications of this business, in particular reducing CNOT counts and how resilient these approaches are to various sources of noise. And hopefully I can also show you a little application to modern quantum devices. So let's get started with the intro and motivation. As you probably know, one of the defining characteristics of quantum states is quantum entanglement.
If I have a Bell state and I measure the state of, say, the first particle, then this will automatically collapse the state of the second particle. And this is something that was first realized in 1935 by Einstein, Podolsky, and Rosen, and was famously called spooky action at a distance. But if you fast-forward roughly 90 years to 2024, we're now thinking about how to make use of quantum entanglement in order to develop new quantum technologies. And to do that, one useful quantity is the entanglement entropy, because it allows us to quantify the amount of entanglement in a quantum state. It's given here by this expression: it's the von Neumann entropy (the Shannon entropy of the eigenvalues) of the reduced density matrix ρ₁, which can be computed from the density matrix of the state by tracing out, say, the second qubit. And then if I find that the entanglement entropy is greater than 0, then I know that my state has entanglement in it. Besides quantum technologies, of course, quantum entanglement has applications in other fields of physics, and that's also why we care about it. In particular, in quantum optics, experimentalists and theorists use beam splitters in order to entangle photons, and they study states of entangled photons. But also in condensed matter physics, or many-body physics, entanglement can play a crucial role, in particular in defining ground states of spin liquids, for instance. So over the last 10 years, we've already achieved a number of milestones in trying to understand and manipulate quantum entanglement, and let me just mention here a few points. The first one is that we are able to implement perfect two-qubit entanglers. What this basically means is that on the level of small systems, like two qubits, we can manipulate entanglement with very high precision. But over the last maybe three or four years, we've also managed to scale this up to the many-body regime.
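To make this quantity concrete, here is a minimal numpy sketch (mine, not the speaker's code) that computes the entanglement entropy of a Bell state by tracing out the second qubit and taking the entropy of the reduced density matrix:

```python
import numpy as np

# Entanglement entropy of a Bell state: trace out the second qubit and
# take the von Neumann entropy of the reduced density matrix rho_1.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)

psi = bell.reshape(2, 2)             # axis 0: qubit 1, axis 1: qubit 2
rho1 = psi @ psi.conj().T            # partial trace over qubit 2

evals = np.linalg.eigvalsh(rho1)
evals = evals[evals > 1e-12]         # drop numerical zeros before the log
S = -np.sum(evals * np.log2(evals))  # entropy in bits (log base 2)

print(S)  # a Bell pair carries exactly 1 bit of entanglement
```

Since S > 0, this diagnoses the Bell pair as entangled, matching the criterion above.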
And in particular, two interesting applications or realizations are the ground state of the toric code, or many-qubit GHZ states. And there have been experiments, as I'm showing you here, that were able to prepare these states. So this is kind of on the fundamental side: these are interesting states, and we care about them because they show us how to think about fundamentally different types of order. But entanglement, and controlling the dynamics of entanglement, also has practical applications, and I just want to mention two of them here. The first one: it turns out that if you want to prepare an arbitrary state on a quantum computer, on a quantum device, then the way to go is to first figure out how to disentangle that state, and then you reverse that process in order to prepare your target state. And what this tells us is that you can view entanglement as a proxy for the complexity of a quantum state; in some sense, you can classify states according to how much entanglement they have. The second application is kind of a generalization of this state-preparation procedure: when you have an arbitrary unitary operation and you want to decompose it on a quantum device into native gates, then modern algorithms also do a sort of disentangling procedure in order to do what is known as transpilation or circuit decomposition. So these are a few practical and a few fundamental applications of entanglement and controlling entanglement. And as I already advertised, in this talk I want to talk about disentangling quantum states. So let me define for you the quantum disentangling problem. Suppose that we have a pure state ψ of L qubits, which I'm showing here in this picture. Now, if I consider subsystem A to be the subsystem of a single one of these qubits, then, as I have already shown you, I can compute the reduced density matrix by tracing out all the remaining qubits.
And then I'm left with a single-qubit reduced density matrix, from which I can compute the entanglement entropy. Now, to check if the state ψ is entangled, one thing I can do is look at the average single-qubit entanglement entropy, which is shown here in this equation. If I find that the average single-qubit entanglement entropy is 0, then it means that ψ is a product state. Why? Because the entanglement entropy is a non-negative number. So if I find that this sum here is 0, then its constituents have to be 0 separately. But if all the constituents are 0, then every qubit is disentangled from all the others, and therefore we have a product state. And the other way around: if you have a product state, then it immediately follows that the average single-qubit entanglement is 0. So that's the quantity I'm going to be focusing on throughout the talk. And now the disentangling procedure goes as follows. The goal is to find a unitary operation U such that, when you apply it to your favorite state ψ, you basically get the product state with everything down, all zeros. Now, such a unitary always exists, because unitary operations are just generalized rotations in this huge Hilbert space. But there is a catch: what we want to impose on this unitary is that it can be constructed out of local, two-qubit operations. And the reason we want to do this is that this is what's accessible on modern devices: interactions in nature are local, and therefore we have access only to local unitaries. However, I will allow myself to consider non-adjacent qubits. So I'm not only going to be talking about unitaries applied on neighboring qubits, but also unitaries applied on qubits that are far away, but only on a pair of qubits at a time.
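The figure of merit just described can be sketched in a few lines of numpy (my own illustration, with my own function names): the average single-qubit entropy vanishes exactly on product states and is maximal, for example, on a GHZ state.

```python
import numpy as np

def single_qubit_entropy(psi, k, L):
    """von Neumann entropy (in bits) of qubit k in the pure state psi."""
    M = np.moveaxis(psi.reshape((2,) * L), k, 0).reshape(2, -1)
    rho = M @ M.conj().T                 # trace out all other qubits
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

def avg_entanglement(psi, L):
    """Average single-qubit entanglement entropy: zero iff product state."""
    return np.mean([single_qubit_entropy(psi, k, L) for k in range(L)])

L = 4
product = np.zeros(2**L, dtype=complex); product[0] = 1.0            # |0000>
ghz = np.zeros(2**L, dtype=complex); ghz[0] = ghz[-1] = 1/np.sqrt(2)

print(avg_entanglement(product, L))  # 0.0 -> product state
print(avg_entanglement(ghz, L))      # 1.0 -> every qubit maximally entangled
```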
And this is not so restrictive, because modern platforms such as trapped ions or Rydberg atoms can actually realize such distant two-qubit unitaries. Okay, so that's essentially the setup. And now the goal for us is to find the disentangling circuit, which consists of two-qubit gates, in such a way that in the end we get this product state. And this is just an example of such a circuit. This gray object here is the state. Then I'm labeling the qubits one through eight. And then you can see these two-qubit gates, and they can be acting on neighboring qubits or also on distant qubits. Now, if you think about it, the disentangling problem is essentially a two-fold optimization problem. First, you need to find the optimal order of the pairs of qubits to apply the two-qubit unitaries on. And second, once you've identified an optimal pair, you also need to find the optimal angles that parameterize the corresponding unitary to act on it. So once again, the discrete part comes from choosing the pairs, and the continuous part comes from parameterizing the unitary that should act on these pairs. And as often happens whenever you have a mixture of a continuous and a discrete optimization problem, the problem becomes very difficult. Part of the difficulty can be understood as follows. First, if you think about the continuous optimization, this is a difficult problem because the Hilbert space of my system of L qubits grows exponentially with the number of qubits. But even if you didn't have to worry about the continuous optimization, the discrete problem itself is also quite challenging, because if you have L qubits and you fix the number of gates or unitaries that you can apply to your state, then you again have an exponential number of different ways to arrange these unitaries. And so there is another kind of exponential that lurks there.
So we have two types of optimization problems, a discrete and a continuous one, and what we want to do is consider them separately. Very quickly, I just want to mention that the continuous optimization problem can be solved exactly. The way it's done is basically the following. We are applying two-qubit gates, and so the relevant object is the two-qubit reduced density matrix. And it turns out, and that's something I can explain later on, that to find an optimal local unitary, all you need to do is diagonalize this two-qubit reduced density matrix. This diagonalization gate is itself a two-qubit gate, and we can use it in order to disentangle locally, to solve the continuous optimization problem. And there are two things I want you to remember here. First, this locally optimal gate will depend explicitly on the state: if I change the state, then I change the gate, so it's not universal. And second, it's not unique; there are multiple ways you can do this. So once again, this is a complicated slide, but all I want you to take home from it is that there is a way to compute the locally optimal two-qubit unitary, and we will assume that we have access to that. And so this leaves us with the discrete optimization problem, which, once again, is the problem of choosing the correct order and pairs of qubits on which to place this locally optimal unitary. So for each pair, we're going to apply such a gate. However, this is not enough: we want to incorporate experimental constraints, which go as follows. First, we want to apply as few gates as possible, and the reason is that these gates are actually extremely noisy on modern devices, so we want to decrease the number of gates we apply. And second, I want to be a little bit ambitious: I really want to be able to disentangle an arbitrary initial state. So you give me your favorite state of L qubits.
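Here is a numerical sketch of this diagonalization construction, under my reading of the talk (the function names and eigenvalue ordering are my assumptions, not the speaker's code): build the two-qubit reduced density matrix, rotate its eigenbasis onto the computational basis with eigenvalues in descending order, and apply that rotation as the gate. As a check, a gate on qubits (0,1) followed by one on (1,2) should disentangle any three-qubit state, since the rank-2 reduced density matrix gets mapped onto span{|00⟩, |01⟩}, factoring qubit 0 out as |0⟩.

```python
import numpy as np

def two_qubit_rdm(psi, i, j, L):
    """4x4 reduced density matrix of qubits (i, j) in the pure state psi."""
    M = np.moveaxis(psi.reshape((2,) * L), (i, j), (0, 1)).reshape(4, -1)
    return M @ M.conj().T

def optimal_gate(psi, i, j, L):
    """Diagonalization gate: rotate the eigenbasis of rho_ij onto the
    computational basis, largest eigenvalue first (one locally optimal
    choice; as stated in the talk, it is state-dependent and not unique)."""
    w, V = np.linalg.eigh(two_qubit_rdm(psi, i, j, L))
    V = V[:, np.argsort(w)[::-1]]    # columns sorted by descending weight
    return V.conj().T                # maps the k-th eigenvector to |k>

def apply_gate(psi, G, i, j, L):
    """Apply the 4x4 unitary G to qubits (i, j) of psi."""
    T = np.moveaxis(psi.reshape((2,) * L), (i, j), (0, 1))
    T = (G @ T.reshape(4, -1)).reshape(T.shape)
    return np.moveaxis(T, (0, 1), (i, j)).reshape(-1)

def qubit_entropy(psi, k, L):
    M = np.moveaxis(psi.reshape((2,) * L), k, 0).reshape(2, -1)
    p = np.linalg.eigvalsh(M @ M.conj().T)
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

# Any 3-qubit state: a gate on (0,1) and then a gate on (1,2) suffice.
rng = np.random.default_rng(7)
L = 3
psi = rng.normal(size=2**L) + 1j * rng.normal(size=2**L)
psi /= np.linalg.norm(psi)

psi = apply_gate(psi, optimal_gate(psi, 0, 1, L), 0, 1, L)
s0 = qubit_entropy(psi, 0, L)       # qubit 0 already factors out
psi = apply_gate(psi, optimal_gate(psi, 1, 2, L), 1, 2, L)
s_max = max(qubit_entropy(psi, k, L) for k in range(L))
print(s0, s_max)                    # both numerically zero: product state
```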
And then I want to be able to construct the sequence of unitaries to do that. And in particular, we are interested in states that have no immediately obvious entanglement distribution or entanglement structure, such as, for instance, Haar-random states. For those who don't know, Haar-random states are states whose amplitudes are drawn randomly from a normal distribution and then normalized. And as I mentioned, this discrete optimization problem is a difficult combinatorial problem, and what this talk is going to be about is how to use reinforcement learning in order to solve it. [Answering an audience question:] Yes, it is, and I'll come to this in a second. OK. Now, why is the disentangling problem difficult? Let me show you two different ways you could approach this problem, and this is going to reveal why it is complicated. So once again, we consider Haar-random states. And now I want to define what I call a random agent, which goes as follows. The random agent selects a pair of qubits, i and j, uniformly at random. So it selects two qubits randomly, and it applies the locally optimal disentangling gate. And then what I want to do is monitor this average entanglement entropy, as shown here. So this is the figure of merit, and we see how it goes down as the number of gates m increases. What I'm showing you here on that figure is, on the x-axis, the number of gates, and on the y-axis, this average entanglement entropy. The different colors correspond to different system sizes. And there are two observations here. First, the average entanglement goes down exponentially with the number of gates, and you'd think that's good, right? It goes down fairly quickly. However, there is a catch: if you consider the time constant, so the decay constant of these curves, then you find that the decay constant itself, shown here in the inset as a function of system size, actually grows exponentially in the system size.
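The Haar-random sampling just described is a one-liner; a short sketch (my own helper name), drawing complex Gaussian amplitudes and normalizing, which is exactly the unitarily invariant Haar measure on pure states:

```python
import numpy as np

def haar_random_state(L, rng):
    """Haar-random pure state of L qubits: complex Gaussian amplitudes,
    then normalize. The Gaussian measure is unitarily invariant, which
    is the Haar property."""
    v = rng.normal(size=2**L) + 1j * rng.normal(size=2**L)
    return v / np.linalg.norm(v)

psi = haar_random_state(5, np.random.default_rng(0))
print(psi.shape)  # (32,): the Hilbert space dimension 2^5
```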
So this basically tells us that there is essentially no free lunch: the random agent, using two-qubit unitaries to disentangle the state, does eventually manage to do so, but it takes exponentially long. And to see where the exponential comes from is very simple. If you have a large state and you're applying two-qubit gates, then all you care about, again, are the two-qubit reduced density matrices. But for a Haar-random state, those are very close to an infinite-temperature state, which is essentially the identity. So all the information is in this little prefactor here, epsilon, which for a Haar-random state scales exponentially with the system size. That's where this exponential scaling comes from, and that's why random agents will struggle to solve this problem. Now you say, OK, let me try something else. So let me take what I call a greedy agent. A greedy agent is an agent that tries all possible pairs of qubits; for a finite number of qubits, there are not so many, so let's try out all possible pairs. And then we're going to select the pair that minimizes the entanglement locally. And you can see here in the lower right what happens with the entanglement. Even though you gain a factor of two compared to the random agent, the scaling law actually persists: in both cases, we find this exponential. So this is why the problem is difficult. And so the question that I want to raise in this talk is: can we do better? Can we figure out how to disentangle states efficiently by using partial information about the state? What I'm going to allow myself is to measure local properties of my state, and then I want to use this information to interactively feed back and correct my control on the system. And so why do I think that's a good idea?
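The greedy agent can be sketched directly on top of the diagonalization gate described earlier (again my own sketch under that reading of the talk, not the speaker's implementation): try every pair, keep the one that minimizes the average single-qubit entropy. On two Bell pairs it needs exactly two gates.

```python
import numpy as np
from itertools import combinations

def rdm(psi, qubits, L):
    M = np.moveaxis(psi.reshape((2,) * L), qubits, tuple(range(len(qubits))))
    M = M.reshape(2 ** len(qubits), -1)
    return M @ M.conj().T

def entropy(rho):
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))

def avg_ent(psi, L):
    return np.mean([entropy(rdm(psi, (k,), L)) for k in range(L)])

def apply_opt_gate(psi, i, j, L):
    """Apply the diagonalizing two-qubit gate on pair (i, j)."""
    w, V = np.linalg.eigh(rdm(psi, (i, j), L))
    G = V[:, np.argsort(w)[::-1]].conj().T
    T = np.moveaxis(psi.reshape((2,) * L), (i, j), (0, 1))
    T = (G @ T.reshape(4, -1)).reshape(T.shape)
    return np.moveaxis(T, (0, 1), (i, j)).reshape(-1)

def greedy_step(psi, L):
    """Try every pair, keep the one minimizing the average entropy."""
    trials = {(i, j): apply_opt_gate(psi, i, j, L)
              for i, j in combinations(range(L), 2)}
    return min(trials.items(), key=lambda kv: avg_ent(kv[1], L))

# Two Bell pairs on qubits (0,1) and (2,3): greedy needs exactly 2 gates.
L = 4
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
psi = np.kron(bell, bell)

for step in range(2):
    (i, j), psi = greedy_step(psi, L)
    print(f"gate on ({i},{j}) -> avg entropy {avg_ent(psi, L):.2e}")
# first gate lands on one Bell pair, second on the other; final entropy ~0
```

For Haar-random states this same loop still works, but, as the scaling plot shows, the number of gates it needs grows exponentially with L.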
So first of all, if I have more information, then I'd better be able to do better than the random agent. But I can also potentially do better than the greedy agent, and the reason is the following. Just recall the last time you tried to untangle a knot. You very quickly realized that it's basically a huge mess, and that the order of operations in which you try to do the untangling actually matters. For the same reason, if you go for the greedy operation, the greedy agent, you're not necessarily going to be optimal in disentangling quantum states. And this is why we think that we can also do better than the greedy agent. However, as I mentioned, we want to impose physical constraints on the obtainable information. So clearly what I want is access to the full quantum state, but that's not possible: quantum states are nice mathematical constructs, but no one can measure them directly. In other words, full state tomography is exponentially costly in the number of qubits, so this is a no-go. However, two-qubit reduced density matrices can be obtained, and they can be measured with a cost that grows only as the square of the number of qubits. So I'm going to allow myself to do that. And then what I need for my algorithm is access to the figure of merit: I need to be able to compute single-qubit reduced density matrices from them, to estimate the average single-qubit entanglement. And second, I also need to construct these locally optimal two-qubit gates, and this is something that I went through very quickly, so I want to stress once again: these actually do depend on the state itself. So I would need the two-qubit reduced density matrices. Once again, I'm going to allow myself to measure all two-qubit reduced density matrices; there are of order L² of them, or L(L−1)/2 to be exact.
And then from them, I can determine the single-qubit reduced density matrices and then the figure of merit, but I can also compute my local gates. So that's the multi-qubit disentangling problem that I'm setting up. And the question is now: how do we use reinforcement learning to find a unitary, or, if you wish, a sequence, a circuit of unitaries, that disentangles arbitrary states? I just want to very briefly mention what reinforcement learning is, for those who weren't here last week. In a nutshell, reinforcement learning is like the way you teach your dog to sit. You say sit 10 times and the dog sits once, but when it does so, you give it a treat. So you reinforce a certain type of behavior. And because of that, reinforcement learning naturally entails interactive dynamics. There's an interaction between an agent, here the dog, and the environment, the person who's teaching the dog. The way that works is that the agent is trying to solve a task, in this case to sit down, by choosing actions, to sit or not to sit. But whenever an action is chosen, a reward is fed back to the agent. So there's this reinforcement loop. Reinforcement learning became famous recently for playing games, and one of the games you could imagine it playing is actually Tetris. In this case, the agent would be allowed to take actions like move left, move right, turn left, turn right, hold, or drop the corresponding brick here. And then based on that, there would be a reward, in this case the score, fed back to the agent. And importantly, the agent is allowed to observe the state of the game, and based on that, it's supposed to be taking the actions. And the reason I'm making this analogy with Tetris is that recently there was an interesting paper that showed that disentangling a state is actually very similar to playing a game of Tetris.
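The observation the agent receives can be sketched concretely (a sketch of one plausible layout; the exact encoding used in the talk may differ): all L(L−1)/2 two-qubit reduced density matrices, flattened into real features.

```python
import numpy as np
from itertools import combinations

def observation(psi, L):
    """Agent input: all L(L-1)/2 two-qubit reduced density matrices,
    flattened into real features (16 complex entries -> 32 reals each)."""
    psi = psi.reshape((2,) * L)
    obs = []
    for i, j in combinations(range(L), 2):
        M = np.moveaxis(psi, (i, j), (0, 1)).reshape(4, -1)
        rho = M @ M.conj().T
        obs.append(np.concatenate([rho.real.ravel(), rho.imag.ravel()]))
    return np.array(obs)

L = 5
rng = np.random.default_rng(1)
psi = rng.normal(size=2**L) + 1j * rng.normal(size=2**L)
psi /= np.linalg.norm(psi)

obs = observation(psi, L)
print(obs.shape)  # (10, 32): L(L-1)/2 = 10 pairs, one feature row per pair
```

The observation size grows only quadratically with L, in contrast to the exponentially large state vector, which is the whole point of restricting the agent to this partial information.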
So in some sense, finding where to place an optimal unitary is similar to finding where to drop a brick in Tetris. And from that perspective, the question that I'm asking here can be phrased in the following way: can we play this disentangling game in a smart way? So let me show you now how that works. Again, we have an agent, and the agent has to learn how to disentangle states by interacting with its environment. What does this environment contain? The environment contains the qubits, so the quantum state S. However, the agent does not have access to the state; the agent only has access to observations of the state, which consist of all two-qubit reduced density matrices. We are then going to consider actions: the agent is going to select the pair of qubits to place an optimal two-qubit unitary on. So that's what the agent looks at. In other words, you can think of the agent as a probability distribution over all qubit pairs. Then there will be one pair that has the maximum probability, and that's the action the agent will take. So the agent takes this action, and it applies the gate to the state. This leads to a new state, and from the new state, we can again compute the observations, and that's what the agent looks at before taking the next action. And then finally, there's also the reward that we give to the agent. What we want to do is minimize the average single-qubit entanglement. Now, this is a complicated expression; what it says is that we are looking at the relative change of the entanglement entropy of the state. And then we also want a penalty for the number of qubits left to be disentangled, because we want to incentivize the agent to use as few gates as possible. And for the experts in the audience, the actual algorithm behind this is the actor-critic algorithm.
And there's something that goes on behind the scenes there which I'd be happy to discuss in detail, but I'd rather leave it as a black box. This was a little bit quick, so let me recap one more time. What we have is an interactive feedback loop. The states are the full states of the system, but the agent only has access to observations, and the observations are the two-qubit reduced density matrices. What this agent produces in the end is an action, which tells me on which pair of qubits I have to place my optimal two-qubit unitary. The agent itself consists of an actor and a critic, and this is a machine learning model that will, in the end, produce the probability distribution from which I'm drawing actions. And if we now look into that blue box here, what we find is that, OK, there's the actor and there's the critic. You can think of them as neural networks, essentially. They, in the end, mimic a probability distribution, so somewhere at the top there has to be a softmax layer to make it a probability. Then we have these blue linear layers; these are simply fully connected layers with some weights and biases, so machine learning parameters theta. And then the interesting bit here are the orange layers, these encoders: we're basically using transformers. The reason we're using transformers can be explained with this picture on the left. Suppose that you have a state of four qubits, and you know how to disentangle that state, for instance by using this circuit here on the left. If I now take two of these qubits and I swap them, so I permute some of the qubits, then this automatically tells me how to permute my circuit in such a way that it still disentangles the state. So I want a machine learning architecture that is invariant, or rather equivariant, to swaps, to permutations.
So if I swap something in my input data, in my state, then the output, the actions, are automatically corrected accordingly. And it turns out that transformers, the way you know them, actually are permutation-equivariant. The tiny difference from ChatGPT and from language models is that there this permutation equivariance is explicitly broken by a so-called positional embedding, and here we don't use such an embedding, because we want to make use of this feature. And finally, just one word about the reinforcement learning algorithm: I'm sure that last week Florian explained to you how policy gradient works. The actor-critic algorithm is essentially a variant of policy gradient. My agent is trying to maximize the cumulative expected return, and it does so essentially by doing gradient ascent in this parameter space. So that's all you need to do. Okay, good. With this, I have basically introduced the problem and the method, so now I want to show you what we can do with this agent. Let's start very simple and consider a state of only two qubits; I'm talking about arbitrary states here. Now, by definition, I'm using local two-qubit disentangling gates, which means that a single unitary is enough to disentangle any two-qubit state. If I have three-qubit states, then it turns out that to disentangle them fully, to bring them into a product state, what I need to do is place my local unitary on qubits one and two, and then another one on qubits two and three. And these two two-qubit gates are enough to bring any three-qubit state, entangled or not, into a product state. So these are two simple cases in some sense. But the interesting thing happens when you consider four qubits. And now we want to test a trained agent: I took the agent, I trained it, and I want to see how it performs.
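The permutation equivariance of attention without positional embeddings is easy to verify numerically. Here is a minimal single-head self-attention layer (a toy sketch, not the talk's architecture): permuting the input tokens permutes the outputs in exactly the same way.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """One self-attention layer WITHOUT positional embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # token-token attention
    return A @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
X = rng.normal(size=(5, d))        # 5 tokens (think: per-qubit features)

perm = rng.permutation(5)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# permuting the inputs permutes the outputs the same way: equivariance
print(np.allclose(out_perm, out[perm]))  # True
```

Adding a positional embedding (as language models do) would tag each token with its index and break exactly this symmetry, which is why it is omitted here.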
And to test how it performs, I want to give it states whose entanglement structure I basically know. What you see here on the left is an initial state, which is a product of two Bell pairs: one on qubits one and two, and the second pair on qubits three and four. And what you see here on the right is the policy of the agent, so the probability of picking the different pairs at the given time. So when it looks at the initial state, it correctly recognizes that with 50% probability it should place the unitary on qubits one and two or on qubits three and four. And indeed, it doesn't matter which one of the two it acts on. In this case, by chance, the agent decides to pick one and two; that's the red color, upon which this probability immediately collapses, in such a way that at the second stage, when the agent looks at the observations, at the two-qubit reduced density matrices, it immediately recognizes that it has to act on qubits three and four. And then the state is fully disentangled. So this is simple; let's make it a little bit more interesting. Let's take now a GHZ state on qubits two, three, and four, in a tensor product with a product state on qubit one. The agent looks again at the observations, and in this case it realizes that all the pairs, two-three, two-four, and three-four, show up with the same probability, in this case one third. In other words, it correctly recognizes the structure of entanglement in this GHZ state. And then, like before, it applies the optimal two-qubit strategy, where it takes only two unitaries to disentangle the state. And notice that after it applies the first unitary and looks at the state, it already knows that there's only one possible action left to go. Okay, so far you're saying this is not quite impressive, because we already knew what the entanglement structures of these states are.
So let's now really go to the application where these agents can be useful. Let's take a three-qubit Haar-random state, in a tensor product with a single-qubit Haar-random state. The R here stands for random, and the number of qubits shown in the subindex gives you the support of the state. In this case, you don't really know where the entanglement in the state is, right? I just drew this state from the Haar distribution. The agent looks at the observations, and it figures out that it's advantageous to apply the gate on qubits one and three. It applies that gate, and afterwards it figures out that it has to apply the gate on qubits one and two, and after two steps, in the end, it finds the product state. And again, because this is effectively a three-qubit state, it only requires two unitaries, two gates, to do that. And now you can imagine, here on the left, the most interesting, or the most difficult, situation for four qubits, where there's absolutely no entanglement structure in the state whatsoever. So this is a fully Haar-random state on four qubits, and in this case you don't really know what these actions mean, but what's actually important is that the agent takes exactly five two-qubit unitaries. And you can analyze the sequence: now that we've seen this happening, you can analyze this five-gate sequence and draw a couple of conclusions. And the interesting thing is that it takes exactly five unitaries to disentangle any four-qubit state, right? So you give me your favorite four-qubit state, and my agent is going to use no more than five two-qubit unitaries. And this is already a little bit surprising, but okay, there's essentially no free lunch here. The catch, if you wish, is that these unitaries depend on the state itself: if you change the state, then I also change the unitaries. What's non-trivial is that the circuit topology is universal.
So I don't have to change where to place those gates, right? I can always use essentially the same circuit. And again, there's a proof for this which goes in about four lines, but in the interest of time, I'd rather move forward to show you some bigger states. Just a tiny remark: for any four-qubit state, you can show, by just counting the number of CNOT gates required to implement the circuit, that you actually need no more than 10 CNOT gates. This is a little bit surprising, because you know that an arbitrary two-qubit gate requires three CNOTs in general, so with five gates you might have expected 15 CNOTs; but it's actually 10, and you can do the counting to see how this comes about. Okay, so this was essentially the small system, the four-qubit system, where you can benchmark the agent and see how that goes. So now I want to show you what happens when you go up to five and six qubits. We will stop at six qubits because, as I showed you in the beginning, there's an exponential wall, and this exponential wall comes from the restriction of using only two-qubit gates; but that's physical, so we're going to play the game according to physics. But before that, let's go to the five-qubit state. Okay, so now I want to explain a picture that's going to very easily start getting crowded, so let's do this step by step. First, what we consider is a five-qubit Haar-random state, which I'm going to denote here by these five circles, q1 through q5, and these qubits can be entangled in an arbitrary way. On the x-axis here you see the episode step.
As I said, reinforcement learning works in terms of this feedback loop, so every time there's going to be a unitary to be placed. The episode step is essentially telling you there's one step per unitary, and at every step the agent looks at the reduced density matrices and estimates the probability to place a two-qubit unitary on a specific pair of qubits. In this case, for instance, the largest probability is between qubits three and five, and therefore I'm placing a gate between qubits three and five; this little number here, 57, shows you the probability, in percent, with which this action has been chosen. There's no space in the figure to show all the other probabilities, but the one that was chosen is shown. And then there are those circles as well: the circles show you the entanglement entropy between that qubit and the rest of the system. For the first qubit it's 0.99, so they're all actually fairly entangled with the rest, and this number here is the average, basically this expression here, the quantity that we want to measure. Now, what you'll see in a second is that these circles are going to start changing colors. When that happens, it means that the entanglement entropy has dropped by one order of magnitude, as you go from red to blue to green, and then there's the gray one. The "rest" here is not the verb rest; it means all the other pairs, okay? I don't have enough space on this graph to show L² pairs, so I can show you the most interesting ones, and probabilities that are smaller than 1% I don't care about anyway. But there's no action where the agent does nothing; sorry about that confusion. Okay, good, so let's see now. Let's take a five-qubit Haar-random state and let's see how the circuit that the agent produced looks. If you take a look at it, it looks like a complete mess, right?
And in fact it is more or less a mess, but there is some interesting structure that I want to explain in a second. Let's make a first observation: you can see that even after applying 19 gates, there's still some entanglement entropy left in the state. This entanglement entropy is below 10 to the minus 3, and that's the threshold we set for the agent. We say: if you bring my state into an almost-product state, up to 10 to the minus 3, then I consider the problem solved and we can move forward. So that's the first thing: we have to set a threshold. This wasn't the case when we considered two-, three-, or four-qubit states, but from five qubits on it is actually a feature of the problem. Now, the second thing you see here is that the circuit is highly correlated, both in space and in time. I want to explain this by pointing to the right places in the figure. What I mean by the circuit being correlated in time is that the agent starts applying gates, but initially all the gates involve qubit number three. For some reason qubit three is special, and it continues to be special until the single-qubit entanglement entropy of qubit three is depleted; it goes down to 0.18, and at that stage the agent says, okay, forget qubit three, I have to move over and jump to other qubits where I can draw more entanglement out of the system. Then it jumps again, but you see that the gates that follow once again share support with the previous gate. This is also clear here at the end, where there's a very long sequence with this shared support throughout. So in that sense there are correlations in space and time, but it's actually a little bit more interesting than that.
It turns out that even though this circuit is essentially random, and it is random (you should not look for structure in a circuit that starts from a Haar-random state, because if you give me another Haar-random state the whole circuit changes again), this feature is actually going to be robust. And the second feature is a topological property of the circuit. It turns out you can decompose the circuit into topological segments, which I've highlighted here in colors, and they go like this. Let me take the five qubits and place them on the vertices of a graph. Then I start drawing connections between these vertices, in the order prescribed by the circuit. So let's take this orange, sorry, this purple oval here. I start at qubit three: first I apply a gate between three and five, so I go from three to five and back; then I go from three to four; then from three to one; then from three to two. Then, to get to the blue part of the circuit, I have to lift the pencil and draw another connection between four and five. Then I have to lift the pencil again and go to the green circuit, which connects qubits two and one, then two and five, then five and one. So there's this topological property, something that allows you to classify the mess, essentially. In other words, it's often the case that when all other kinds of correlations fail, you look for the most robust one, and if there is anything left, it's got to be topological. Oh, and there's one more take-home message here which I want to make, which is also interesting. You might have thought: if I had to solve this problem, what I would try to do is factor out one of the five qubits.
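The "lifting the pencil" decomposition can be made concrete. Here is a small sketch (my own illustration, with hypothetical function names) that splits a gate sequence into maximal runs in which consecutive two-qubit gates share a qubit, which is how the colored segments in the figure are described:

```python
def circuit_segments(gates):
    """Split a sequence of two-qubit gates into maximal runs in which
    consecutive gates share a qubit (drawn "without lifting the pencil").
    `gates` is a list of (i, j) qubit pairs in circuit order."""
    if not gates:
        return []
    segments = [[gates[0]]]
    for prev, cur in zip(gates, gates[1:]):
        if set(prev) & set(cur):       # shared support: same segment
            segments[-1].append(cur)
        else:                          # no overlap: lift the pencil
            segments.append([cur])
    return segments

# A hypothetical gate order echoing the five-qubit example in the talk:
gates = [(3, 5), (3, 4), (3, 1), (3, 2),   # purple: all touch qubit 3
         (4, 5),                            # blue: isolated segment
         (2, 1), (2, 5), (5, 1)]            # green: chain sharing support
print([len(s) for s in circuit_segments(gates)])  # → [4, 1, 3]
```

The number of segments (the number of pencil lifts) is then one robust, topological summary of an otherwise random-looking circuit.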
Maybe what I can do is factor out one of the five qubits; then I'm left with a four-qubit state, and I already have my optimal strategy for the four-qubit state that I showed you before, so I can just apply that. But if you actually look at the final stage of the sequence, where one of the qubits gets disentangled from the rest, where you see the gray color for the first time, you see that the sequence looks quite different from the optimal one. This tells you that the agent doesn't necessarily follow this divide-and-conquer strategy. We have seen examples of other states where the agent does use the divide-and-conquer strategy, but it doesn't always have to use it. And whenever it's not using it, like in this case, that's actually interesting, because it tells us that the agent learns how to implement non-local multi-body interactions among all the qubits. It doesn't really focus on a subset; it tries to involve all the qubits in the disentangling process. Okay, good. Let's move forward a bit. These were five qubits. Now you may wonder: was it just luck that the agent was able to disentangle that specific random state? What about statistics? And this is what I'm showing you here. What we're doing now is considering ensembles of different Haar-random states, and these Haar-random states are classified according to their product structure; I think this is more or less self-explanatory. On the y-axis I'm plotting the number of unitaries, that is, the length of the protocol. And there are numbers on this histogram. First of all there are colors, right? Three colors. The black color is the RL agent; that's the most interesting one. But remember the other two agents, the random agent and the greedy agent? They show up again here; these are the two blue colors.
And in all cases you see that the RL agent actually outperforms the other agents. This is what I meant earlier: you can do better than a greedy agent, because reinforcement learning doesn't locally optimize for the given state and the given action; instead it optimizes the long-time return. It optimizes a non-local objective, and this is why it can actually learn to sacrifice certain actions intermittently and then gain something towards the end. All right. These numbers show the mean, the mean number of unitaries required to disentangle these states, and there's also the standard deviation around that, shown as error bars. So this is what the agent does on Haar-random five-qubit states. Now you can wonder about four-qubit or six-qubit states. Here's the data: it behaves in more or less exactly the same way. And, as you'd expect, there's something interesting here. If you look at the five-qubit Haar-random states, states supported on all five qubits, the agent takes about 20 gates to disentangle the state. Now if you look at the six-qubit Haar-random state with a product structure, a product of a five-qubit Haar-random state with a sixth qubit, this also takes about 20 gates. So it's consistent between the two, and that's a way to test the performance of the agent as you scale up, because, again, you don't really know what these states are. All right, good. Another thing that I should also mention, and that you should notice, is that the number M, the number of gates, grows, and it likely grows exponentially as you increase the number of qubits. And that's again a feature imposed by only considering two-qubit unitaries.
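To see why a disentangling threshold is needed at all, note that Haar-random states are very close to maximally entangled, so the agent's objective starts near its maximum. A small sketch (my own, with illustrative function names; the 10⁻³ tolerance is the one quoted in the talk):

```python
import numpy as np

def haar_random_state(n, seed=0):
    """Sample an n-qubit Haar-random pure state: i.i.d. complex
    Gaussian amplitudes, normalized."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    return v / np.linalg.norm(v)

def avg_single_qubit_entropy(psi, n):
    """Average von Neumann entropy of the n single-qubit reduced states,
    i.e. the quantity the agent tries to push below its threshold."""
    s = 0.0
    for q in range(n):
        t = np.moveaxis(psi.reshape([2] * n), q, 0).reshape(2, -1)
        ev = np.linalg.eigvalsh(t @ t.conj().T)
        ev = ev[ev > 1e-12]
        s += -np.sum(ev * np.log(ev))
    return float(s / n)

def is_disentangled(psi, n, tol=1e-3):
    """Termination criterion: almost-product state up to `tol`."""
    return avg_single_qubit_entropy(psi, n) < tol

# A Haar-random 5-qubit state sits close to the maximum ln 2 per qubit
psi = haar_random_state(5)
print(avg_single_qubit_entropy(psi, 5), is_disentangled(psi, 5))
```

Because a generic state starts so close to maximal entropy, reaching exactly zero with a finite two-qubit-gate circuit is not possible, hence the almost-product tolerance.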
If you try to disentangle a Haar-random state with only two-qubit gates, it turns out that you cannot beat the exponential, and that's actually a mathematical fact that you can show. Okay, it looks like I do have a couple of minutes, maybe not, for applications. Okay, good. I do want questions, so let me just show you this very quickly. What you can do now with these protocols is use them to decompose circuits in terms of CNOT gates. And it turns out that the protocols found by the agent are the ones that require the fewest CNOT gates. We compared this to Qiskit, which is the state-of-the-art tool, and our agent is actually able to outperform Qiskit in a number of cases. So you can find compressed circuits by doing this. We also tested the agent in the presence of noise, and there are various kinds of noise you can think of: for instance, noise in estimating the reduced density matrices, or noisy channels, like decoherence channels, depolarizing channels, et cetera, to which the agent is robust to a certain extent. And this is maybe the last thing I want to show. We also tried to apply this agent on actual NISQ devices, where all kinds of noise are present. We did this starting from the Bell state, and in this case the agent correctly identified the right probabilities before applying the first gate; after applying the first gate, the probabilities are still roughly the same. The reason for this is that the state here is very much entangled with the environment, so it's not actually a pure state anymore, as it used to be. You can quantify this by looking at the entanglement of formation, et cetera, and you can try other Bell states or GHZ states.
And I'd be happy to show you these data if you're interested, in a private conversation. Now I think I'd rather leave some time for questions. Just one thing I want to mention: if anyone is interested in trying out our agent in the lab, please let us know; we'd be happy to collaborate. So thank you guys for your attention. Question. Hi, Marin. I missed the initial part of your talk, sorry for that, and I'm sure you probably gave the motivation there. But I was wondering: what exactly can we learn about the disentangling process from these very nice topological structures in time, if I understood correctly? Well, these are structures in the circuit. I wouldn't know immediately what the good use for this is. It just tells us that if you are really looking for an optimal, that is, a compressed circuit, then the compressed circuit is necessarily highly correlated. In other words, you don't have the freedom to place the gates wherever you want. You have to place them in a very specific way, and the way you have to place them depends on the state itself. And this is actually the big advantage of using reinforcement learning: by looking at these local observations of reduced two-qubit density matrices, you can actually say, aha, it is these two qubits that I need to place the gate on, and not some others. Can you put up your first transparency? I think I found something that I didn't understand at the beginning. Yes, yes, yes. When you were saying... I mean, I think what you are saying is that your method is very useful, because these are unitaries, so you can invert them. So if you want to prepare something which is very complex, you have the best circuit that can do it; you just reverse everything in the other direction, because everything is unitary, so it's indeed very useful. But I wanted to go a little bit before here. Here, you have to stop.
Yeah, when you were saying, at the very beginning... here, no, no, here. When you said this, I was a little bit confused, but probably I understood it. So if I have a pure state with multipartite entanglement, and I trace out one of the systems, and I find that it's a product state, that doesn't mean anything, no? No, so what you do is you have an L-qubit state, and then you look at the single-qubit reduced density matrix, and that qubit is potentially entangled with all the other qubits. Now, let's say I find that it's not. Then it basically means that qubit A is in a product state with the rest. Are you sure? I'm sorry, I'm a little bit confused. I have a GHZ state, which has tripartite entanglement. I trace out one of the systems, and it's not entangled. I trace out a second system, it's not entangled. But obviously the GHZ state, the pure state, is entangled. So now I'm a little bit confused. Sorry, I'm probably saying something I don't understand properly, but if I start with a multipartite system and I trace out... There is entanglement in the state, but it's among the remaining two. There is no entanglement whatsoever, OK? I have a GHZ state. I did have the GHZ state as an example. Exactly, but you have 0, 0, 0 plus 1, 1, 1, OK? I compute the reduced density matrix. Here, B and C are disentangled, OK? OK, so let's do that. There's a three-qubit state, right? Then you apply a gate on qubits 2 and 3, and what this does is essentially cut that bond and shift the entanglement onto the remaining two qubits. With a unitary that you apply, yes, but I was just a little bit surprised. I even have a proof of this that I'd be happy to go over. No, no, with a unitary I'm sure. I'm just saying that tracing out... OK. You have a system... Maybe the confusion comes from the fact that I actually consider the state of the system to be pure at all times.
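The point of this exchange can be checked numerically. Below is my own NumPy sketch (the choice of a CNOT as the two-qubit gate is mine, for illustration): for the GHZ state every single qubit is maximally entangled with the rest, and one two-qubit gate on the last two qubits cuts that bond, leaving qubit 2 in a product state and a Bell pair on qubits 0 and 1:

```python
import numpy as np

def single_qubit_entropy(psi, qubit, n):
    """Entanglement entropy between `qubit` and the rest of a pure state."""
    t = np.moveaxis(psi.reshape([2] * n), qubit, 0).reshape(2, -1)
    ev = np.linalg.eigvalsh(t @ t.conj().T)
    ev = ev[ev > 1e-12]
    return max(0.0, float(-np.sum(ev * np.log(ev))))

# GHZ state (|000> + |111>)/sqrt(2)
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)

# Before: every qubit is maximally entangled with the rest (entropy ln 2)
before = [single_qubit_entropy(ghz, q, 3) for q in range(3)]

# CNOT on the last two qubits (control = qubit 1, target = qubit 2):
# |000> + |111>  ->  |000> + |110>  =  (|00> + |11>) ⊗ |0>
cnot = np.eye(4)[[0, 1, 3, 2]]           # permutation matrix for CNOT
psi = np.kron(np.eye(2), cnot) @ ghz

# After: qubit 2 is disentangled; qubits 0 and 1 form a Bell pair
after = [single_qubit_entropy(psi, q, 3) for q in range(3)]
print(np.round(before, 3), np.round(after, 3))
```

So the unitary does not destroy the entanglement; it shifts it onto the remaining two qubits, exactly as stated, and the state stays pure throughout.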
When I apply the gate, I apply it to the pure state, and then I get a new state that is also pure. So at any given point in time, I keep a pure state. That's the statement. If it's non-zero, then you apply a gate. Good discussion. Can we continue over the coffee break, maybe? I'm happy to go over this in detail. I'm pretty sure about this. No, no, that's fine. It's good to go over it. So I think we have to move to the next speaker. Let's thank Marin again for a nice talk.