Today, I'm going to talk about the last missing component. I told you our goal is to build a fault-tolerant quantum memory based on LDPC codes, and there is one missing component: the decoder. I'm going to talk about decoders — many different decoders — for classical LDPC codes and for quantum LDPC codes, and we're going to see how classical decoders adapt, or don't, to the quantum setting. It's going to be mostly drawings and graphs, not too many definitions. My goal is mostly to give you an intuition for how these decoders work, and the reason is that it's the main thing I can do: I cannot prove much, we cannot prove much. Most of these decoders are heuristic. In some cases we have proofs that they work partially, but we still don't understand them very well. So it's still a very active research topic, and it's one of my favorite topics, personally.

So let me review the whole scheme: what we have done until now, and where the missing component sits. We know that we need a stabilizer code to encode our information. When Pauli errors occur, they affect our code state, and we use the Tanner graph to represent the code. In practice, we use a syndrome extraction circuit: we run a quantum circuit to measure the syndrome bits. We run two of them, one to measure the Z part and one to measure the X part of the syndrome, and we can run them simultaneously. And then we need a decoder. The decoder takes the syndrome as input and outputs a correction that we apply back to our qubits. So this is the whole loop. We have the code — we are using hypergraph product codes — we have a syndrome extraction circuit, and we even discussed a proposal for how to lay it out; it doesn't exist yet, but it's a theoretical proposal. What we are going to do today is build this decoder.

OK, so let's look at the decoding problem. I already defined a minimum weight decoder; we can define other classes of decoders. The model I'm going to assume is perfect measurement — I'll get back to why afterwards — because it's the simplest model. You have a Pauli error, you can measure the syndrome, and the circuit measuring the syndrome is perfect. And we have some noise model: a probability distribution over the set of Pauli errors, so I can tell you which Pauli error appears with which probability. A decoder is a map that takes as input the syndrome, this bit string we measure, and spits out a Pauli error, a correction that we apply to our qubits. The different classes of decoders return errors with different properties. A minimum weight decoder gives you an error of minimum weight with the right syndrome — we always want the right syndrome; that's the information we have. The second property we can ask for is that the correction maximizes the probability of this error given the syndrome. It seems like the right thing to do: we have a probability distribution, we know exactly which errors can occur, so given the syndrome we should take the most likely one. This is the most likely error decoder, and it's the best we can do in classical information theory. In quantum information theory, with stabilizer codes, there is something extra. Something weird happens: multiple errors can have the same effect. They are indistinguishable. Those errors are the ones that differ by a stabilizer: if you multiply an error by a stabilizer, it's effectively the same error — they act the same, so they cannot be distinguished. So we should maximize the probability of this error up to a stabilizer: we multiply by all possible stabilizers and we maximize the sum of the probabilities of these errors. This is the most likely coset decoder. So these are the three classes of decoders we may want to build.
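To fix notation (the notation here is mine, not from the slides): writing sigma(E) for the syndrome of a Pauli error E, P for the noise distribution, and S for the stabilizer group, the three classes can be summarized roughly as:

```latex
\mathrm{MW}(s) \in \operatorname*{argmin}_{E\,:\,\sigma(E)=s} |E|,
\qquad
\mathrm{MLE}(s) \in \operatorname*{argmax}_{E\,:\,\sigma(E)=s} \mathbb{P}(E),
\qquad
\mathrm{MLC}(s) \in \operatorname*{argmax}_{E\,:\,\sigma(E)=s} \sum_{S\in\mathcal{S}} \mathbb{P}(ES).
```

The last one maximizes the total probability of the coset of E, and returns any representative of the winning coset.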
So I will build one of them. My goal — let me answer that with the next slide, maybe. How do we compare them? I can choose to build a minimum weight decoder, a most likely error decoder, or a most likely coset decoder. Which one should I pick? The first answer is that you should pick the most likely coset decoder: it's the most accurate, the one that takes into account all the information we have. If we cannot, we should pick a most likely error decoder, and if we cannot do that either, a minimum weight decoder is a good idea. And sometimes we cannot even build that, so we build an approximation of it. When the noise rate is low, they all behave roughly the same, and in practice we hope to have good enough qubits to be in the low noise rate regime, so we don't care that much. But when we have a very high noise rate, when we have bad qubits, the most likely coset decoder is a bit better, so it does buy you something. It's not the regime we want in the long term: for a large-scale machine we will be in the low noise rate regime. For the first generation, though, we should squeeze out as much performance as we can from the machine. And the difficulty is that this decoder is much harder to implement: the complexity of implementing a most likely coset decoder is much higher than the other two — and even those two are NP-hard; the coset one is #P-hard. So we are not even going to try to build a most likely coset decoder. We will be in a regime where they are all roughly the same, and we will choose the simplest one. So, roughly, your intuition should be: I want to find a correction of minimum weight, the simplest explanation for the syndrome.

And we saw that we have three noise models: a noise model with perfect measurement, where only the input qubits are noisy; a noise model with noise on input qubits and measurement outcomes; and a noise model with noise on every gate. I told you we should use the model with noise everywhere, but on the previous slide I just told you we would assume perfect measurement — for my definition of decoders, I used perfect measurement. There is a reason: we can basically reduce the problem of correcting faults to decoding with perfect measurement. The syndrome extraction circuit is noisy, which means we need to correct faults in this circuit, so I should take care of faults. But what happens is that faults typically behave like Pauli errors — they are exactly equivalent to Pauli errors in a larger code, in a different code. Because of that, we are going to focus on Pauli errors. It's something we see with surface codes. I'm sure you have seen this kind of picture, where this is basically the surface code graph here, and it represents measurement outcomes, and you repeat the measurement over multiple rounds: you repeat d rounds of syndrome extraction with the surface code.
So you have a 3D graph that looks like that. In this graph, you have faults that correspond to Pauli errors on qubits, but you also have faults in the time direction, and those correspond to measurement errors: if a measurement outcome flips between two time steps, it's represented as a vertical edge. What that means is that, instead of looking at the 2D surface code, which is one layer here, we can build a 3D version of the surface code, and this measurement error can be interpreted as a Pauli error. Because of that, I will not talk about correcting faults in a circuit; I'm going to assume that I have a bigger code in which those faults are simply Pauli errors. This is what this equivalence means: faults in a 2D surface code are Pauli errors in a 3D surface code, so I just need to understand Pauli errors. And recently we generalized that: faults in a general Clifford circuit correspond to Pauli errors in the space-time code, a code with a time direction, built from multiple copies of the code. So I'm only going to take care of correcting Pauli errors — or bit flips in the classical case.

Yes? Yes, exactly. The stabilizers will be supported on two consecutive layers; they look exactly like the stabilizers of the 3D toric code that we had as an example. Thank you. Yes? It's typically not. What I'm telling you is that circuit faults must correspond to Pauli errors, so you must define the code as a function of the gates in your circuit; it's a code that takes into account the exact structure of the circuit. I will not talk about it right now — I'm happy to discuss it during the week, but not during the lectures.

OK, let's do the simplest thing we can do. Let's build a table, and in this table we store all the possible corrections. That's the first decoder: the lookup table decoder. Let's try it. I'm going to give you an algorithm that builds this lookup table, and we are going to see if we can use it. Our input is a code and some bound m. The goal is to correct all Pauli errors with weight up to m; this is our input parameter, we fix it, and ideally we want it to be large, of the order of d over 2 at least. And we want to output a minimum weight decoder for all those Pauli errors. We do it very naively. We start with our table T and initialize it with the correction 0 — this should really be the identity, since these are Pauli errors. Then we loop over all Pauli errors of weight w, for w from 1 to m: weight 1, weight 2, weight 3, up to m, so up to roughly d over 2. Each time, I compute the syndrome, and if I see a syndrome that is not yet in my table, I add the error to the table: if my table still has the value 0, the identity, for this syndrome, I add E. That means I didn't see any Pauli error with the same syndrome before, so this is the lowest weight error for this syndrome, the first one I see. And at the end, after this loop, I return my table. Does it work? This is what we did with the Hamming code during the first lecture: we built tables where, for each syndrome value, we have a correction to apply, so we just need to store an integer. That's a classical lookup table decoder.
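As a rough illustration, here is a minimal sketch in Python of this construction for a classical binary code given by its parity-check matrix H; the quantum version would loop over Pauli errors instead of bit-flip patterns. The function name and the toy usage are mine, not from the lecture.

```python
from itertools import combinations
import numpy as np

def build_lookup_table(H, m):
    """Map every syndrome reachable by an error of weight <= m to a
    minimum-weight error producing it."""
    n = H.shape[1]
    table = {tuple(np.zeros(H.shape[0], dtype=int)): np.zeros(n, dtype=int)}
    for w in range(1, m + 1):                    # weights 1, 2, ..., m
        for support in combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(support)] = 1
            s = tuple(H @ e % 2)                 # syndrome of this error
            if s not in table:                   # first, hence lowest-weight, error seen
                table[s] = e
    return table

# Toy usage on the 3-bit repetition code with checks x1+x2 and x2+x3.
H = np.array([[1, 1, 0], [0, 1, 1]])
table = build_lookup_table(H, m=1)
print(table[(1, 0)])                             # -> [1 0 0]: flip the first bit
```

The point of the `if s not in table` test is exactly the one made above: because errors are enumerated by increasing weight, the first error that produces a given syndrome is automatically one of minimum weight for that syndrome.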
We can do the same thing in the quantum case, where in this column we store Pauli corrections. So how much does it cost to build this table? What is the memory cost, the size of this table? We can compute it: we sum over all the Pauli errors that we put in the table, and the size can be as large as this number — a sum over the weights w that we consider of (n choose w) times 3 to the w. When there is an exponential and a binomial coefficient, it's a bad sign, right? It's going to be big. But I want to give you an intuition for how big it can be. In this laptop there is a flash memory — the main memory is flash — and there is error correction inside. It's hard to know exactly which code they put in, but to my understanding it should be a code of length about 8,000. These are LDPC codes, and we can assume they have distance about 30. So, do you think that if I try to build a lookup table and put it in the main memory of this laptop, it's going to work? Do you think that's what they use? How big could it be? What's your guess? Petabytes? I cannot hear. Petabytes? Zetta? How much is zetta? 10 to the 12? No, 10 to the 12 is tera. So do you think it's more than 10 to the 12 bits, or less? More? How much more? Any other guess? Let's do the calculation. We need to store all corrections with weight up to d over 2, so 15. I'm not even going to count all of them, only the weight-15 ones, and it's already about 10 to the 46. Do you know what 10 to the 46 is called? I could not find a name for it; it doesn't even exist. The biggest prefix I found was quetta, which is 10 to the 30. So this is more than 10 to the 16 quettabits. Maybe to protect my main memory here there is a memory of 10 to the 16 quettabits storing the lookup table — I haven't opened the laptop, I don't know, but it's possible.
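Just to make the back-of-the-envelope estimate concrete, here is the arithmetic in Python; the length 8,000 and distance 30 are the rough guesses from the lecture, not a datasheet.

```python
# Rough size of the lookup table for the flash-memory example: corrections of
# weight up to d/2 = 15 on n = 8000 bits, counting only the weight-15 ones.
from math import comb

n, half_d = 8000, 15
weight_15_entries = comb(n, half_d)
print(f"{weight_15_entries:.2e}")   # ~2.7e+46 entries; even at one bit per entry,
                                    # that is ~10^16 quettabits (1 quettabit = 10^30 bits)
```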
So let's look for something a bit cheaper in memory. We're going to look at the belief propagation decoder. It's the most popular decoder for LDPC codes, it's very efficient, and it's also used in statistical physics to estimate local properties of statistical physics systems. We are going to start with a simplified version, on the binary erasure channel. With the erasure channel, we have an input bit, 0 or 1. It's transmitted through the channel perfectly with probability 1 minus p, but with probability p it's erased and mapped onto a question mark. This is our noise model, and our goal is to correct these question marks. Suppose we have a (2,3) code — by that I mean the bits have degree 2 and the checks have degree 3 — and I look at one of the bits in my code and try to correct it. I told you at the beginning of this lecture that the reason we care about LDPC codes is that they have a local decoder, so we're going to look at the neighbors, the local Tanner graph around this bit. We have two checks — it's a (2,3) code, so this bit has two checks — and each of these checks gives me a syndrome bit, which is the parity of all the incident bits. So this one is the parity of 3 bits. Now, can I use this check to correct this question mark? Sorry. So let's see what we know: here we have some value, 0 or 1, and we computed this syndrome. But what is the syndrome here? How do you compute it? What you receive is a codeword where you have some bit values, 0 or 1, everywhere, and you know that the sum should be 0. And my 0 here is a question mark, plus some value, plus a question mark: I have two question marks, so I cannot compute the value of this one. But I can look at the other side. There, only one incident bit is a question mark, and the sum should be 0, so if I sum those two bits and add this one, I should get 0. So this value is given by the sum of the two other bits, and now we can locally correct.

What happens if those two bits are erased as well? Can you still recover my top bit? No, you cannot. Do you have any idea what I should do? So, a question mark means that the bit was not received. Here I got some value, 0 or 1; here I got nothing, nothing, nothing, nothing. It's my erasure channel: either I receive a question mark or I receive the correct bit. So here I have four bits erased, and obviously I cannot use that. OK, great, good idea: let's look at the next checks. We can look further into our code. Each bit has degree 2, so there is another layer of checks, and each check has degree 3, so there are two extra bits each. When we do that, we cannot use this pair because there is a question mark, and we cannot use this pair because there is a question mark, but we can use this pair to recover this question mark. We can use this one. Oh, yeah, you're right: I can use those two to recover this one, that's true, thank you. So I could recover this one, but then I would still be stuck with those two — from this side of the tree, I'm stuck. When I look at the other side, I can use those two pairs to recover the two bits, and then this check gives me my bit. So I can locally reconstruct my bit. Does it work in practice? OK. If p is small, I have a small number of question marks, so it should work. But there is another assumption here. Sorry, I cannot hear. They are i.i.d. — it's true that it's not always the case, but I'm still going to assume it. But we have seen some Tanner graphs, right? Do they look like that? They don't: this is a tree, and our graphs do not look like that. So maybe it's going to be hard to use this. But in practice we can make them look like that: we can pick a graph with no short cycles, and then it looks like a tree locally, and we can use exactly this structure.

OK, I'm going to go a bit further, a bit more into the details. Let's apply this to a graph with cycles, a realistic graph: the Tanner graph of the Hamming code. We receive a bit string with some missing bits, some question marks that were erased, and now we want to correct it using our checks. We're going to use the notion of a dangling check: a dangling check is a check that is incident to a single question mark. Do you see one here? Is the first one dangling? Exactly — no, there are two incident question marks. Is the second one dangling? Yes, so let's use it. How do we use it? We look at the value of this check: it's x1 plus x2 plus x5 plus x6, and it's supposed to be 0. We can plug in the values we know; only one is missing, because it's a dangling check, so we can recover the value of x2. OK, we removed one question mark. What do we do next? Do you see it? OK, the third check is now dangling, so we can correct again. This is our algorithm: the peeling decoder. It's very simple, right?
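As an illustration, here is a minimal sketch of this peeling decoder in Python, assuming we know exactly which positions were erased; the function names and the toy example are mine, not from the lecture.

```python
import numpy as np

def peel(H, y, erased):
    """Peeling decoder for erasures: repeatedly use dangling checks (checks
    incident to exactly one erased bit) to recover erased values."""
    y = y.copy()
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        for check in H:                                  # each row of H is a check
            unknown = [i for i in np.flatnonzero(check) if i in erased]
            if len(unknown) == 1:                        # a dangling check
                i = unknown[0]
                known = [j for j in np.flatnonzero(check) if j != i]
                y[i] = sum(y[j] for j in known) % 2      # the parity must be 0
                erased.discard(i)
                progress = True
    return y, erased     # any leftover 'erased' set is a stopping set

# Toy example on the repetition code again: bit 1 erased, recovered from its neighbors.
H = np.array([[1, 1, 0], [0, 1, 1]])
corrected, stuck = peel(H, np.array([0, 0, 0]), erased=[1])
print(corrected, stuck)                                  # -> [0 0 0] set()
```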
It seems too simple — it's not going to work. So we could guess erased values and backtrack. It's a good idea to improve the performance, but it's a bad idea in terms of complexity: it makes the algorithm very expensive, similar to what we do to solve 3-SAT problems, where we end up with exponential-time algorithms. Yeah, what would you like to do? I'm going to do nothing. Yeah, we are going to leave it; we are going to fail. But it's still good enough — that's the thing. So when does this decoder fail? It fails when there is a stopping set: a set of erased bits with no dangling check. If there is no dangling check, we cannot do anything anymore. This is what's happening in the situation you identified, and then we just stop — the decoder stops, because we want to keep a very fast decoder. What we are going to try instead is to design a graph where there are no stopping sets. And it's actually possible. So this is the claim I just made: the decoder fails if and only if the erasure contains a stopping set. So we want to avoid stopping sets, and it's possible. About 20 years ago, Richardson and Urbanke very carefully designed — this is an informal statement — a family of LDPC codes in such a way that the probability of having a stopping set, a failing configuration, vanishes.

Yes. So I define a stopping set as a set of erased bits. I mean, even a single bit can be a stopping set; it is possible. I'm not claiming that this decoder is good for every code. So here we can check. So this one isn't. No, if you have just one question mark. Yeah, so if you have one question mark only, and those two are removed — is that what you mean? You are asking whether this is a stopping set? The configuration we just used was a bit simpler: there was no stopping set. So you're right, this one is a stopping set — that's why it's here. Here, we cannot apply the decoder, and it is a stopping set: if we try, we don't find a dangling check, so we cannot even start. It was just two of those bits. So a stopping set is not one question mark; it's a subset of locations, a subset of bits. The subset of three bits that we have here is a stopping set. If you tell me you take just this one, that's not a stopping set; if you take those two, that's not a stopping set either. I don't take just any subset: I fix my question marks. Here, the subset I look at is the subset {2, 3, 5}, a subset of the question marks, and I want to know: can I correct it? To know that, I ask whether it's a stopping set or not. To check, I look at the three checks: this one is incident to two erased bits, this one to two, this one to two, so there is no dangling check and I cannot correct this set. We can chat more about that if it's not clear.
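A small helper matching that definition may make it concrete — a set of erased bits is a stopping set when no check touching it is dangling. The code and names are mine, for illustration only.

```python
import numpy as np

def is_stopping_set(H, erased):
    """True if no check is incident to exactly one bit of 'erased',
    i.e. the peeling decoder cannot even start on this set."""
    erased = set(erased)
    if not erased:
        return False                    # the empty set is trivially correctable
    for check in H:
        touched = [i for i in np.flatnonzero(check) if i in erased]
        if len(touched) == 1:           # a dangling check: peeling can make progress
            return False
    return True

# On the repetition code H = [[1,1,0],[0,1,1]]: erasing bit 1 alone is fine,
# but erasing all three bits gives a stopping set.
H = np.array([[1, 1, 0], [0, 1, 1]])
print(is_stopping_set(H, [1]), is_stopping_set(H, [0, 1, 2]))   # -> False True
```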
Yes? So my motivation is twofold. First, I want to give you an intuition for local decoding, because the equations for belief propagation are a bit more technical. Second, it's a useful tool in practice: I'm going to discuss the union-find decoder, which is based on reducing the problem of correcting Pauli errors to correcting erasures, and I'm going to discuss two LDPC decoders that can be improved using an erasure decoder. Yeah, afterwards we can discuss which physical models this kind of thing applies to; for now my model is the depolarizing channel. What I'm aiming for is the depolarizing channel, and noise in a circuit that depolarizes the qubits. But I also like the loss of a qubit, the loss of a photon, as an erasure model — that's where an erasure decoder applies directly in practice.

OK, so now the point I want to make is that they managed to design families of codes, by selecting random graphs and removing short cycles, that remove the stopping sets. And those codes are optimal: they achieve capacity, and they come with a linear-time decoder. So even though the peeling decoder is often stuck in general, we can design codes for which it is rarely stuck, and the basic idea is, again, to remove short cycles. If you want more about information theory — I was not very careful with references until now — those three are great books on classical LDPC codes, classical information theory, and the link with statistical physics. I'm going to add more references at the very end of the lecture.

OK, now let's look at a channel closer to what we care about: the binary symmetric channel. Each bit is flipped with probability p. When we apply this channel to our bit string, we receive another bit string where some of the bits are flipped. And what we are going to compute is the probability that a bit of the input message had value 0, given that we received y. I see only y, and I want to know what the input bit was: was it a 0, was it a 1? To know that, I compute those two probabilities, and then I can use this information to correct. How do you think I correct? If I tell you the probability that bit number 1 was 0 or 1 — I give you those two numbers — so, most likely, yes: we pick the one with the largest probability. This is what the decoder does, and this is what we mean by computing marginals: we only compute the marginal bit flip probability for a given bit. It's a local decoder; it's not doing a global correction. Yes? That's true: if we correct the first bit, it may change the probability for the second bit — is that what you mean? Yes, that's true. We are going to correct all of them simultaneously.

OK, now we need to compute this probability, and to compute it we want to evaluate something that looks like this: a sum, over all codewords where the value of the first bit is 0, of the probability of receiving y given x — I want the probability that the first bit is 0, and I sum over all the other variables. This sum gives me the probability that my bit equals 0. We can rewrite it a little. The channel acts independently on each bit, so the probability factorizes as a product over bits of the probability of receiving y_i given x_i: each bit is sent through the channel one at a time. And this indicator, 1 of x in C — what is it for the repetition code? This function is simple: it's 1 when the bit string x is in the code and 0 when it's not. For that, we can use the checks of the code: x is in the code if and only if x1 plus x2 is 0 and x2 plus x3 is 0. We can rewrite that as (1 plus x1 plus x2) times (1 plus x2 plus x3), with sums modulo 2: this product is 1 exactly when the two checks are satisfied. So what I want to say is that this thing we compute is a sum of products, and typically each factor involves a small number of bits, a small number of variables. So it looks like this: a sum of a product of functions. And in addition, because we have an LDPC code, each function f_i comes from a check, so it acts on a small number of variables.
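Written out for the 3-bit repetition code (in my notation; the sums of bits inside the indicator are mod 2), the marginal the decoder computes is roughly:

```latex
\mathbb{P}(x_1 = 0 \mid y) \;\propto\;
\sum_{x_2, x_3 \in \{0,1\}}
\Big( \prod_{i=1}^{3} \mathbb{P}(y_i \mid x_i) \Big)\,
\big(1 \oplus x_1 \oplus x_2\big)\big(1 \oplus x_2 \oplus x_3\big)
\;\Big|_{x_1 = 0},
```

where the last two factors are just the indicator 1 of x in C written as a product of the checks.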
So now we want an algorithm to compute this. How many multiplications do we need naively? The sum runs over all the possible codewords, and there are 2 to the k codewords; for each of them we do on the order of m multiplications. So it's many multiplications — exponential in the number of logical bits of the code. We can do better than that, and the goal of belief propagation is to do this calculation with fewer operations. k is the code dimension, because we sum over all the bit strings inside the code; yeah, it's the number of logical bits, or logical qubits in the quantum setting. So, for instance, if you want to compute a sum like this one, do you have a way to simplify it? We can factor out — OK, yeah, we can factor out the second term, or the first term, and it's faster, right? But it's hard to see in general which transformations we can do. If I give you this one, in which order should we evaluate these functions? Should we multiply them? Should we add them? It's hard to see, and we're going to do it with a graph. This graph is called the factor graph. We have two types of nodes — it's going to look like a familiar graph, and there is a reason for that. There are nodes corresponding to the f_i, the functions, and nodes corresponding to the variables. And we're going to use the topology of this graph to compute this sum.

So let me show you on this graph. What we want is the sum where x1 is missing: I fix the value of x1, and I want to know the value of this sum when x1 is fixed to zero and when x1 is fixed to one. So I want an output here on top, and I'm going to do the calculation by flowing through this graph toward x1, toward the variable I want to evaluate. And to do that — this one's not working. Oh, thank you so much. Can you hear me? OK, great, thank you. So, we have two types of operations in this graph. We go from variables to functions, from bits to checks, and when we go from a bit to a check, we take a product: the bit sends the check a message which is the product of all its incoming messages. And when we go from a check to a bit, we sum over all the inputs. When we do that, we build up our sum of products. Let me show you what it looks like. From a bit to a check, we take the product of all inputs — but these bits have no input, so the empty product is 1, and there is nothing to do for now. Now we go from checks to bits: here I take the sum over all the input values, but there are no input bits, so we don't sum over anything — also a trivial case. Now let's look at a non-trivial case. It's a check, so we take the sum over all input variables: I sum over all values of x5 and I multiply by the previous message. And I do the same thing all along. This is a bit, so I take the product of all inputs: the product of this side and this side gives me this product, evaluated when x4 is zero, and here evaluated when x4 is one — I track both values simultaneously. I keep doing that for all the edges, and it gives me exactly my formula, but factorized much more efficiently than if I had evaluated it naively. And this is what belief propagation does: it computes this marginal error probability for a bit — it tells you whether there is a flip on this bit — by sending messages through the graph. And this factor graph is actually a Tanner graph: we saw that the function we want to evaluate is a product of the checks. So by sending messages through the Tanner graph, we can get our bit value.
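For concreteness, here is a minimal sketch of sum-product BP for the binary symmetric channel, written in the log-likelihood-ratio domain with a simple flooding schedule; the scheduling, damping, and numerical safeguards used in real decoders are omitted, and the code is mine, not from the lecture.

```python
import numpy as np

def bp_decode(H, y, p, iters=20):
    """Sum-product BP on the Tanner graph of H for a BSC with flip probability p.
    Returns a hard decision on the codeword bits from the marginal LLRs."""
    m, n = H.shape
    llr_ch = (1 - 2.0 * y) * np.log((1 - p) / p)     # channel LLR per bit
    msg_vc = np.tile(llr_ch, (m, 1)) * H             # bit -> check messages
    for _ in range(iters):
        # check -> bit: tanh rule, excluding the destination bit
        t = np.tanh(msg_vc / 2.0)
        msg_cv = np.zeros_like(msg_vc)
        for c in range(m):
            idx = np.flatnonzero(H[c])
            for v in idx:
                others = [u for u in idx if u != v]
                msg_cv[c, v] = 2.0 * np.arctanh(np.prod(t[c, others]))
        # bit -> check: channel LLR plus all other incoming check messages
        for v in range(n):
            idx = np.flatnonzero(H[:, v])
            for c in idx:
                others = [d for d in idx if d != c]
                msg_vc[c, v] = llr_ch[v] + sum(msg_cv[d, v] for d in others)
    posterior = llr_ch + (msg_cv * H).sum(axis=0)    # marginal LLR per bit
    return (posterior < 0).astype(int)               # 1 where a flip is more likely

# Repetition code, p = 0.1, received [1, 0, 0]: the marginals vote for codeword 000.
H = np.array([[1, 1, 0], [0, 1, 1]])
print(bp_decode(H, np.array([1, 0, 0]), p=0.1))      # -> [0 0 0]
```

On this toy example the Tanner graph is a tree, so the marginals are exact, which is the point made above: BP is exact on trees and only approximate once there are cycles.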
Now I'm going to ask you the same question as in the erasure case: is this realistic? Can we apply it in practice? How do I build the graph? This graph represents the function we want to evaluate, right? So we build it by looking at that function. In our case it's always a sum of products, and we put one bit node for each variable, x1 to x6, and one function node for each f. So f3 acts only on x4, so f3 is connected to x4; f4 acts on x4 and x5, so f4 is connected to x4 and x5. The relation between the bits and the functions is represented by this graph. OK, so now: is this realistic? Yeah, I heard the answer. Like in the erasure case, it's not a tree in practice — in practice we have cycles. BP is exact on a tree; it's a good approximation on a graph if we have large girth, if there are no short cycles, and we know how to build that. For our LDPC codes, we select random graphs, typically with large girth, and we get a good approximation of those bit flip probabilities. We even have an algorithm to produce those large-girth graphs with no short cycles, which look locally like trees. In practice we run it on the whole graph simultaneously, over all vertices: every vertex constantly sends a message to its neighbors, which is equivalent to looking at a ball around each bit.

OK, now I can talk about the quantum case. In the quantum case we have the depolarizing channel. It's the same idea as our bit flip channel, but it applies the identity, X, Y, or Z to our quantum state. Yes? Yeah, if you know, for instance, that your noise rate is extremely low, then you can build a lookup table and assume there will be only one bit flip out of a thousand. But does that mean you're using the right code, if you know there is one bit flip and you use a very large distance code? Typically we have enough bit flips to make the lookup table impractical. Yes — otherwise it means you are using a code that is too strong for what you need, and you could store more information. But it's a good point: if you have a very low noise rate, you can get away with a very simple lookup table.

OK. So, in the classical case we computed those two bit flip probabilities and selected the value with the largest one: we correct the flip that is most likely. In the quantum case, can we do the same thing? Now we have four things to compute: we have a syndrome, and we want to compute the probability that a given qubit has an error I, X, Y, or Z — four probabilities. We can compute them, but it's not going to work well. The first reason is that, if you look at the Tanner graph, it's full of short cycles — and it has to be, because of the commutation relations: because the stabilizer generators commute, there are 4-cycles. So it doesn't look like a tree, and it cannot look like a tree. The second thing is that even asking the question — what is the probability of an X on qubit i — is not really meaningful, because we consider errors up to a stabilizer. A stabilizer does nothing to our quantum state, so if there is an X on one qubit and I apply an X stabilizer, it doesn't change my state, but it changes the value of the error. So the question is not well posed. And that's what makes BP underperform for stabilizer codes. So we're going to look at two different decoders: the union-find decoder for LDPC codes, and BP-OSD.
In the case of LDPC codes, I think BP-OSD performs better, but it would be good to understand that better and to have a proper comparison. And it's fully heuristic — our goal with the union-find decoder for LDPC codes was to try to prove something. We can prove that we correct a polynomial number of errors, and we are trying to push it further. I don't know; what I'm going to discuss now is still very much a research question, and a lot of improvement is still needed for these two decoders. The first thing is that, as you will see, they are too slow: the complexity is very far from linear. So I'm going to first give you the intuition for the surface code, and then we'll move to LDPC codes.

In the surface code, we have this Tanner graph with our qubits and our two types of checks, X checks and Z checks, and we are going to focus on Z errors to simplify. So we have a new Tanner graph where we look only at the X checks. What happens when there is an error in this graph? If there is a Z error on one of the qubits, the two incident checks light up: they see a non-trivial value, a non-trivial syndrome, so we measure something. And when there is a path of errors, we only measure the endpoints — the syndromes along the path cancel. So basically what we see of the error is the ends of the chains of errors, the endpoints of those chains, and our goal when decoding is to recover the chain given its endpoints. We also have this notion of stabilizers that's going to help us: there are some error configurations, some chains, some loops of errors, that are trivial. Those that look like a loop correspond to stabilizers, and so do those that connect a boundary to the same boundary: if they connect left to left, they have a trivial syndrome, but they are stabilizers, not non-trivial logical errors. The problematic ones are those that connect two opposite sides: they have no syndrome — there is no check at their ends — so they are not detectable, but they apply a logical operation, they change the quantum state. So we want to correct up to a loop, but not up to something that connects the opposite sides.

I'm just going to give you the basic idea here. The basic idea is: we see those red vertices, and we want to find a correction — ideally a minimum weight correction, or at least a low weight correction — that matches those vertices in pairs. What information do we have when we see that one vertex is lit up? If there is one check that is not satisfied, what do we know? Exactly: if a check has a non-trivial value, there must be an error on one of the four incident qubits. Let's use that. What we're going to do is erase them. I picked the wrong one. If this check is on, one of the four incident qubits must carry an error, so I start growing clusters around the information I have, around the syndrome bits. I do the same thing elsewhere, starting with a small cluster. Those two have grown already; I grow the others, and sometimes two clusters meet. When two clusters meet, I look at the whole thing and I ask: can I correct this erasure? Can I find a correction inside this cluster, restricted to the cluster? Can I, in this case? Yes? OK. I can find a correction: I correct this qubit here, and it turns off those two checks. I can find a correction locally.
I'm going to keep this cluster as it is, and I'm going to keep growing the other clusters, those that have not been corrected yet. I grow those two clusters, starting from the smallest one, and at some point they meet again. When they meet, I ask: can I find a correction inside this one? How do you know? Sorry? There is a connection between them. So, is there some information I can track to know immediately whether there is a correction inside this cluster or not? Yes, exactly: if there is an even number of them, I will be able to find paths connecting them in pairs — they are in the same cluster. So I just need to store one bit to know whether I can correct it. We store one bit, we update it when we grow the clusters, and when we see that this bit is zero, we stop growing that cluster and keep growing the others. And at the end, we only have to correct an erasure — and correcting an erasure is easier. We previously worked out a linear-time erasure decoder; we just plug it in, and we get an algorithm whose complexity is dominated by growing the clusters. You can think of it as erasing all the qubits in the cluster: I know that there is a correction inside the cluster, so I just have to find it inside this subset of qubits. So you can think of it as: I give you a set of question marks, and I promise you that there is a correction inside.

Yes? I'm saying that if you want to know whether this cluster is correctable, you just need to count the number of syndrome bits, of red vertices, inside. If it's even, you can find a correction; if it's odd, you cannot. Because a correction is a path, a chain of errors, that connects two vertices — it connects pairs of vertices — and because each chain connects a pair, you must have an even number to pair them all up. Exactly: if it's four, I'm going to pick whichever two pairs. And because I do it locally, I know that all pairings are equivalent up to a stabilizer — those local loops have no effect — so I know it's going to work. OK. So now — oh, sorry. I'm not assuming anything: it works for any fault configuration. Here there is one error here, and one, two, three, maybe. In this case there is one error, but I don't assume it: if there were more errors, the clusters would have grown further and we would have found a different configuration. It's hard for me to visualize that; maybe we can talk about this technical case afterwards. What's the other question? So yes, that does happen — say three clusters meet. You just compute the parity of the three: if the total parity is even, we can find a correction in the union of those three clusters. OK.
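As a very rough illustration, here is a sketch in Python of the cluster-growth phase only, on a decoding graph where vertices are checks and edges are qubits (as for Z errors on the surface code). Boundaries, growth by half-edges, smallest-cluster-first ordering, and the real union-find data structure with path compression are all omitted; the correction inside each even cluster would then be found with an erasure (peeling) decoder like the one sketched earlier. The code and names are mine, not the actual algorithm from the papers.

```python
def grow_clusters(edges, syndrome):
    """edges: list of (u, v) check pairs; syndrome: set of lit-up checks.
    Grows clusters around odd-parity syndrome sets until every cluster has even
    parity, and returns the set of 'erased' edges. Assumes the total syndrome
    parity is even (no boundaries modeled)."""
    parent = {}                                    # naive union-find forest

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    erased = set()
    while True:
        # parity of syndrome vertices inside each current cluster
        parity = {}
        for s in syndrome:
            r = find(s)
            parity[r] = parity.get(r, 0) ^ 1
        odd_roots = {r for r, p in parity.items() if p == 1}
        if not odd_roots:
            return erased                          # every cluster is now correctable
        # grow every odd cluster by one step: absorb all incident edges
        for (u, v) in edges:
            if find(u) in odd_roots or find(v) in odd_roots:
                erased.add((u, v))
                union(u, v)

# Toy: a path of 4 checks; an error on the middle edge lights up checks 1 and 2.
edges = [(0, 1), (1, 2), (2, 3)]
print(grow_clusters(edges, syndrome={1, 2}))
# -> all three edges end up erased; peeling inside then finds the middle edge (1, 2)
```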
So now I want to tell you why, for a long time, we thought this was not going to work with LDPC codes. I told you what their Tanner graphs look like, right? They expand very fast: when we start with a cluster containing one vertex and grow it by adding the neighbors, we get this red part; if I grow it a second time, I get that; and if I grow it a third time, I get almost the entire graph. If three growth steps cover the entire graph, this is not going to work, right? That's what we were thinking, but actually we managed to prove that for some LDPC codes with enough expansion, we can correct a polynomial number of errors with this strategy, the same strategy, on the LDPC code Tanner graph. The key ingredient is the notion of covering radius. I'm just going to give you an intuition and then state the result. The idea is that you start with some syndrome — some red checks in the Tanner graph — and you have an error, like this blob. And you look at the size of the balls you need to grow to cover the entire error: you keep growing your balls until they cover everything, you stop here, and this is the covering radius, the minimum radius needed to cover the error. What we can prove is that if this radius is small, then we will be able to correct with the union-find decoder, because we don't need to grow too many times. Even though the clusters grow fast, if we can prove that we only need a small number of growth steps, that's enough. With that, we can apply the union-find decoder to correct a polynomial number of errors for quantum expander codes, for hyperbolic codes, or for toric codes in higher dimensions. So we can extend the range of the union-find decoder. But still, it's not fully satisfying: it's an open question to push it further and reach the full distance, d over 2, instead of this polynomial. And we also lose the fast complexity. In the case of surface codes, the complexity is n times alpha of n, where alpha is the inverse Ackermann function: if the number of qubits is smaller than the number of atoms in the universe, alpha of n is at most 5 — it grows so slowly that it's essentially a constant. That's what we call almost linear. For LDPC codes, the complexity jumps to n to the 4, or n cubed log n for expander LDPC codes. So there is a big jump. And the issue is that the clusters can contain many qubits, and we cannot correct these erasure clusters efficiently. So the main issue, again, is erasure decoding — that's one of the reasons I like the erasure decoding problem.

OK, I'm going to stop here — I think I'm running out of time. Or maybe I'll give you the intuition for BP-OSD in one minute. The BP-OSD decoder is by Panteleev and Kalachev, and the idea is more about solving a linear system. We have an error, we have a parity check matrix — it's basically our Tanner graph, right? — and we have a syndrome, the syndrome of this error. Our goal is to find a likely error whose syndrome is s. The way they do that is by first estimating all the bit flip probabilities using BP — so you need BP, the classical BP, or you apply it to X, Y, and Z, but it's the same algorithm — to compute those probabilities. Here, for example, there is a probability 0.12 of a flip on the first bit. Then we select a basis of the columns of H, of the column space. We need three vectors here, and we select a basis using the columns with the highest probabilities: the first one will be this one, the second one is 0.31, and the third one is 0.17. Once we have this matrix, we solve a simplified linear system based on it: which error supported on these columns gives me the right syndrome? The advantage of this matrix is that it's smaller, and typically we can invert it. Because of that, we can find x prime, and we can reconstruct x from x prime. So doing that is great — but what is the cost, the complexity, of doing that? N? N squared? The most expensive part is solving the linear system, and solving a linear system by Gaussian elimination is N cubed. So it's expensive; we would like to get it down to linear. That's why it's still a research question — there is still a lot of work to do there. But it performs very well, so it's already a good first step.
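As a rough illustration, here is a minimal sketch in Python of the final (order-0) OSD step only, assuming BP has already produced a flip probability for every bit. The function names and toy example are mine; real implementations reprocess higher orders and are far more careful about performance.

```python
import numpy as np

def osd0(H, s, probs):
    """Order-0 OSD: restrict H to the most suspicious linearly independent
    columns, solve the restricted system over GF(2) for syndrome s, and set
    every other bit to 0."""
    m, n = H.shape
    order = list(np.argsort(-probs))           # most likely flipped first
    A = (H[:, order] % 2).astype(int)          # columns permuted by reliability
    b = (s % 2).astype(int)
    pivot_cols, row = [], 0
    for col in range(n):                       # GF(2) Gaussian elimination
        rows = np.flatnonzero(A[row:, col]) + row
        if rows.size == 0:
            continue
        p = rows[0]
        A[[row, p]] = A[[p, row]]; b[[row, p]] = b[[p, row]]
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]; b[r] ^= b[row]
        pivot_cols.append(col)
        row += 1
        if row == m:
            break
    x_perm = np.zeros(n, dtype=int)            # free (non-pivot) bits stay 0
    for r, col in enumerate(pivot_cols):
        x_perm[col] = b[r]
    x = np.zeros(n, dtype=int)
    x[np.array(order)] = x_perm                # undo the reliability permutation
    return x

# Repetition code, error on bit 1 (syndrome [1,1]), BP says bit 1 is most suspicious.
H = np.array([[1, 1, 0], [0, 1, 1]])
print(osd0(H, s=np.array([1, 1]), probs=np.array([0.1, 0.6, 0.1])))   # -> [0 1 0]
```

The Gaussian elimination is exactly the N-cubed step mentioned above, which is why the complexity of BP-OSD is dominated by solving this linear system.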
And now I'm going to leave you with this question. I gave you a few decoders: BP, union-find, and BP-OSD. All of them have issues. BP doesn't work as is, because of the short cycles. Union-find doesn't achieve the distance, and it doesn't work that well numerically for some codes. BP-OSD works well in practice, but it's heuristic and it has N cubed complexity. Which one do you pick? How did we generate those plots? We used the first one, the most terrible one, because it's fast and because it has some fault-tolerant structure by nature. I'm going to summarize how we put everything together in the next lecture. Thank you.