We're very pleased that so many of us joined us today for the first TCS+ of the fall. I really apologize for the time delay; we keep trying to figure out what the next big thing with Google Hangout will be, but it's always a bit of a challenge. But anyways, we have a speaker today, so that's excellent. Before moving on to the talk: you're all muted, but remember, please ask questions, just unmute yourselves so that you can ask. And maybe, in the interest of time, Oded, did you want to go around the table, or should we? Otherwise, I'll feel useless. Seriously, I can just quickly mention the groups from UCSD, University of Michigan, UCI, University of Southern California, Caltech, Columbia, NYU, and many others. But I think in the interest of time, as you say, we should probably continue. So hi, everyone. Thanks, Oded. And yeah, thanks, everyone, for joining us. We're really, really happy to have so many people coming for the talk today. Let's not forget the people who are helping out but could not be here today: Anindya De, Ilya Razenshteyn, Clément Canonne, and Gautam Kamath. So thanks, guys, for organizing TCS+ for us. And for this first talk of the fall, we're very, very pleased to have Yuval Peres give the talk. Yuval got his PhD in 1990 from the Hebrew University. He's now a researcher at Microsoft Research in Redmond. Yuval is very well known for his work on probability theory and the interplay between probability and geometry, in particular for the analysis of percolation on infinite graphs and, more generally, the study of combinatorial structures on infinite graphs. But today he's going to tell us about the trace reconstruction problem. So thanks a lot, Yuval, for joining us. I hope your sound is still working after all this time. Okay, so thanks a lot, and again, apologies for the delay. So I'll discuss this joint work with Fedya Nazarov and Alex Zhai, and I'll mention other people's work as we go along. The deletion channel problem can be described as follows. Alice wants to send Bob a message. Alice transmits the bits one by one, but each bit can be deleted with a fixed probability q. Bob, who receives the message, doesn't know which positions were deleted, so all he sees is a shortened, compressed string Y. This schematically depicts the situation. Bob's goal is to reconstruct X. Of course he can't do that from a single output Y; by the way, this output is called a trace. So we have one trace, and the question is, how many independent outputs will he need to reconstruct X? So D_q(X) is just the distribution of strings that Bob receives out of the deletion channel, and he has T independent samples from this distribution. The question is, for which T can Bob reconstruct X with probability delta? There's a closely related problem which appears much easier, but is not that much easier: the input to the channel was one of two vectors X or X prime that are given to you, and you just have to decide between these two. Certainly it's easier, but it turns out that if you can do this hypothesis testing with T traces, then just n times T traces will suffice for the reconstruction problem. The reason is that if you can distinguish X and X prime with probability three quarters using T traces, then by multiplying the number of traces by n and using the majority decision, you can make the right decision with probability exponentially close to one.
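(To make the channel concrete, here is a minimal sketch of the deletion channel and of drawing traces, in Python; the function names and the 20-bit example are illustrative choices of the editor, not anything from the talk.)

```python
import random

def deletion_channel(x, q):
    """Pass the bit string x through the deletion channel: each bit is
    deleted independently with probability q, and the surviving bits are
    concatenated (Bob does not see which positions were deleted)."""
    return [bit for bit in x if random.random() > q]

def traces(x, q, t):
    """Draw t independent traces of x from the deletion channel."""
    return [deletion_channel(x, q) for _ in range(t)]

# Example: a random 20-bit input and three traces with deletion probability 1/2.
x = [random.randint(0, 1) for _ in range(20)]
for y in traces(x, 0.5, 3):
    print(y)
```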
So by using a large constant times n times T samples, you can ensure that the probability of deciding wrongly between X and X prime is much smaller than two to the minus n, or four to the minus n. Then, with high probability, all the decisions between the different pairs of vectors are made correctly; that is, you can handle the union bound, and so you can reconstruct the original string X with high probability just by multiplying the number of samples by n. Just to preview where we're going, the big open problem in this subject, which is still open despite all the work on it, is: does a polynomial number of traces suffice to reconstruct? This we absolutely don't know, and what I'll tell you is what we do know. When I say polynomial is not known, that's for the worst-case problem. So the problem comes in several flavors. One is where X can be an arbitrary string; maybe that's the most important variant, and when we say probability three quarters here, we want that to hold uniformly, no matter what the string is. Then there's the average-case variant, where X is chosen uniformly at random. In other words, we don't have to succeed in reconstructing X for all strings; it's enough to succeed for a 1 minus o(1) fraction of the strings. This problem arises naturally in a variety of applications, for example in sequencing; there's a nice survey on the deletion channel by Michael Mitzenmacher that you can read, which mainly concerns the capacity of the channel but also talks about the trace reconstruction problem. The trace reconstruction problem was first raised in this form by Batu, Kannan, Khanna, and McGregor in 2004. They also noted a lower bound: the number of traces needed to distinguish is at least order n, even for two specific sequences. That is, if X and X prime both consist of a block of zeros followed by a block of ones, with the boundary shifted by one, then distinguishing them takes order n traces; we'll come back to that. So that's a lower bound of order n. Yes? So just a quick question: are you going to say anything about the computational aspect of this? So far it's been only sample complexity. Yes, I'll say something. It turns out the main thing is really the sample complexity, and you can, using a linear programming trick, handle the computational complexity; I'll say something about that. Just to make sure, because there's some audio choppiness: if you can close your VLC window it might help; I have a feeling there's some overload on the computer there. Did you close it? No, I just minimized it. Just close it, yeah, even though we won't see you. You can completely close the window; it's overloading the computer. Okay, thanks. I didn't close it, I just reduced it, but is that okay now? I think you should try to close it, because the computer is slowly collapsing under the load of the hangout. Yeah, we won't see you, but that's fine. Okay, so how is it now? Yeah, so far so good, thanks. Okay. So one observation is that if X and X prime are two n-bit strings with different Hamming weights, then you can actually distinguish them with order n traces, just using the Hamming weight of the output as a test statistic.
If the inputs have different Hamming weights, then the outputs have different Hamming weights in expectation, and the difference in expectation is large enough that you can distinguish them; the gap is going to be one half if we look at deletion with probability one half. Then you can easily compute, just using this test statistic with its mean and variance and Chebyshev's inequality, that order n traces suffice to distinguish. So really the difficulty in the problem is distinguishing between strings that have the same Hamming weight; in particular, such strings can't differ in just a single position. The previous best upper bound was e to the order root n; they actually had an extra log there, so e to the O-tilde of root n, in the worst case, and a polynomial bound in the random case, but the latter bound was only for q less than 1%, only for deletion probabilities less than one percent. This was in work of Holenstein, Mitzenmacher, Panigrahy, and Wieder in 2008, and there was no progress until last year. Last year there were two papers, one by Nazarov and myself, the other by Anindya De, Ryan O'Donnell, and Rocco Servedio, both in STOC this year, which reduce the e to the root n to e to the cube root of n. Although we have no idea if this is optimal, we do know it's optimal for a certain class of tests, namely the linear or mean-based tests, and you'll see more precisely what these tests are later in the talk. I'll tell you the proof of this and also the new result, which is appearing in this year's FOCS: for q less than one half, a random X can be reconstructed from a sub-polynomial number of traces, in fact e to the constant root log n. At the very end I'll mention more recent work, which is unpublished, where the restriction q less than one half is removed; but for most of the talk I will make this assumption when dealing with the random case. Any questions so far? Yeah, just to make sure: when you presented the last result you said sub-polynomial time, but it's really the sample complexity, right? Yes, it's really the sample complexity we're focusing on, thank you; in fact, for the time you of course need at least n just to scan the whole sequence. So I wrote T, but T represents the number of samples, e to the constant root log n. The time in this case will actually be close to linear, n multiplied by a sub-polynomial factor. So almost linear for this result in terms of complexity, but we'll focus on the number of traces. Okay, so let me first recap the basic lower bound. The lower bound comes from these two strings: X, which is half zeros and half ones, and X prime, where the boundary between zeros and ones is moved by one. If you think about distinguishing between the outputs of X and X prime, each output is also going to be a bunch of zeros followed by a bunch of ones, and the information you get is just: how many zeros do you see? So the problem basically reduces to distinguishing two binomials, one with parameters n over two and p and one with parameters n over two plus one and p, and it's a classical fact from statistics that the best you can do in this kind of distinguishing is to take a lot of samples and compare the sums from this distribution and that distribution; then it's very easy to see that you need order n traces to distinguish these two cases.
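(As a worked version of this mean/variance calculation, here is one way to write it out, assuming retention probability p; this is a reconstruction for the reader, not a formula from the slides.)

```latex
% Number of zeros seen in a trace of X versus X':
N \sim \mathrm{Bin}\!\left(\tfrac{n}{2},\, p\right),
\qquad
N' \sim \mathrm{Bin}\!\left(\tfrac{n}{2}+1,\, p\right).
% The gap in means is p, while the fluctuations are of order \sqrt{n}:
\mathbb{E}[N'] - \mathbb{E}[N] = p,
\qquad
\operatorname{sd}(N) = \sqrt{\tfrac{n}{2}\,p(1-p)} = \Theta(\sqrt{n}).
% Averaging T traces reduces the noise to \Theta(\sqrt{n/T}), so detecting
% a mean gap of p requires \sqrt{n/T} \lesssim p, i.e. T = \Omega(n).
```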
That's just by looking at mean and variance and the central limit theorem. So that's the worst case; for the random case there is a nice argument by McGregor, Price, and Vorotnikova that gives a log squared n lower bound. The trivial lower bound is log n again, but one can get an extra log factor in there. Okay, so now I want to talk about the upper bound, first for the worst case. Here, for notational reasons, I want to focus on deletion probability one half; the proof in general, which you can find on the arXiv and in STOC, is very similar. So I'll write q here, but then we're going to set q equal to one half. Maybe the easiest first thing is: what if I just look at the first bit of the output? The expectation of the first bit is just a combination of the bits of the input: with probability one half I get x_0, with probability one quarter I get x_1, and so on. So the expectation of the first bit of the output is just the input read as a binary fraction, right, point x_0 x_1 and so on. In particular, if X and X prime agree in the first k digits, then the difference of these expectations will be at most two to the minus k. Think of two sequences that agree, say, in the first n over two digits; then the difference in expectations will be very small. So if you just want to distinguish using the first bit of the output, you will need exponentially many samples, and exponentially many samples will be enough. But we want to go lower than exponential, so we have to use more than the first bit. You can get an improvement if you try other output bits besides Y_0. So let's first calculate the expectation of Y_j, the j-th bit in the output. For Y_j to come from x_k, the bit x_k has to be retained and exactly j bits among the k earlier bits should be retained. The probability of this is one half for this bit, times k choose j over two to the k, just the binomial distribution from the earlier bits, because among those k bits we want exactly j to be retained. So we get this formula for the expectation of Y_j, and this formula really suggests you should look at the generating function, which is what we're doing here. If you multiply by w to the j, sum over j, and change the order of summation, you just get a binomial expansion: the factor in front of x_k will just be (w plus one over two) to the power k. Is everyone with me? This is kind of a key step, because this formula will be the key to analyzing the worst case, to the best of our current knowledge. Any questions on this? Okay, I'll move on. We're going to call this generating function psi_Y of w; notice that since it is an expectation, it doesn't depend on a particular output Y but just on the distribution of Y. So this is the expectation we saw before, which satisfies this formula. Looking ahead, our goal in the end will be to find some bit in the output where the expectation of Y_j and the expectation of Y'_j differ significantly. Towards that end, we will want to find a w which is not too big where psi_Y(w) and psi_{Y'}(w) differ substantially. Writing z for (w plus one) over two, we have the difference between this expectation for Y and for Y prime, the random outputs coming from X and X prime.
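(For reference, here is the identity just described, written out for deletion probability one half; this is a reconstruction of the slide's formulas from the verbal description, so take the exact normalization with a grain of salt.)

```latex
% Bit x_k lands in position j of the trace iff x_k is retained and
% exactly j of the k earlier bits are retained (deletion probability 1/2):
\mathbb{E}[Y_j] \;=\; \sum_{k \ge j} x_k \cdot \frac{1}{2}\binom{k}{j} 2^{-k}.
% Multiplying by w^j and summing over j gives the generating function
\psi_Y(w) \;:=\; \sum_{j \ge 0} \mathbb{E}[Y_j]\, w^{j}
\;=\; \frac{1}{2}\sum_{k \ge 0} x_k \left(\frac{w+1}{2}\right)^{\!k},
% so, writing z = (w+1)/2, for two inputs x and x' we get
\psi_Y(w) - \psi_{Y'}(w) \;=\; \frac{1}{2}\sum_{k \ge 0} \bigl(x_k - x'_k\bigr)\, z^{k}.
```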
This difference just has a simple expression in terms of z to the k; I omitted a factor of one half here, which won't matter to us. So in order to use this formula, our goal is to find a z which is not too large where the right-hand side of this expression is large, or better to say, not too small. The key to doing that is the following fact from complex analysis, whose proof I'll briefly recall later. It's a theorem from 1997, but it could well have been proved a hundred years ago or more. It says: suppose you have a polynomial with the first coefficient equal to one and all the others at most one in absolute value. Such a polynomial can't, of course, vanish on the whole unit circle, because it's not the zero polynomial, and it cannot even vanish on any arc. We want to quantify the latter statement, that it can't vanish on any arc. So suppose you have a small arc of length one over L on the unit circle; then, by the theorem of Borwein and Erdélyi, there's a point z on the arc where |F(z)| is at least exponentially small in L, at least e to the minus cL, where c is some universal constant. Okay, let's accept this for now and see what it means for us. This is from '97. We're going to apply it to the series we were just looking at, whose coefficients are x_j minus x'_j. Since X and X prime are different, these coefficients are not all zero. It's not true that the zeroth coefficient is one, but if there are a lot of zero coefficients at the beginning we can just factor out some power of z to bring us to a situation where the first nonzero coefficient is either one or minus one; if it's minus one, just flip the sign. All these operations don't change the absolute value on the unit circle, right? Removing a power of z doesn't change the values of |F| on the unit circle. So from this theorem, after factoring out a power of z and maybe a sign, we can conclude that there is a z, remember z was (w plus one) over two, where the difference between psi_Y(w) and psi_{Y'}(w) is at least e to the minus cL. Remember the connection between z and w, which is depicted in this picture. z is running on the unit circle, and z is (w plus one) over two; in other words, w is two z minus one. So as z goes around the unit circle, w goes around a circle of twice the radius, which is tangent to the unit circle at one. This is a circle of radius two centered at minus one, but the important thing is that these two circles are tangent at one. We're going to take a small arc of length of order one over L near the point one on the unit circle and restrict z to be in that arc; that's where we're going to apply the Borwein-Erdélyi theorem. If z is in that arc, z has absolute value one. Now w is on the outer circle, so its absolute value is bigger than one, but if we're on this arc close to one, then w is not much bigger than one in absolute value: because of the tangency, the absolute value of w is one plus order theta squared. This is easy to see from the tangency, or just by simple trigonometry. Just to know, Oded, how long do I have? I think we should go slightly over time, definitely 20 more minutes, but you can go 25 or 30 more minutes. Okay, thanks. All right.
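(The tangency estimate used in a moment is just the following one-line computation, reconstructed here for completeness.)

```latex
% For z = e^{i\theta} on the unit circle, w = 2z - 1 satisfies
|w|^{2} = |2e^{i\theta} - 1|^{2} = 5 - 4\cos\theta
        = 1 + 4(1 - \cos\theta) \le 1 + 2\theta^{2},
% so for |\theta| = O(1/L) we get |w| = 1 + O(1/L^{2}).
```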
So with theta in this arc of order one over L, we get that |w| is one plus order one over L squared. The conclusion is that for such a w, which as we said is one plus order one over L squared, and it's useful to think of this as e to the order one over L squared, we can find a w which satisfies this inequality. So let me recap. We know from Borwein-Erdélyi that there is a z on this arc of the unit circle where |F(z)| is bigger than e to the minus cL. We use the basic identity we verified before to deduce that we can find a w where this series, the sum over j of (E[Y_j] minus E[Y'_j]) times w to the j, is bigger than e to the minus cL in absolute value. This is just rewriting what we had before in terms of the variable w. Now, remembering that |w| is e to the order one over L squared, any power of w less than n is bounded by e to the cn over L squared. So if you use this inequality and move the w to the j to the other side, we deduce that for some j, the expectation of Y_j minus the expectation of Y'_j is at least one over n times the exponential of minus cL minus cn over L squared. Again, the e to the minus cL comes from Borwein-Erdélyi, and the e to the minus cn over L squared comes from the bound we have on |w|. All together we get this expression, and when you look at it, it's clear what choice of L you want: that's where the cube root of n comes in, to balance these two factors in the exponent. You take L to be the cube root of n, and then you get e to the minus a constant times the cube root of n; the one over n you can just absorb by changing the constant in the exponent. Okay, so now we're almost done, at least with distinguishing between two specific strings X and X prime, because what we found is that for any two strings X and X prime there is some location j in the output where the difference in expectations is at least this value epsilon, which is e to the minus a constant times n to the one third. Now you have a lot of independent outputs, each of which comes from one of two distributions, and we're just looking at bit j of each. If I have a bit and I want to distinguish which distribution it comes from, and the two possibilities have expectations differing by epsilon, you know that one over epsilon squared observations suffice to distinguish. One over epsilon squared is still e to some other constant times the cube root of n. So that number of samples suffices to detect the difference in means, and for the probability of choosing wrongly between X and X prime you can write the Chernoff bound: it's e to the minus order T epsilon squared. The bottom line is, if you choose T to be a bit larger than one over epsilon squared, again just by adjusting the constant in front of n to the one third, then the probability of choosing wrongly becomes much smaller than two to the minus n. This is just repeating what I said at the beginning: if you can distinguish any two specific strings with some number of observations, then by multiplying that by n you can actually do the reconstruction problem and tell what the right string is, because you can handle the union bound. So this proves the upper bound e to the cube root of n on the sample complexity. Any questions? Yeah, I've got a question. The complex analysis theorem you were using, is it tight?
Is it known to be tight for some particular function? Yes, that is coming in the next slide, so I'll say something about that. That's a very good question, but I have a slide about it. Anything else? Okay, we'll move on. There are several issues that were raised here. One was the complexity, and there's a trick, already in Holenstein et al., for reducing the running time to basically the sample complexity, using linear programming. The way I described it so far, you would need to compare all possible strings of length n, so you'd get complexity two to the n even though the number of samples is smaller. However, suppose you instead reconstruct bit by bit. The first bits are very easy to reconstruct, but let's just do the general inductive step. Suppose you've reconstructed the first m bits and now we want to determine the next one, x_m. We have just two possibilities, either x_m is zero or x_m is one, and we'll write two linear programs corresponding to these two cases. We have a bunch of output bits; let's call Y_j-bar the average of the output bits we've seen in location j over the T samples. As before, we'll take L to be the cube root of n. Now consider these two linear programs, one where x_m is fixed to zero, one where x_m is fixed to one. The remaining variables we haven't determined, x_{m+1} to x_n, we relax: instead of being bits, they will be real variables in the interval [0, 1]. For each such choice of x's, I can formally compute what the corresponding expectation of the output would be. So I have this formula for the expectation of the output, and the restriction I impose is that the expectation of the output minus the empirical average has to be smaller than e to the minus a constant times the cube root of n. This restriction is a linear program in the relaxed, undetermined variables. In the complex analysis lemma, the first nonzero coefficient should be one and the rest bounded by one in absolute value, and that applies here, so the lemma tells us that of these two linear programs, only one, namely the one corresponding to the correct choice of x_m, will be feasible. We just determine in polynomial time which of them is feasible and go with that choice. So the running time is dominated by the sample complexity; it is consumed by just computing these empirical averages. Once you have these averages for the output, the rest of the running time is polynomial. Okay, any questions? All right, so let me quickly sketch the reason for the Borwein-Erdélyi theorem. This requires a bit of complex analysis. Remember we have an arc of the unit circle of length one over L, and we have this function f, which was a polynomial with a_0 equal to one. We use the basic fact that for any analytic function, log of the absolute value of f is a subharmonic function, which means the value at a point is at most the average over the boundary according to harmonic measure. Namely, if I take log |f| at the origin, which under my assumption is zero, it is at most the average of log |f| along the boundary of the domain; but the average is taken not with respect to arc-length measure but with respect to harmonic measure, which is the hitting measure of Brownian motion.
Luckily, here our domain is so nice that the hitting measure and the arc-length measure only differ by constant factors, so you can ignore this difference. So you get that zero, which is log of the absolute value of f at zero, is bounded by this integral of log |f| over the boundary of the domain. So we have this arc of the unit circle; the unit circle here is on the outside, and we take this arc and complete it with a circle which is inside the unit circle, the green one. So the domain we're going to use contains this arc of the unit circle, and the rest of its boundary is the green curve, which comes from a smaller circle inside. Rearranging this inequality gives us a comparison of the integral of log |f| over the blue curve, the arc, to the rest of the integral. I don't know which colors you can see, but the point is that what you get is a comparison of the integral over this arc to the rest of the integral, and by plugging that in, you get the bound. In the interest of time I'm going to skip the rest of the Borwein-Erdélyi proof, but let me come back to the question that was raised. So the Borwein-Erdélyi theorem is sharp: there exist polynomials that saturate it, and this actually implies, it takes a little bit of work, this implication, but it is in both of the papers, that there exist input strings of length n such that in their outputs the expectations differ by, okay, this exponent should be negative, by less than e to the minus a constant times the cube root of n; the minus sign is omitted on the slide, but it should be there. So the expectations are very close in all the digits. In other words, if you're going to use any linear test of the type I described, then you need e to the order cube root of n traces: if T is e to the little-o of the cube root of n, you just can't tell apart the outputs from X and X prime by such a linear test. So it would be really lovely to look at such an X and X prime and decide whether we can distinguish this particular pair by other tests. The problem is that the proof of existence is a pigeonhole proof, so we don't have a constructive method to produce such X and X prime for large n. That's one challenge. Okay, so in the last ten minutes I'm going to move to the random case. Sorry, can I ask a quick question? This polynomial for which Borwein and Erdélyi is tight, this polynomial is explicit, some concrete function, right? No, no. Oh, okay. That's what I was emphasizing: they use a pigeonhole argument, so no one knows how to construct this polynomial. Exactly, there are lots of polynomials, two of them will be close on some net, and then you bound... hello, is everyone still there? I saw some message. Okay, so the proof is a pigeonhole proof on the polynomial side too, so we don't know such a polynomial explicitly. Okay, so let me switch to reconstruction of random strings. Now I'm going to assume that the deletion probability is less than one half, and write p for the retention probability, which is bigger than one half. The strategy is based on alignment: we're going to reconstruct the bits one by one, and for that it will be important to align the input and a particular output, given what we've reconstructed so far. The first step will be a greedy matching. So again, what is the basic strategy?
We're going to reconstruct the bits of the input one by one. Suppose we've reconstructed some prefix of the bits, and now we're looking at one output that's supposed to help us reconstruct the next bit. In order for this to really help us, we'd like to know, for each bit in the output, approximately where it came from in the input. We can't know this exactly, so one naive thing, which works for q less than one half, is to be greedy: just map each bit in the output to the first location in the input it could have come from. So in this example, the first bit in the output really came from the second bit of the input, but the greedy algorithm will map it to the first bit; similarly, the second location in the output, which is a zero, the greedy choice maps to the third location in the input, even though again that's not the true origin. We can continue this way with the greedy algorithm, and you see that sometimes, even if the greedy algorithm is wrong at some locations, it can catch up and be correct at later locations. Okay, is it clear what the greedy algorithm is? Okay, now we're going to think of both the true location of a bit in the output and the greedy location as some kind of random walks. When you go from the input to the output, you compress by a factor p, where p is the retention probability: every bit is retained with probability p. Reversing that, if I'm going along the output and trying to see where each bit came from, the true origin is a random walk moving forward at speed one over p. That's for the true alignment. Now, how is the greedy going to behave? Well, the greedy is always capped by the true location, but if the greedy falls behind, in other words, if the greedy points to an earlier bit than the truth, then it starts to move at rate two: if I have a sequence of bits and I'm looking for a given bit in the wrong location, namely somewhere it didn't come from, then I have to wait on average two positions until I find it, because it's a geometric one-half variable. So I have these two trains moving at different speeds: one train is behind the other and is not allowed to overtake it, but when the greedy train is behind, it moves faster, at rate two. The difference between them becomes a biased random walk, and so it never exceeds a gap of log n. This means the greedy alignment will be at most log n off from the true location. This by itself, together with the polynomial-type arguments from earlier, would allow you to reconstruct with a polynomial number of traces for all q less than one half. And I want to sketch how to get sub-polynomial. This uses an idea which is also present, in a simpler form, in the Holenstein et al. work. We need to align more precisely than log n. Consider a block of length log n and focus on the middle root log n bits; we write A here for root log n. After the deletion channel, this becomes a sequence of length about p times A, and we want to use it to align. In other words, we look at a block of length log n, say the last block we have reconstructed in the input, focus on its middle A, that is, root log n bits, and look now at traces where these root log n bits are reproduced exactly.
Then we would like to hypothesize that these root log n bits actually come from the input without any deletions. To do that, we have to rule out that they come from somewhere else, which would be a bad event. Again, in the interest of time: you can show that this bad event has probability e to the minus root log n, just by considering how this string could come from somewhere else rather than directly from the input. There are two possibilities, and in both of them the chance that this string comes from somewhere else, rather than from the input without deletions, is exponentially small in root log n. So again, we look at the traces and just throw away all the traces that don't contain this desired block of length A. For those that do, we can be sure, except with probability exponentially small in root log n, that this block did arise from the input without deletions, rather than, as in this picture, by some other mechanism. The problem with this is that e to the minus root log n is not a small enough probability for us, because we're going to be scanning this whole string of length n, so we really have to make sure the probability of error is smaller than one over n for this process to succeed. So define a block to be good if its middle root log n bits can't be found as a subsequence nearby inside the block. The chance that a block is bad is e to the minus a constant times root log n, so not all blocks will be good; but if we look at order root log n consecutive blocks, then with high probability at least one of them will be good; that is, the chance they'll all be bad is smaller than one over n. So the idea is: here is the point where we want to reconstruct the bits we don't know yet. We look back from this point in blocks of length log n, and moving back at most root log n such blocks, we will find a good block where we will be able to align to within root log n. The final story is that we will be at some location which is at most log n to the three halves away from our target, and we will be aligned to within root log n in that block. Then, using the previous polynomial-type arguments, we get e to the order root log n traces overall. So again, I'm not giving much detail on this; the paper is on the arXiv, and those of you who attend FOCS can hear more about it there. I want to finish in my last two minutes by mentioning the work we did this summer. We started late, so it's okay to take five more minutes. I realize I sketched this argument quickly, but is there some aspect someone would like to ask about? I hope at least the earlier argument, about how you can align to within log n just using this random walk, is clear; the refinement to root log n takes a little more thought. Any questions about this? Okay, so I'm going to move to the last thing, which is work done this summer with Nina Holden and Robin Pemantle. It's not yet on the arXiv, but it should be very soon. This was removing the assumption that q is in (0, 1/2); I'm sorry, I mean that assumption, since of course q is always in (0, 1). Also, the previous argument relied on the greedy algorithm, and one other drawback of it is that it couldn't handle any insertions or substitutions.
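(Since the greedy matching keeps coming up, here is a minimal sketch of it in Python; this is the editor's illustrative code under the description above, not the paper's implementation.)

```python
def greedy_alignment(x, y):
    """Map each bit of the trace y to the earliest position in the input x
    that it could have come from, scanning left to right.
    Returns a list pos with pos[j] = guessed origin of y[j],
    or None if y is not a subsequence of x."""
    pos, i = [], 0
    for bit in y:
        while i < len(x) and x[i] != bit:
            i += 1                 # skip input bits that cannot match
        if i == len(x):
            return None            # y is not a subsequence of x
        pos.append(i)              # greedy: earliest consistent origin
        i += 1
    return pos

# Example: the greedy origin is always at or before the true origin,
# and for q < 1/2 it typically lags by O(log n).
print(greedy_alignment([1, 1, 0, 0, 1, 0, 1], [1, 0, 1, 1]))  # [0, 2, 4, 6]
```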
I didn't emphasize this aspect, but the work with the polynomial arguments, with Nazarov, extends readily to allow insertions and substitutions, just because those correspond to some other change of variables, and the same arguments go through if you have insertions and substitutions as well as deletions. But the argument I sketched for the random case, involving this random walk, relied on the fact that the greedy is always on one side of the truth, and this is going to be false if we have insertions. So the greedy algorithm really fails under insertions or substitutions. In the more recent work, we had to get rid of the greedy algorithm and replace it by another alignment mechanism. We also managed to improve the bound: instead of e to the root log n, we now have that e to the cube root of log n traces suffice. And let me say that this is the best you can hope for without improving the worst-case bound, because in a random string of length n you will of course see all blocks of length a small constant times log n, including the worst one. Since we don't know any worst-case reconstruction algorithm that does better than e to the cube root of the length in terms of samples, there's really no hope of progressing on the random case beyond this bound of e to the cube root of log n without first improving the worst-case bound of e to the cube root of n. So with the random case we've kind of reached the boundary of what one can get without improving the worst case. On kind of the very last slide, I just want to sketch the new alignment method that replaces the greedy algorithm. I'll only talk about alignment to within log n; the further refinement takes more work and time. So again, we've reconstructed the input up to some known location, and then we have unknown bits in the future. We'd like to align the known part of the input with a particular trace. We look at a window W at the end of the part we've reconstructed; the length of this window, when optimized, you take to be log to the five thirds of n, and you divide this window into blocks of length log to the two thirds of n. So you divide this window into log n blocks. Now we're looking at a particular output, and again we're trying to align this output with this input. So we have some candidate location, some candidate window W tilde, which we'd like to align with W. We also subdivide W tilde into log n blocks, which are somewhat shorter. We want to check: is it reasonable that this window W led to this window W tilde? How are we going to check it? After we divide into blocks, we just compare the majority bit in each block of W tilde to the majority bit of the corresponding block in W, and our test succeeds if the number of agreements is strictly bigger than half the number of blocks. Note that if a block of W tilde really did come from the corresponding block of W under the deletion channel, then their overlap is a constant fraction of a block, and this means the probability that the majority bits agree is bigger than one half by some absolute constant. So we do this test: we count how many times we had agreement, and we pass the test if that number is bigger than half of the log n blocks, and not otherwise. And this will be our sign that W tilde came from W.
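(Here is a hedged sketch of the block-majority comparison just described, assuming for simplicity that both windows are cut into the same number of equal-length pieces; the function names, the tie-breaking rule, and the equal-length blocks are simplifications by the editor, since the actual construction uses windows of length log^{5/3} n cut into about log n blocks of slightly different lengths.)

```python
from math import ceil

def majority(bits):
    """Majority bit of a block (ties broken toward 1, arbitrarily)."""
    return 1 if 2 * sum(bits) >= len(bits) else 0

def block_majority_test(w, w_tilde, num_blocks):
    """Cut the known input window w and the candidate trace window w_tilde
    into num_blocks pieces each, compare the majority bit of corresponding
    pieces, and accept if strictly more than half of them agree."""
    def blocks(s):
        size = max(1, ceil(len(s) / num_blocks))
        return [s[i * size:(i + 1) * size] for i in range(num_blocks)]
    agreements = sum(majority(a) == majority(b)
                     for a, b in zip(blocks(w), blocks(w_tilde)))
    return 2 * agreements > num_blocks
```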
Now, this is a very, very weak signal; that is, even if W tilde did come from W, we might not see it in this test, because of possible shifts created by the deletion channel. Only if we're very lucky will we get a positive signal telling us that W tilde came from W. But because we have a lot of traces, we just throw away all the traces where we don't get such a positive signal. We just have to check that the chance of a true positive is at least e to the minus cube root of log n, while the chance of a false positive is also e to the minus a constant times cube root of log n, but with a larger constant; that's why the true positives overcome the false positives. There are several reasons why we might get false positives here. One, of course, is that we're looking at a completely wrong location, and then the probability is e to the minus a constant times log n, because we have log n different blocks. But the more subtle case is that W tilde has some overlap with the correct location, and then you have to handle the possibility that there are a lot of deletions somewhere in the beginning which create a picture of alignment without really the level of alignment we want. The rest is very computational, but again, the rough idea is that we replace the alignment via the greedy algorithm that was used in the FOCS paper with Alex Zhai by an alignment based on correlations of blocks in the input and the output. So besides q being allowed to be bigger than one half here, so one half is no longer a barrier, this analysis is more delicate, because we're getting to e to the cube root of log n, the boundary of what can be done by this type of method. Let me conclude by emphasizing that despite all this work, and I've been obsessed with this problem for the last year and so have some other people, the basic problem we really still don't know: say, in the worst case, does a polynomial number of samples suffice? We don't know. In the random case, does polynomial in log n suffice? We know it's sub-polynomial in n, but we don't know whether polynomial in log n suffices. So the problem, despite all this work, is still very much open. Okay, thanks for your attention, and I'm ready for any additional questions. Thank you. It was a bit of a challenge to make the claps heard, but thanks. What? I didn't hear. What was a bit of a challenge? Well, exactly, you didn't hear; it's a bit of a challenge to make the clapping heard. Okay, I heard at least one clap, so let's say this was good. So, questions for Yuval? Let's speak up. Okay, I have a question. Oh, okay, go ahead. Yeah, so you mentioned those two problems, polynomial in n for the worst case and polynomial in log n for the average case. Can you remind us which of them implies the other? Or do we not know? No, no, there is an implication: if you were to do polynomial in log n for the random case, this would imply polynomial in n for the worst case. But the implication is also the wrong way to go about it, because you're not going to do the worst case by first doing the random case, so it doesn't really make sense. I think the problems have now reached a situation where to make further progress on any of them, you'll have to make progress on the worst case. And can't you hope for the reverse implication?
It sounds like it might work. You can hope; indeed, our proofs are based on, I think, some alignment, right? Yeah, on some alignment. In the random case there's a part of the proof that I didn't show, which is that when you refine and use the alignment to do the reconstruction, you actually have to redo the argument for the worst case in a slightly stronger form. So we still use some variants of the complex analysis that we have to do ourselves. Morally, I would say yes: if you get a better algorithm for the worst case, you should be able to build on it and get a better one for the random case, but I don't see a black-box reduction, especially given that these arguments that succeed, in some sense, in doing the reduction for the random case are technically much longer. The first paper with Nazarov, when I submitted the camera-ready version to STOC, the publisher wrote back asking whether I was sure I had submitted the whole thing, because the paper is five pages in the STOC format and we're allowed fourteen. But the new papers are more than twenty pages each. So the analysis in the random case is more technical, but really the worst-case problem is the one where you can have the most pure suffering banging your head against the wall. Anything else? And do you have something similar to the greedy algorithm for when q is greater than one half? No, no, when q is bigger than one half the greedy algorithm really breaks, and we don't have anything like the greedy algorithm, which is why we were stuck for several months. Indeed, I gave a public lecture about this a few months ago where I said maybe we have a phase transition in the difficulty of the problem between q less than one half and q bigger than one half. But now it seems the phase transition is just in that particular algorithm: the performance of the greedy algorithm breaks down once q equals one half. Indeed, you can run it and see that the true location and the greedy location just separate and can get to distance root n or worse when q is one half or more. But again, we have a substitute for the greedy step, which is looking at correlations between sub-blocks, and that really gives us the same performance of being able to align to within log n. Just a naive question: in the average case, you mentioned polylog, but is there a lower bound? So I mentioned the lower bound of log squared n in the paper of McGregor et al. Oh, okay, sorry, I forgot about that. Right, so a log n lower bound is straightforward just from the linear lower bound in the worst case, right? But they were able to use the fact that you don't just have one block which looks like the worst case, you can have a power of n many such blocks of length log n, and they could use that to get log squared n. By the way, there has been some improvement of the lower bound. This is also still unpublished, but Nina Holden and Russ Lyons have improved the worst-case lower bound to n to the 1.5, and this leads to an improvement in the random case to log n to the 2.5.
So we have this huge chasm between the upper and lower bounds, and there's some nibbling at the edges, but the big chasm remains. Any other questions? Just following up on this question: is there a generic connection, like if you give a bound on the worst case, do you get a corresponding bound on the average case? So I think I answered this in the discussion: a bound on the random case gives a bound on the worst case, but that's not the way to go about it. And a bound on the worst case, with a few months of work, will yield a bound on the random case, I believe, but we don't have a black-box reduction. Okay. Thank you. Let me take us offline, but everyone is welcome to stay and ask more questions. Also, thanks to the roughly 20 viewers we had on the YouTube channel who couldn't get in but stayed with us there. I'm going offline, but everyone... wait, yeah, one more thing. I just wanted to remind everyone who's still around to join us in a couple of weeks, November 11, I think, for the next talk. So thanks for joining us. All right, see you in two weeks.