So I think now we can really start. Okay, so welcome everyone. Today's talk will be by Leonard Schulman. Before I begin, I should thank my fellow organizers. Anindya De is actually here with us today, hi Anindya. And we also have Thomas Vidick, Gautam Kamath, Clément Canonne and Ilya Razenshteyn, all helping TCS+ run behind the scenes.

Maybe also before I continue, I should quickly go around the table like we usually do. We have several groups today. We have André Nusser from the Max Planck Institute. Hello. We have Clément Canonne from Stanford. Hello everyone. We have Deeksha Adil from the University of Toronto. We can only see some hands, but yeah, they're probably connected to your bodies. We have Fangi from the University of Michigan. Hello Fangi. We have Jenish Mehta from Caltech, and Vito Balder too, hello. We have Joshua, joining us from Colorado Boulder. Hello everyone. We have K. Gopalakrishnan from Miss Kola University. Hello, good to see you again. And we have one more group, from Shahid Beheshti University. Hello Saeed, good to see you. And I think we're basically ready to start. One more thing I shouldn't forget to remind you: we're hoping to still have one more talk this spring, and that's going to be next week, not in two weeks, next week, a talk by Michael Kearns. We're hoping it will work out this time; we had some trouble last time.

So let me introduce the speaker. We're again very happy to have Leonard Schulman today. Leonard got his PhD at MIT under the supervision of Michael Sipser and is currently a professor at Caltech. He has won many awards; let me just list a few: an NSF CAREER award in '99, and the IEEE Schelkunoff Prize for his work on wave propagation in antennas, so as you can see he has quite a wide range of interests. He has an ACM notable paper from 2012 and the UAI best paper just recently, in 2016. His interests are in information theory, coding theory, some quantum computation and many other topics. He's actually currently joining us from Israel: he's visiting the Institute for Advanced Studies in Jerusalem this year. And I think that's probably all I have to say. So welcome, thank you, and you have the stage.

Thanks a lot, and thanks to all the organizers. So this is going to be a talk about some joint work with Gil Cohen, who was at Caltech, is now a postdoc at Princeton and will be at Tel Aviv next year, and with Bernhard Haeupler of CMU. And I'm going to be talking about constructing tree codes. The plan of the talk, in brief: I'll tell you what a tree code is, I'll tell you what's known about constructing them, and I'm actually also going to explain why we want them, because I'm really not assuming that people have any background in this area. Then finally I will actually get to talking about our new construction. And as usual in talks, I really welcome interruptions, to a pretty large degree.

So what's a tree code? Throughout the talk, T will just mean the infinite rooted binary tree. We'll denote the vertices of the tree in the natural fashion: the root is called epsilon, and any vertex is labeled by some finite binary string indicating whether you've gone to the right (that would be a zero) or to the left (that would be a one) as you go down the tree. And we're going to care about the binary tree as a metric space. So we're going to care about distances between vertices, but we actually only care about distances between vertices that are at the same level of the tree.
If we have two vertices X and Y at the same level, the distance between them is just going to be the distance to their least common ancestor, which we'll also call the split, for reasons that will become apparent later on. So it's half the path distance, normalized so that it's the distance to the least common ancestor. For X and Y as in the figure, that distance is two.

Now, we're going to be labeling the vertices of the tree with labels from some alphabet, capital Sigma. That's the essential task we're concerned with: how to label such a tree. When you label the vertices (as you see in this slide, there are letters of the alphabet next to the vertices there), you get, by concatenation, a labeling of the simple paths from the root as well. So the vertex Y now receives the word ABB, and the vertex X receives the word ACA, just concatenating what you read as you start from the root and walk down to the vertex. And we're going to be interested in the Hamming distance between those two words. In this particular case, the Hamming distance between the words ACA and ABB is two. That Hamming distance cannot be larger than what we defined as d_T, the tree distance, but it can be as large as it, as in this example, where the Hamming distance equals the tree distance; that's the largest it can be.

And we'll say that a labeled tree is asymptotically good, or is an asymptotically good tree code, if, first of all, it has a finite alphabet, and if all of the tree distances are preserved up to some constant factor. The equation there is just expressing this: it says that the Hamming distance between the word leading to X and the word leading to Y is at least a factor (1 - delta) of the tree distance between X and Y, okay? And we'll be happy with basically any delta less than one; that means any constant factor. You could also ask for delta tending to zero. So that's the essential definition: asymptotically good will just mean that you get some constant factor. I hope that's clear. This would be a good place to interrupt if any of the definitions aren't clear, because this is the object you'll care about for the rest of the talk.

Just to make sure, this also applies when the tree distance is one, right? Just when you split. That's right. So siblings must be labeled with different elements of the alphabet.

If it's not too difficult for you, could you shrink the window a bit? I'm afraid some viewers won't see the bottom line. Okay, how about this? Is that good enough? Yeah, I think it's good enough. Okay, thanks.
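To make the definitions concrete, here is a minimal Python sketch (not from the talk); the vertex names, the labels dictionary, and the helper names are my own, with the toy labeling made up to match the slide's ACA/ABB example:

```python
def tree_distance(x, y):
    # x, y name two same-depth vertices as binary strings (root = "");
    # the tree distance is the distance down to their least common ancestor
    assert len(x) == len(y)
    lca_depth = next((i for i in range(len(x)) if x[i] != y[i]), len(x))
    return len(x) - lca_depth

def word(labels, x):
    # concatenate the labels along the simple path from the root to x
    return "".join(labels[x[:i + 1]] for i in range(len(x)))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# toy labeling: X = "000" and Y = "011" split one level below the root,
# so d_T = 2; their words are "aca" vs "abb", Hamming distance 2.
# A tree code demands hamming(word(x), word(y)) >= (1 - delta) * d_T(x, y)
# for every same-depth pair.
labels = {"0": "a", "00": "c", "000": "a", "01": "b", "011": "b"}
assert tree_distance("000", "011") == 2
assert hamming(word(labels, "000"), word(labels, "011")) == 2
```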
Okay. Now, is this an easy or a hard object to construct? First of all, let me just point out a simple thing, just so you get oriented to how hard this is to do: building such an object implies a construction of an asymptotically good block error-correcting code. I'll define block error-correcting codes formally in a moment; for now I'll just assume everybody is familiar with them. If you build such a tree code, you can immediately read off a block code from it. How?

Take a tree code and just chop it off at the nth level. I claim that from this I can read off an asymptotically good block code, with respect to Hamming distance, of length n/2. Why? Take the full tree down to (and here, let's hope this mouse system works) halfway down the tree. From every vertex that's halfway down, choose a single path that descends, let's say, all the way to the left; it really doesn't matter which single path you take. This gives you 2^(n/2) different codewords, and they're guaranteed to be at constant relative Hamming distance from each other, because of the tree code requirement. Now, we do of course know how to build asymptotically good block codes, but it's not trivial: it took people about 20 years to actually build these things. So tree codes are going to be a harder object, and for them we really, even to this day, don't exactly know how to build them. That's just to give you an idea of what implies what. Okay, is that clear?

Okay, so you can also ask: do these things exist? And they do. This is something that I had to prove way back in 1993, and I'll tell you in a few minutes why. Here's the theorem. It says: for any delta greater than zero that you fix (delta means the same thing as before, so 1 - delta is the relative Hamming distance), there is a finite-size alphabet such that you can label the tree so that for all vertices X and Y at the same depth, and this means anywhere in the tree, no matter how far down into this infinite object you like and no matter what d_T(X, Y) is, the Hamming distance is at least the factor (1 - delta) times the tree distance. I know three proofs; I actually don't know of any new proofs in the last 20 years, but early on there were three proofs, which I'll just mention in brief, and I won't talk about them now unless I get questions. You can build trees of increasing depth, n and then n + 1 and so forth, by a kind of re-randomization process; you can also just convolve with a single random seed; and you can apply the Lovász Local Lemma. These are three distinct ways of proving this theorem. Just to mention the alphabet size that you get: it scales exponentially in 1/delta.

So the problem that's been open since, well, I guess it's more than two decades now, is to actually explicitly build these things, even with the weaker goal of just getting some delta strictly less than one. It would be even nicer to get delta tending to zero, of course. And that's the problem this paper is about.

Let me just mention some really trivial things that don't work, maybe just to orient your thoughts. One is: you could ask, why don't I use the trivial labeling where I just label every vertex by whether it's a left child or a right child? That would be incredibly greedy, trying to get away with a binary-size alphabet. But even if you label by, let's say, the last three steps (so there would be eight types of vertices: left-left-left, left-left-right, and so forth), any purely local definition of the labeling like that cannot work. It will have a constant-size alphabet, but it will fail the distance condition. That's just a little counting argument you can go through in your head: at some very shallow depth you will already get two vertices with identical labeling, and from there on down, the subtrees beneath them will be labeled identically. So no definition that's purely local can succeed at all; see the little demonstration below.
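Here is a tiny illustration of that failure (a sketch of my own, not from the talk): two paths that split at the root but agree afterwards get identical labels as soon as the window slides past the split, so their Hamming distance freezes while the tree distance keeps growing.

```python
def local_labels(path, k=3):
    # label each vertex on the path by the last k steps leading to it
    return [path[max(0, i + 1 - k): i + 1] for i in range(len(path))]

# two paths that split at the root and then agree forever after
x = "0" + "0" * 3 + "01" * 20
y = "1" + "0" * 3 + "01" * 20

lx, ly = local_labels(x), local_labels(y)
ham = sum(a != b for a, b in zip(lx, ly))
print(ham, len(x))   # 3 vs 44: the Hamming distance stays at k while the
                     # tree distance is the full depth, so no constant
                     # factor (1 - delta) can be achieved
```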
The other trivial thing you can do is make the labeling very comprehensive and give the entire path to the vertex, in which case you never really reuse any character of the alphabet. But that's a horrible solution: the Hamming distance is great, but the alphabet grows exponentially in the depth. So those are some simple-minded things you can try.

There is something non-trivial that's been known for a long time. Before I get to that, let me formally say what block codes are. Block codes are given by the following theorem, which goes back, I'm actually not sure exactly, but let's say 1948, and says that for any positive delta there is some constant alphabet size such that for all large enough n you get an explicit (namely, we can really build these things) block error-correcting code: a mapping from {0,1}^n to Sigma^n with relative distance 1 - delta, okay? I'm putting this lemma down formally here not so much because I think anybody in the audience is unfamiliar with it, but because I actually want to use this mapping, ECC, later on. By now the state of the art is that we know how to build these things really well.

Using this fact, with Will Evans and Michael Klugerman long ago, we built a relatively elementary, or maybe a very elementary, construction of tree codes, which in the interest of time I'll describe only briefly. At all scales, at all powers of two, you do the following: at one scale you listen to two characters and then map them through an error-correcting code; at the next scale you listen to four characters (that's what's marked in light blue) and map those four characters through an error-correcting code. But of course you wait until you've heard those four characters, because we need to decide what to map as it comes along. So you listen for time 2^k, gather a partial word of length 2^k, then wait half that time and print out, in compressed form, the appropriate error-correcting code. I don't know if that was clear; you really have to go through the proof, and as you see it's very elementary, but you do have to think about it for a moment.

This gives you a tree code that uses an alphabet size that's polynomial in n, where n is however long you've been communicating for, or whatever depth of tree you want; we'll talk interchangeably about the infinite tree and the tree of depth n. And the reason the alphabet is polynomial in n is basically that at each distance scale you're printing out in real time an error-correcting code that uses a constant-size alphabet, so you have about log n of these constant alphabets being printed out at once.

Let me just ask a question: in the existence theorem, the alphabet size was independent of n, it depended only on delta? That's correct.
There's no n at all in the existence theorem. There's some dependence on the distance parameter, naturally, but no dependence on n at all. And that's what we really want. Here, though, we get an alphabet polynomial in the depth, and if you translate that into the communication application, which I'll talk about in a moment, this simple-minded construction gives you a rate that goes like 1/log n instead of constant.

What else is known about explicit constructions? I'll go through this rather quickly, except for one important point: what do we mean by an explicit construction? I'm asking you to construct a tree of depth n. A question people sometimes ask when they first hear about this problem, and let me anticipate it here, is: you just told me that one of the existence proofs uses the Lovász Local Lemma, and we know that the Local Lemma has a very beautiful algorithmic version; why doesn't that work? The reason is simple. The tree itself is an object of size 2^n: if you build a tree of depth n, really writing down the entire code is an object of size 2^n. But we want random access. I want to be able to point to a vertex in the tree at depth at most n, and you should tell me, in time polynomial in n, or maybe even linear in n, what the label at that vertex is. That's what we need for the applications of these tree codes. So the short version is that it's the wrong complexity measure; it's off by an exponential. Building these things in time 2^n is not a problem; the problem is time polynomial in n. All right.

So just to briefly say what's in the literature: because of the difficulty of building tree codes, there are a few dodges in the literature, and I'm partially responsible for one of them; there's a class of applications where we don't quite need tree codes. Gelles, Moitra and Sahai do another kind of dodge: they define a kind of code with slightly weaker properties, still good enough for most of the applications. It still doesn't give an explicit construction, but they can at least build these things with high probability, not just prove existence. Mark Braverman showed how to trade off, in an interesting way, computation time against alphabet size. And finally, Cris Moore and I a few years ago wrote down what would be an explicit construction if certain exponential sums behave in the way we would like them to. To me this is really intriguing, and I really hope that analytic number theory can catch up here, but the conjecture we need is very far from the types of things actually known in analytic number theory, so it remains purely conjectural.

If there are no questions on that, I'll tell you why we want these objects. The main reason (there are other reasons, but the main one) is an interactive version of Shannon's coding theorem. In Shannon's problem, Alice has some message that she'd like to send to Bob. She's dealing either with an adversarial channel, where the adversary can flip some constant fraction of the bits, or with stochastic noise.
And we know, from 60 years or so ago (maybe 70 is a better approximation), that there's a coding theorem for this: Bob can recover Alice's message if the number of modifications is less than half the distance guarantee, or, if the rate of communication is less than the capacity, with exponentially small failure probability. These are things I expect people are very familiar with. But the issue is that this Shannon task, one-way communication, is really not sufficient. It doesn't adequately express the full richness of communication, because interaction is essential to efficient communication.

And I'm going to spend a little bit of our precious time showing you what I think is a wonderful piece of literature, a very short piece of literature, so I hope you'll excuse me; I hope you'll actually enjoy it. It expresses this point about communication complexity absolutely concisely and beautifully, in a completely non-mathematical literature. This is a Calvin and Hobbes comic strip, which some of you may be familiar with. Calvin is a six-year-old who gets into a lot of trouble. He calls the county library and asks for the reference desk, and he says: hello, I need a word definition. The librarian, implicitly, responds: what is the word? And Calvin says: well, that's the problem. I don't know how to spell it and I'm not allowed to say it. Can you just rattle off all the swear words you know, and I'll stop you when you get to that one? And then Calvin expresses a lot of frustration. We don't have a laugh track, so I don't know how this went over; I found this strip incredibly funny when I encountered it. And the second thing that happened when I encountered it is that I realized it was about a topic near and dear to my heart.

So let me express it in communication complexity terms; let me explain the joke, since you can't throw tomatoes at me. Here's the sensible protocol for Calvin's task. There's a bipartite graph with words and word definitions. Calvin says a word that he would like explained, let's say "darn", and then the librarian responds with the definition of that word; it happens to have two of them, one the swear-word reading and one an innocuous interpretation. So the sensible protocol is a two-round protocol. Why is the joke funny? Because Calvin is asking for a one-way protocol, in which the librarian just reads out the definitions of many, many swear words, and we all know, not just mathematicians but everyone (because this gets printed in the Sunday comics), that this is a ludicrous protocol in terms of communication complexity.

I actually don't know whether the strip came out before or after Christos Papadimitriou and Mike Sipser proved their theorem, but they proved that the joke is funny. Mike and Christos proved that for every k (here k is two in this example) there's a communication problem that you can solve efficiently using a k-round protocol, in this case a two-round protocol, but that requires exponentially more communication if you insist on k - 1 rounds. The theorem was later extended in the various ways that are on the slide, which I'd like to move along past.
Okay. So Bill Watterson, who's a genius, wrote that comic strip, I'm sure without having seen Mike and Christos's paper, and I've asked Mike, and he hadn't seen the comic strip. So this theorem was independently reinvented in the two different literatures.

All right, so the bottom line is: in order to carry out efficient interactive communication, we need highly interactive communication, very short rounds going back and forth, and we can't efficiently encode that using block codes. So this is why we want tree codes, and here is the main theorem that calls for them. It's the analog of Shannon's coding theorem. It says: if you want to simulate an n-round noiseless-channel protocol, and you're dealing either with adversarial noise (again, some constant fraction of bits can get flipped by the adversary, with bounds that were later improved by Braverman and Rao) or with stochastic noise, then just as before you get exactly the same kind of theorem. There's a deterministic protocol that slows down only by a constant factor: it uses O(n) transmissions, it uses a tree code for those transmissions, and it achieves either zero probability of error in the adversarial case or exponentially small error in the random-noise case, and it has constant rate, because the alphabet is of constant size and everything you communicate in this protocol is labels from a tree code. So if you have a finite-size Sigma, then you have a constant rate, okay?

The two ingredients that go into proving this theorem: one is the purely combinatorial tree codes, and the other is a simulation protocol. And the simulation protocol is exactly what was improved, in a very ingenious way, by Braverman and Rao in 2011. Maybe just a word about related work: this wasn't computationally efficient, just communication efficient. Brakerski, Kalai and Naor showed how to do this in a computationally efficient way as well, although with randomization, and that's a big piece of progress. So we now even know how to do these things efficiently in the computational sense, if you allow randomization. And there's a lot more work; there's a monograph by Gelles from last year if you want to read more about the subject, and for lack of time I won't really talk about any more of it.

So the content of this paper is to improve on the polynomial-size alphabet of the very elementary construction I mentioned before; I'll give you a polylog-size-alphabet construction. The main theorem is this: for any constant delta and any depth n, I'll give you an explicit binary tree code (explicit meaning computationally efficient, in the way I described before) that achieves distance 1 - delta; this is actually true for any positive delta, and the alphabet size is polylogarithmic in n. In other words, in terms of communication, we can achieve a 1/log log n communication rate. Is the theorem clear? Any questions about it?

So now I'm going to start the construction proper. It goes in three steps. They're not all equally interesting: in some sense the first step, and the main idea of the second step, are the main thing, and I'll try to allocate my time that way.
So I'll prioritize doing the first step carefully and then go a little more quickly through the rest. Yes, unless you interrupt me and change how things work. All right.

Okay, just to say where we are: I'm going to imagine, for some time to come, that the alphabet I'm using for my encoding is the integers. Obviously this is an infinite alphabet, and we already made the point that infinite alphabets are really not interesting; later we'll get to bounding the size of the integers that we use. So keep that in the back of your mind, and I hope you'll forgive me if for a while we just talk about the integers and only later talk about how big they are.

Right now I'm going to do some work that's purely algebraic, which I think is really the most interesting part, and everything will happen over the integers. So here's what we do. In a binary tree code, what is the input alphabet? It's really just zero and one: as you're communicating, you're just saying whether to go left or right down the tree, and then you have to send whatever is written on that vertex, a label from Sigma. Our first step will be to handle a larger input alphabet; just keep that in mind, we'll get to it on the next slide.

Let me remind you what step one of coding theory is: Reed-Solomon codes, or polynomial interpolation. In Reed-Solomon, my message is given as the coefficients of a polynomial. So over whatever field I'm working, I have a bunch of coefficients a_0, a_1, and so forth. And what's my codeword? I go and evaluate this polynomial at somewhat more than n locations in the field: I take (1 + c) n distinct locations in the field, evaluate the polynomial there, and that's my codeword. The field just has to be big enough to have that many distinct values; there's no algebraic geometry here, it's straightforward. And because non-zero polynomials of this degree have at most n roots, there's going to be distance at least cn between any two codewords: subtracting two codewords from each other, you get a non-zero polynomial, and it has to be non-zero in cn of the places.
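As a refresher, here is a minimal Reed-Solomon encoding sketch (my own illustration, not from the talk) over a small prime field; the function name and the choice of evaluation points 0, 1, ..., m-1 are illustrative:

```python
def rs_encode(coeffs, num_points, p):
    # message = coefficients a_0..a_{n-1} of a polynomial over GF(p);
    # codeword = its evaluations at num_points = (1 + c) * n distinct
    # field elements.  Two distinct degree-(n-1) polynomials agree on at
    # most n - 1 points, so codewords differ in >= num_points - (n - 1)
    # positions, i.e. relative distance about c / (1 + c).
    assert num_points <= p, "need enough distinct evaluation points"
    def eval_at(x):
        acc = 0
        for a in reversed(coeffs):   # Horner's rule mod p
            acc = (acc * x + a) % p
        return acc
    return [eval_at(x) for x in range(num_points)]

# e.g. a 4-symbol message stretched to 8 evaluation points over GF(13)
print(rs_encode([1, 2, 3, 4], 8, 13))
```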
But of course this doesn't work for us, because in order to apply this kind of encoding and start evaluating a polynomial somewhere, you have to know all of its coefficients. It doesn't have the online, or causal, property that we need: that we should be able to start producing outputs when we only know a few of the inputs.

So the first idea is to use a different basis from the standard one. The standard basis is just 1, x, x squared, x cubed, and so forth. There's another basis that's better for us: the basis of binomial polynomials. These are the polynomials (x choose i), and when I say the polynomial (x choose i), you should think of i as a constant and x as the variable; it's this polynomial I'm running the mouse over. In the Newton basis, we can write a degree-n polynomial in this form: gamma(x) = gamma_0 (x choose 0) + gamma_1 (x choose 1) + and so forth, where the gamma_i are coefficients in the field; think of the field now as the rationals. We can expand any polynomial of degree n in this fashion.

It's actually in some ways a nicer basis than the standard basis, and this is useful for us: it's what's called an integer basis. What does that mean? In the usual basis, if all the coefficients of a polynomial are integers, then of course the polynomial takes an integer value everywhere on Z, right? But the converse fails: you can have a polynomial that is integer everywhere on Z but has some non-integer coefficients. Over this basis, it's an if and only if. That's what it means to be an integer basis. So that's one thing to note. The next thing to note is really, really trivial, but it's essential to understanding what's going on, so note it consciously: the roots of the polynomial (x choose i) are at 0, 1, 2, up through i - 1. You can just read that off from the expansion. This gives us an if and only if: the values of the polynomial at the points 0 through j determine the lowest coefficients, gamma_0 through gamma_j, and conversely, the lowest coefficients gamma_0 through gamma_j are all you need to know in order to determine the values of the polynomial at the points 0, 1, 2, up to j. Okay, any questions on that?

All right. Now, just writing down the values of this polynomial is not good enough; that would be trying to do a rate-one code, which doesn't make any sense. We somehow need to go below rate one to have any hope of an error-correcting code. And given this line right here, that you can go back and forth between values and coefficients, there's an obvious method: as your stream of inputs arrives, let's say the values gamma(0), gamma(1), and so forth, you transmit both those values of the polynomial and the coefficients that you're discovering in real time, okay? You have a stream of integers Z_0, Z_1, and so forth; you think of those as the values gamma(0), gamma(1), ... of a polynomial with unknown coefficients gamma_0, gamma_1, and so forth; and the prefix Z_0 through Z_t determines the coefficients gamma_0 through gamma_t.

Some easy notes. First of all, the mapping from these evaluations to the coefficients is a linear mapping; that's obvious. And at any finite stage of this process we have defined a polynomial. The whole thing is an unbounded process, so you could think of gamma as a kind of power series over the Newton basis, but at any finite stage you have the polynomial gamma^(t), which is a prefix of that power series, and the gamma^(t) tend to gamma as their limiting object, in the sense that they agree on longer and longer prefixes. And a really important point is this last line: Z_j, the value gamma^(j)(j), doesn't change when you pass to gamma^(t) for larger values of t. These polynomials never change their mind about their values at small points j: once gamma^(j)(j) takes on the value Z_j, so does the next gamma, and the next, and so on, all the way along this power series. These are all easy things, but they're building up.
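Concretely, the causal evaluations-to-coefficients map is just iterated forward differencing; here's a minimal sketch (my own illustration, not from the talk):

```python
from math import comb

def online_newton_coeffs(values):
    # Stream z_0, z_1, ... (the values gamma(0), gamma(1), ...) and yield,
    # as each z_t arrives, the Newton-basis coefficient gamma_t, which is
    # the t-th forward difference of the value sequence at 0.
    diag = []                        # last diagonal of the difference table
    for z in values:
        new_diag = [z]
        for d in diag:               # new_diag[i + 1] = new_diag[i] - diag[i]
            new_diag.append(new_diag[-1] - d)
        diag = new_diag
        yield diag[-1]

# sanity check: z_t is recovered from the first t + 1 coefficients alone,
# via z_t = sum_j gamma_j * C(t, j) -- the causal, linear correspondence
zs = [3, 1, 4, 1, 5, 9, 2, 6]
gs = list(online_newton_coeffs(zs))
assert all(sum(g * comb(t, j) for j, g in enumerate(gs[:t + 1])) == zs[t]
           for t in range(len(zs)))
```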
So here's the tree code proposal, still working over the integers. Given an input Z_0, Z_1, and so forth, you transmit pairs; I've written the pairs vertically. Z_0, the value of the polynomial, together with gamma_0: think of that as a pair in Z squared, okay? Then the next value of the polynomial, Z_1, with the appropriate coefficient gamma_1, and so forth. So it's a rate-one-half proposal, if you just think of them as integers and don't think about sizes. And the main question, the most interesting question here (it's not enough to solve our problem, but it's the really interesting question) is: does this proposal have a distance guarantee?

It's a linear code, so in the usual way we don't actually have to think about distances between the encodings of two different messages, or in other words between the encodings of two different power series. Instead, we can just look at the distance from the encoding of the zero message. So for a non-zero polynomial gamma, we let the split point sigma be the index of the first non-zero coefficient. You might be evaluating zero, zero, zero, zero for a while, and at some point the polynomial takes on some non-zero value. If there were two messages, this would be the first place where they differ from each other; now it's the first place where this polynomial differs from the zero polynomial.

And by the way, maybe just to see if anybody's following, or in fact whether there's any audio at all on the other side: the first time that Z_sigma is non-zero, does anybody know what gamma_sigma is? Well, I'm just guessing, maybe it's Z_sigma? It's exactly Z_sigma, okay? And that's because of the property we had before: abstractly, it's because of the integer-basis property, and more concretely, it's because this is just the coefficient multiplying the binomial polynomial (sigma choose sigma), which is one. Z_sigma here equals gamma_sigma times (sigma choose sigma), i.e., times one. Of course, later on the Z's and gammas are not equal to each other; it's only the very first time that they're equal.

So what is this distance guarantee? If we can achieve a distance guarantee of delta, what that means is that for any m greater than or equal to zero, from the first time that you're non-zero, which occurs at this point in time where I'm moving the cursor, from that time on you have these pairs, this vertical pair, the next vertical pair, and so forth, and at least some fraction of these pairs are non-zero. Actually, the slide should say that fraction is 1 - delta, okay?

Leonard, is it obvious why you need both Z's and gammas, and not just gammas? Well, maybe it's not completely obvious, because this is an infinite alphabet. Imagine all this were happening over a finite field and the Z's could really be arbitrary: then what you would be asking for is a rate-one code, so you're not going to get any distance at all. That would rule it out if something like this were happening over a finite field. Over the integers, honestly, it's less clear, because the gammas could be integers that are much bigger than the Z's.
And if you don't put any bound at all on that, then you can imagine schemes where the gammas alone would be enough, but then you'd really be hiding something in the size of the integers, and we're really not trying to hide anything in the size of the integers. Now, if you're asking concretely about this scheme, whether it would work with only the gammas, then I think not. You could rig the gammas to cancel, to make all the subsequent Z's zero... oh, wait a minute. No, no, the other way around, of course. Let me... yeah, let's take it offline, since we're already running late here.

Okay, so I want to translate this statement; I want to get to the cool lemma. I'm going to translate the statement I just made into a statement about the polynomial gamma^(sigma + m). Again, sigma is defined to be the split point, the first time the evaluation of the polynomial is non-zero. And I'm going to break this distance guarantee delta into two pieces. One is the density of coefficients; that's the bottom row: what fraction of them are non-zero, from the first non-zero entry on. And there's also an evaluation density, which is the fraction of non-zero values, again in that window. The distance guarantee delta we're aiming for is sandwiched between the maximum of these two numbers and their sum, so it really doesn't matter whether we talk about delta or look directly at these two parameters.

And we're going to aim for a very, very strong distance guarantee: an online (by which I mean causal, for the kind of encoding we have here) additive uncertainty principle. For those who know what the Chebotarëv lemma is, it's analogous to that, but that doesn't have the causal structure we need. Okay, I'll be more precise about this in a second when I state the lemma, but the blue text is the property we want: if gamma is sparse after the point sigma, so if there are very few non-zero coefficients, then we want to force it to have very few roots.

And the question you should ask is: is this a reasonable thing to expect? It was kind of strange to me, and I want you to experience that it's a little bit strange. It doesn't seem like a reasonable thing to expect, and let me show you why, at least from a certain point of view. Look at a polynomial like this: it has a constant term, and then it's very, very sparse. It really has only two non-zero coefficients, one of which is the constant term; in other words, the split point is at sigma equals zero, it's immediately non-zero. And then this polynomial is quiet for a really long time, nothing happens in coefficient land, and finally it has the term c (x choose n) for some large coefficient c, okay? Ideally, what we'd like to show is that because it's so sparse, it should have very few positive roots. So let me show you what this polynomial actually looks like. Here's the case n equals nine: I took the polynomial minus one plus 5000 times (x choose 9). It's a nice real polynomial that I can actually show you; I can plot out where the roots are.
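You can also check this numerically; here's a quick sketch (my own, not from the talk) using exact rational arithmetic:

```python
from fractions import Fraction
from math import factorial

def p(x):
    # the slide's example: p(x) = -1 + 5000 * C(x, 9)
    x, prod = Fraction(x), Fraction(1)
    for k in range(9):
        prod *= (x - k)
    return -1 + 5000 * prod / factorial(9)

# C(x, 9) vanishes at the integers 0..8, so p is exactly -1 there:
# the polynomial has no integer roots below 9 ...
assert all(p(k) == -1 for k in range(9))
# ... yet p(1/2) is large and positive, so p changes sign twice inside
# (0, 1): real roots crowd next to the integers without landing on one
assert p(Fraction(1, 2)) > 0
```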
And in the plot you see there's a root just to the right of zero, then a root very close to one, then a root very close to two, very close to three, very close to four, five, six, seven, eight. So it actually has a lot of roots on the positive axis, and they're even very close to integer points. We would have liked it, because it's so sparse, to have only about two positive roots; though really we only care whether roots land at integers, not merely near them. It's obvious where this example comes from: I put a big coefficient on (x choose 9), and the minus one is just a little perturbation, so it looks a lot like the binomial polynomial (x choose 9). But for this polynomial I need to prove something very, very different than for (x choose 9) itself. The polynomial (x choose 9) has all of these integer roots; I need to show you that this kind of polynomial, which is very sparse here, doesn't have integer roots like that before nine, or has very few. So it's a strange thing. When we were thinking about this, it was just a really strange thing to expect: there can be a lot of roots on the positive axis, but somehow it's not a continuous property; they're not going to be at the integers.

So here's the lemma. This is in some sense the main lemma of the paper, or to me the most interesting part. There's more work to be done, you can't get away with just this, but this is, I think, really interesting. The lemma says: take a real polynomial, non-zero, with split point sigma (of course we're looking at it in the Newton basis), and suppose it has s non-zero coefficients in the Newton basis. Then from the split point on, so here in the interval from sigma to infinity, it can have at most s - 1 distinct roots at integers. It can have lots and lots of roots; there's no limit on the number of roots. But at integers there can only be s - 1.

So the coefficients can be real numbers; they don't need to be integers? Yeah, this is fine over the reals. We really don't care about the rationals; you'll see in the proof (I'm going to prove this to you) that it has nothing to do with the reals versus the rationals.

An exactly equivalent way of stating this is not in terms of polynomials but in terms of these Newton-basis power series. It says: suppose you have such a power series with non-zero coefficients at locations sigma_1, then sigma_2 larger than sigma_1... oh, there's actually an error on this slide: it's not the coefficients that are increasing, it's their locations, so sigma_1 < sigma_2 < and so forth. Then it has at most s - 1 distinct integer roots in the interval closed at sigma_1, open at sigma_s.

So I want to prove this lemma to you. It looks like we may be out of time for some of the lower-order stuff, but I'd really like to prove this, because it can be proven in the remaining time. For contradiction, suppose s is the smallest sparsity for which you get a counterexample: some polynomial gamma with sparsity s (by sparsity I mean the number of non-zero coefficients in the Newton basis) that contradicts the lemma. This can't happen for s equals one: s equals one means your polynomial is just a single binomial polynomial,
and those polynomials, of the form c times (x choose sigma), are non-zero at every integer x greater than or equal to sigma, okay? So it can't happen for s equals one; hence s is at least two. We're supposing that the lemma fails for gamma. Failing means that weakly after the point sigma_1 there are roots at t_1, t_2, up to t_s: s distinct integer roots, so gamma(t_i) equals zero.

First I'm going to prove something useful. This is not the full lemma yet, just a claim: for every value of j, sigma_j is strictly less than t_j. So each of these t_j's occurs strictly to the right of the corresponding non-zero coefficient location. Suppose there's a counterexample, so there's some j at which t_j falls at or to the left of sigma_j. This can't happen for j equals one, because of something I pointed out beforehand: the very first time the polynomial is non-zero, its value equals the coefficient there, so it cannot vanish at sigma_1. So this has to happen for some j at least two.

Now let's look at the prefix polynomial that uses only the first j - 1 coefficients; that's what I'm circling with the mouse here. I take the polynomial gamma, which has sparsity s, and keep only its sparsity-(j - 1) version, the first j - 1 coefficients. We've already learned (I made sure to point this out on an earlier slide) that this polynomial agrees with gamma up until sigma_j minus one; that was the fact about how these polynomials over the Newton basis never change their mind about their early values. So this (j - 1)-prefix polynomial agrees with gamma on the points zero through sigma_j minus one. Now, since we're supposing for contradiction that t_j is less than or equal to sigma_j, and we also have t_{j-1} strictly less than t_j because the roots are distinct, the prefix polynomial actually has roots at t_1 up through t_{j-1}, all distinct, and all weakly to the right of sigma_1. That's more than (j - 1) - 1 roots: j - 1 is the sparsity of the polynomial we're looking at, and minus one is the limit the lemma allows. So this (j - 1)-prefix polynomial already has the property for which s was supposed to be minimal, and j - 1 is definitely less than s. This contradiction proves the claim.

Fine, so now let's go on to prove the lemma. We know from the claim we just proved that the roots are strictly to the right of the non-zero coefficient locations: sigma_j is strictly less than t_j, for all j. We're still looking at this polynomial gamma, and we form the following matrix M. It's an s by s matrix whose (i, j) entry is the binomial coefficient (t_i choose sigma_j): upstairs you put t_i, where the i-th root of gamma is, and downstairs you put sigma_j, the location of the j-th non-zero coefficient. Now, this is a nice matrix to look at, because if I form the column vector (call it gamma-bar) consisting of just the non-zero coefficients of the polynomial, then matrix-by-vector multiplication is the same thing as evaluating this polynomial: if I want the value of the polynomial at the point t_i, I read it off from the i-th row of M.
So gamma(t_i) is the i-th entry of M times gamma-bar. The t_i are roots of gamma, so in fact the zero vector equals M times gamma-bar: this vector gamma-bar has to be in the kernel of M. Okay, is that clear? It's just the usual thing. And gamma-bar is certainly a non-zero vector; actually every entry of it is non-zero, although that doesn't matter to us. So gamma-bar is a non-zero vector in the kernel of M.

So now I get to tell you about a classic fact, usually associated with the names Gessel and Viennot. They cared about this matrix, and they proved (and it's a really, really nice proof) that if you define the matrix M as I did just above, and if we have the condition of the claim (actually even weaker: it's enough to have the weak inequality sigma_i less than or equal to t_i for every i, which is exactly the condition you need to guarantee that the diagonal entries of the matrix are non-zero), then this matrix has non-zero determinant. It's a very beautiful lemma. And that suffices to prove our lemma, because we just established those inequalities up there in the claim, okay? Maybe just as a historical note: recently (and I owe both David Zuckerman and Klim Efremenko thanks here) I discovered that this theorem of Gessel and Viennot is actually implied by a theorem of Pólya from 1931, which also has a beautiful but completely different proof. All things I'd be happy to explain to people offline.

Okay, so, if there are no questions on that, this proves the main algebraic thing we wanted: this code we've built over the integers (we still don't have what we really want, a finite alphabet) achieves distance one half, because we have this additive uncertainty principle that the density of non-zero coefficients plus the density of non-zero evaluations is at least one. So that gives us distance one half. And there's a version of this argument, a sort of perturbation of this construction, that pushes the distance up from one half to 1 - delta. Okay. This is the algebraic core of the theorem, and I can move on to saying, maybe very quickly, something about what else goes into it, if there aren't questions on that.

One quick question. Yeah? Is the Pólya lemma the only place where you use the fact that the t's are integers? Yeah. It's baked into both the Gessel-Viennot proof and the Pólya proof that they're working with integers; that's where it gets used. Okay.
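As an aside, the Gessel-Viennot fact is easy to test numerically; here's a small sanity-check sketch (my own, not from the talk):

```python
import random
from fractions import Fraction
from math import comb

def det(m):
    # exact determinant by Gaussian elimination over the rationals
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return d

# random instances with sigma_1 < ... < sigma_s, t_1 < ... < t_s and
# sigma_i <= t_i: the determinant of [C(t_i, sigma_j)] should never vanish
for _ in range(100):
    sigmas = sorted(random.sample(range(20), 4))
    while True:
        ts = sorted(random.sample(range(25), 4))
        if all(s <= t for s, t in zip(sigmas, ts)):
            break
    assert det([[comb(t, s) for s in sigmas] for t in ts]) != 0
```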
So how much flexibility do I have? Let me ask the organizers: I've only got four or five slides to go, so maybe I can even get through it, depending how strict you want to be about time. We're not very strict; we can take five more minutes. Okay. All right, I'm looking at the label which says I'm on slide 29 of 33, so I guess I can go through this.

So how do we get this down to a finite alphabet? What I've shown you so far, at the logical level, still didn't prove anything, because we're encoding in the integers; we have to bound the size of the numbers that we get here. And that is this proposition. There's absolutely no reason to waste your time proving it here; it's actually very straightforward. It says that when you interpolate (remember, the Z's are the inputs, the polynomial evaluations, and they're allowed to be integers, not just zero-one) and you ask how big the t-th coefficient in the Newton basis is as a function of the Z's, there's a simple bound: 2^t times the maximum of the Z's squared. You would expect some exponential dependence on t, because of the binomial coefficients here; so this is what you would expect to be true, and it's true, okay?

Now, to begin with, I should say this looks really bad: your alphabet size is growing exponentially in the depth, and I already told you on one of the very first slides that an alphabet growing exponentially in depth is really bad news. So this should not make you celebrate; it should maybe make you think you've wasted your time. But there's actually something really interesting you can do here. The trick is to say: instead of viewing this as bad news (I have these really small Z's, and they get inflated into really big gammas as time goes on), let's view it as an excuse to hide bigger Z's. We'll go along for some time t, say, and if we're only going for time t, we can afford Z's that are much larger. And I'll show you what I mean; I'll show you that I'm not trying to sell you a bridge. The idea is to tie together the size of the alphabet with a certain lag that we'll allow.

So here's a corollary of what I wrote above, and it really is truly a corollary; you can work it out from what's written on the slide: for any integer L, there is an explicit tree code achieving distance one half, based on just the same argument, where the input alphabet has size 2^L and the depth of the tree is L. Notice the twist in the argument here: usually we'd say something like, it should be a mapping from the alphabet {0,1} to depth n, so there the degree of the tree and its depth are decoupled. But here we're coupling them: we want to go to depth L with an input alphabet of size 2^L, okay? And if you do this, then you don't actually have a rate problem; you just get rate one third. This is an immediate corollary of the bounds written in Proposition 10, so I won't belabor it.

Now this motivates defining something weaker than a tree code, and later I'll show you how to go from this weaker definition back to tree codes, okay? This is the notion of a lagged tree code. A lagged tree code has the same kind of data as before: there's an input alphabet Sigma_in, an output alphabet Sigma_out, a distance parameter delta, but also this lag parameter, capital L. The requirement in the lagged tree code is: this expression written here is the Hamming distance, and what's written on the right is the usual distance requirement, but we require that the distance condition hold only once the distance from the split is large enough. Only once this lowercase l, which is how long ago X and Y split apart from each other, is at least capital L do we care. So we don't care what happens, say, among siblings, as was asked earlier;
depending on what capital L is, we only care after that amount of time, okay? An honest-to-goodness tree code has capital L equal to zero, but here we're relaxing that. At the extreme of relaxing the criterion, we really do know how to build these things: if you give me lag n, then basically I'm not requiring that you achieve any distance at all until you get to the very bottom of the tree. That's just a block code, and we certainly know how to do that. I've written down the usual kind of lemma here with some parameters, which for lack of time I won't get into; why we want n/3 and 5/6 is not important at all. Okay, so at that extreme we know how to do things trivially. We're going to want a lag that's non-trivial: much smaller than n, but it won't be zero.

So the corollary of what I wrote (Proposition 10 or 11, the thing I had two slides ago) is that we can achieve a lagged tree code at constant rate: this c_ECC is a constant, and the code goes from an input alphabet of size 2^L to a constant-factor-larger output alphabet, achieving distance one third, as long as the lag scales like the square root of the depth of the tree. Okay, and the idea is simple. You batch the input into sequences of square-root-of-L bits; then it looks like what we had on the previous slide, with alphabet 2^(root L) and depth root L. You consider each of these batches as an integer, so all your inputs, your Z values, are bounded by 2 to the square root of L. Then you apply the appropriate lagged tree code, and finally you encode the characters with an error-correcting code.

Okay, so this picture at the bottom of the slide is supposed to illustrate what's going on. Here everything is truly in binary. You have these two streams, and the streams are identical to each other up to time i minus one: x_1 through x_{i-1} equals y_1 through y_{i-1}. Each of these ovals is a block of length root L. The streams split from each other in the middle of one of these blocks, but then we don't care for a little while; only after, say, 16 blocks have gone by do we impose a distance condition. Okay, so this we get. These corollaries are all immediate; there's nothing fancy, we get them directly from what I've already shown you.

Okay, so notice that this lagged tree code we really know how to build at constant rate. So it's really what we want in terms of rate, and really what we want in terms of distance, except for the lag. The amount of lag was a parameter here. So the finishing step is straightforward: we're going to combine lagged tree codes at all different scales. But the main thing to realize about the calculation is what "all scales" means: each scale takes you from L up to L squared. If you build a tree code of depth L with lag square root of L, the next tree code you need only has to deal with lag about L, so it can give you depth L squared. Now you've dealt with L squared, and the next thing you build only has to kick in when the lag is L squared, which means you get to go up to L to the fourth, and so forth. So there are really only log log n scales you have to deal with.
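Here's that doubly-exponential schedule spelled out (a sketch of my own; the base depth 2^20 is taken from the bottom level of the picture on the next slide):

```python
def lag_schedule(n, base=2 ** 20):
    # each track of depth d (with lag sqrt(d)) lets the next track take
    # over at lag d, hence reach depth d ** 2 -- so roughly log log n
    # tracks suffice to cover depth n
    depths, d = [], base
    while d < n:
        depths.append(d)
        d = d * d
    depths.append(d)
    return depths

print(len(lag_schedule(10 ** 100)))   # 6: a handful of scales at depth 1e100
```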
So the whole construction is in this sort of colorful picture. At the very bottom level, which is depicted here at the top, you just periodically run some explicit tree code of depth 2^20. That's a finite object; you build it once and for all, and you have those alternating on the first pair of tracks. Then, for every subsequent pair of tracks, at level k you have a tree code of depth 2^(2^(k+1)) and lag 2^(2^k). You just do this log log n times, and that gives you distance one third, because no matter where and how far apart X and Y split from each other, there exists one of these tracks, of some color if you will in this picture, at which the desired distance one third is achieved. All right, that was probably a bit fast, but I wanted to focus more on the algebraic material, which I think is deeper and more interesting. Is that clear enough? Questions on that?

Okay, just to keep in mind why this works: there's an analogy here, if you caught my description of that old scheme with Klugerman and Evans. There was also this kind of thing where you encode at all different scales, but there was no notion of a lag; that idea didn't exist back then. This notion of a lag, and the fact that we can actually explicitly construct codes where the lag is the square root of the depth, is the key to why we only need log log n scales instead of the log n scales of that simpler construction. Okay.

So if there are no questions on that, this is my last slide; I will just mention some open questions, and first I'll mention an even stronger property you could ask for. Tree codes ensure the following condition: at any time t, if for every suffix up till now you have received most of the codeword correctly, a fraction 1 - delta, then you can recover the correct message. Okay, that's exactly what tree codes give you. There's something even stronger you could ask for, and it seems really reasonable. If you're listening to me talk, and you take off your headphones for a minute and talk to somebody else and then come back and resume listening, you'll understand what I'm saying just fine. You might be missing some context, but you'll certainly understand the sentences I'm saying from when you put the headphones back on. You could ask for the analogous thing from tree codes: suppose there just is some suffix of the message in which you receive a 1 - delta fraction correctly. Can Bob, who's listening, recover Alice's last characters, what she was saying in that suffix of the message? Maybe he can't recover the earlier stuff, where there was too much corruption, but just that last part. It's a very reasonable thing to ask for, but (this is with Gil Cohen and with Piyush Srivastava) we can prove that you cannot beat rate 1/log n for that property. And in fact, this property is achieved by the elementary construction I mentioned earlier. Okay. So there's already a gap here: we still haven't achieved constant rate, but that construction already achieves something that cannot be achieved at constant rate, so it's a real separation between what these different kinds of definitions can give you.

All right, let me just mention the open questions. Efficient decoding under adversarial noise:
All right, let me just mention the open questions. First, efficient decoding of adversarial noise. Stochastic noise is not really a problem and never has been, but efficiently decoding adversarial noise seems to be a really tough problem, a beautiful problem. You naturally want to attack it with Berlekamp-Welch style ideas, but they don't work directly; anyway, it's something we've devoted some thought to, and it remains open.

And of course you want to reduce the alphabet size further. You could keep knocking down the function of $n$, or you could head right for a constant, and a constant does exist, nonconstructively. I think that's realistic, but it's a great challenge. That's all I have to say. Okay. Thank you.

Thank you, Leonard. And it's time for a few questions. I already see Clément setting up. Clément, do you want to ask anything?

I have a question. So the kind of uncertainty principle that you mentioned: is there some analogue known for, say, finite fields or other contexts?

There's this theorem of Chebotarëv, but I sort of just flashed it up there; there's not that much reason to go back to the slide. It's a theorem of Chebotarëv from, I think, the 1920s, that's been rediscovered about half a dozen times, and each person who rediscovered it has been very clever. It says this is true of the Fourier matrix over the integers modulo a prime, over $\mathbb{Z}/p$. It can't be true when you have nontrivial additive subgroups, but in the case of $\mathbb{Z}/p$ it is true. That's the one case I know of, and over the last few months I've been talking about this with various people; that's the only one I've heard of.

Oh, should I turn on my video so people can see me, or what's better? I see a question: is this a standard conjecture in analytic number theory, or a conjecture you pose? No, it's not a standard conjecture in analytic number theory. Sorry, this is a question I must have missed before. The conjecture that one needs for the construction with Cris Moore is, as far as we know, neither implied by nor implies the standard conjectures on exponential sums.

Maybe I'll just say this briefly for the people in the know, and I'm happy to explain at greater length, depending on what the audience wants. The usual bounds of Bourgain, Konyagin, Shparlinski and some other people are something like this: you have an exponential sum over $\mathbb{Z}/p$, and you take, let's say, $\sqrt{p}$ terms and take their average. That's enough terms: once you go past $\sqrt{p}$, the average starts tending to zero. These are incomparable conjectures; there are some differences in the exponential sum itself, which I should probably only explain offline, but they are very incomparable. Those are called short exponential sums, when you take, let's say, on the order of $\sqrt{p}$ terms. Here we want to take only $\log p$ terms, in that language, so it's much, much shorter. We do not require that the average tend to zero, however, only that it be bounded away from one, so it would be strictly less than $0.9$ or something. So, yeah, that's the answer to that question: it's not a standard conjecture, but it seems like a believable conjecture. I've talked to some people in the area, but it's out of reach of current methods.
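For reference, the Chebotarëv theorem invoked above, in one standard formulation: if $p$ is prime and $\omega = e^{2\pi i/p}$, then every square submatrix of the $p \times p$ Fourier matrix $(\omega^{jk})_{0 \le j,k < p}$ is nonsingular. Equivalently, a nonzero vector in $\mathbb{C}^p$ supported on $a$ coordinates has a Fourier transform supported on at least $p - a + 1$ coordinates. This is exactly the uncertainty principle over $\mathbb{Z}/p$, and as the speaker notes it must fail over groups with nontrivial subgroups: the indicator of a subgroup and its transform are both sparse.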
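Schematically, and glossing over the differences in the exponential sums themselves (which the speaker defers to offline discussion), the comparison is between two regimes over $\mathbb{Z}/p$, writing $e_p(x) = e^{2\pi i x/p}$. The Bourgain-Konyagin-Shparlinski-type bounds say, roughly,

$$\Bigl|\,\frac{1}{N}\sum_{n=1}^{N} e_p(\cdot)\,\Bigr| \;\longrightarrow\; 0 \qquad \text{once } N \gtrsim \sqrt{p},$$

whereas the conjecture needed for the construction with Moore asks only that

$$\Bigl|\,\frac{1}{t}\sum_{n=1}^{t} e_p(\cdot)\,\Bigr| \;\le\; 0.9 \qquad \text{for } t = O(\log p),$$

a far shorter sum, but with a far weaker conclusion: bounded away from one rather than tending to zero. That is the sense in which neither statement implies the other.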
I have a question about one of your lemmas. When you bound the size of the coefficients, the upper bound is something of the form $2^{t}$ times $\max(|z_0|^2, \ldots, |z_t|^2)$. Is that tight? And if not, do you think it could be improved, and would that immediately lead to some improvement?

The $2^{t}$ is tight. I don't remember offhand whether the quadratic is, but it's tight enough that you're not going to get something qualitatively different. So it's tight for this construction; I mean, one should be creative, but yes, because you're looking at things like $\binom{t}{t/2}$, which is already of order $2^{t}/\sqrt{t}$, so you naturally get an exponential.

Thanks.

Since we're already a bit late, let me thank you just once more. Thanks for the great talk, Leonard, and let me remind everyone that the next talk is next week, not in two weeks; the speaker will be Michael Kearns, and that will be the last talk of the spring. Everyone is welcome to stay after we go offline. So again, thanks to everyone for attending, and also to those who watched us live on YouTube. I'm going to take us offline right now.