I'll tell you about something that I consider very interesting in the arena of algorithmic results in coding theory. This is based on work of Swastik Kopparty, Shubhangi Saraf, and Sergey Yekhanin. Swastik and Shubhangi were students at MIT in those days, I think, but all this work was done at Microsoft in Silicon Valley. So let me start with a very gentle introduction to error-correcting codes. These are combinatorial objects that we use to store and recover information in the presence of noise. The basic notation we will be using today is that there is an encoding function E which takes some message — let me include the terminology here. The domain of this encoding function is what we call the set of messages. The encoding function takes the set of messages, Sigma to the k, injectively into a much larger space, Sigma to the n, where n is much larger than k. The strings that you map to, the ones that actually have pre-images, are what we call the codewords. Now, the rate of a code is the ratio k over n, and in codes that we like to design and use, we would like this quantity to be as high as possible. Just to give you a sense: I think every storage medium that we know of today uses some form of error correction, and pretty much all of them are working with explicit Reed-Solomon codes or some variations thereof, and their rates are in the 80 to 90% regime — so pretty high. And the distance of a code: in order to make sure that you can recover from errors — that you even have the information available to recover from errors — you need to make sure that if you have two different messages, two different elements of Sigma to the k, then the encoding maps them to strings which are very far apart. How do you measure how far apart things are? It's the Hamming distance concept. Throughout the talk today we will be normalizing it — this has become more or less the norm in the theoretical computer science literature — and we talk about the relative distance between x and y, the fraction of coordinates on which they differ. So it's a very simple measure: if x and y differ in a coordinate, you don't ask how different they are; they're just either equal or not equal. And the minimum distance of a code is the quantity delta of C, the minimum over all pairs of distinct messages u and v of the distance between the encoding of u and the encoding of v. This is a quantity fundamental to a code. In a sense, it says that if an adversary picks two different messages u and v and wants to confuse you, it can make the encoding of u look like the encoding of v by changing only a delta fraction of the symbols. So in the codes of interest — think of an asymptotically growing family of codes, one for each n, say — we would like the rate of the sequence to always be greater than some fixed constant, and the delta also to be greater than some fixed constant. This is what we would like; these are the codes that are of interest.
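Just to pin the definitions down, here is a minimal sketch in plain Python; the tiny 3-fold repetition code at the end is my own toy choice for illustration, not a code from the talk.

```python
# Relative Hamming distance between two strings, and the minimum
# (relative) distance of a small, explicitly listed code.
from itertools import combinations

def relative_distance(x, y):
    """Fraction of coordinates on which x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def minimum_distance(code):
    """delta(C): minimum relative distance over all distinct codeword pairs."""
    return min(relative_distance(u, v) for u, v in combinations(code, 2))

# 3-fold repetition of 2-bit messages: rate k/n = 2/6, distance 3/6.
encode = lambda msg: [b for b in msg for _ in range(3)]
code = [encode([b0, b1]) for b0 in (0, 1) for b1 in (0, 1)]
print(minimum_distance(code))  # 0.5
```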
Now, coding theory has always had some very basic algorithmic questions associated with it. These are the first things you would think of when you think about its use in error correction, in making information reliable. You would like to store some information; then you would like to know, given that information, what should I actually store — how should I encode it? So given the message, you would like to compute the encoding function E on that message. A basic task, once you have, say, a compact disc on which you stored information, is to ask: is it now corrupted, or is it reliable? That corresponds to the question of error detection, or testing. If you are given some word w in Sigma to the n, you can ask: is w a member of this code or not? That's the very basic version of the question. A more sophisticated version, significantly harder in most cases, is to ask how far w is from the code: what fraction of symbols in w need to be changed in order to get to some string x which is actually in the code? And you can be asked to measure this quantity approximately as well. These are the kinds of questions that hover around the topic of this talk. And then there is the error-correction question: given this w, which is promised not to be too far from the code — there is a string x such that the distance between w and x is small, smaller than some constant delta — compute x for me. We will always consider settings today where this x is determined uniquely, because any pair of codewords is sufficiently far apart; you won't have two different elements x1 and x2 which both satisfy this inequality. That's the choice of delta we will be working with today. I hope these things are clear, because these are the questions we will be talking about. Now, I should say a little bit about what has happened in the coding theory literature with these algorithmic questions. Most of the literature started in 1948 and 1950, in the works of Shannon and Hamming. Hamming's theory was more or less considered the more constructive one, but really, from our point of view, no algorithmic task was dealt with particularly efficiently there. Shannon almost explicitly went in the non-algorithmic direction and solved every question of interest in as much time as he needed — exponential, double-exponential times were all considered fine in those original papers. Very soon, coding theorists started looking at efficient algorithms for some of these coding tasks. Encoding typically ended up being very efficient for most codes of interest, because very often the encoding map is a linear map, given by a matrix multiplication — so it's efficient at least in the sense of being polynomial time. The first polynomial-time algorithms for decoding codes in this range of parameters came around 1966, due to a result of Forney, who gave some very interesting algorithms at the time. Subsequently, Justesen, in about 1972, made these codes fully explicit, and in the process the decoding algorithms remained polynomial time. Polynomial time might not be efficient enough for some of these tasks. So what about making it, say, linear time? That was done mostly by Spielman, and by Sipser and Spielman, who showed constructions of codes for which you could do all of these things in linear time — encoding could be done in linear time.
And for certain constants delta, the error-detection and error-correction questions could also be solved in linear time. So we might say this is the end of the story — but no, we can go much further, which is what leads us to sublinear-time algorithmics. We want algorithms which run in time even less than linear in the input size or the output size. So if I give you a generic function mapping, say, A bits to B bits, can this function actually be computed in time which is little-oh of A and B? This is the generic question asked in sublinear-time algorithmics. The obvious answer is, of course, no — but we don't like this answer. So we ask: how should I change the question so as to get the answer that we like? The answer that we like is yes. Sublinear-time algorithmics pretty much says: certain basic conditions are necessary, and very often they are also sufficient. So what are these conditions? Well, in this setting you don't want to ask to output a long string of length B. Instead you say: can I determine any single coordinate of this string efficiently? Now B is no longer a lower bound on the running time, so maybe you can go little-oh of B. Similarly, you can do something very similar on the input side. Rather than saying the input is given as a long string on a Turing machine tape, let's use a compact disc or some random-access device — an oracle. So we replace the input by an oracle, and now you get to ask questions: what is the j-th coordinate of this x? And you get back as answer x of j. So you don't have to read the entire input either; it's available, and if a coordinate is relevant to a given answer, you read it. An easy function to compute this way in sublinear time is the identity function: given i, output x of i — one query suffices. However, for more sophisticated tasks this is usually not good enough, so one tends to allow one more weakening: rather than saying I will compute the function f on the input that's given to me, I'll say I have computed the function f on some input x-tilde which is very close to the input that's actually given to me. That's all I will guarantee. So you weaken your guarantee this way. And once you rephrase your question with these three caveats, many, many interesting tasks can be solved in sublinear time.
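Here is a toy rendering of that oracle-access model; the class and method names are my own illustrative choices.

```python
# The input is wrapped in an oracle that answers coordinate queries and
# counts them; a "sublinear" algorithm is one that asks for few of them.
class Oracle:
    def __init__(self, x):
        self._x = x
        self.queries = 0
    def query(self, j):
        self.queries += 1
        return self._x[j]

def identity_coordinate(oracle, i):
    """The i-th output coordinate of the identity function:
    one query, regardless of the input length."""
    return oracle.query(i)

w = Oracle(list(range(10**6)))
print(identity_coordinate(w, 123456), w.queries)  # 123456 1
```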
So here's a very brief history of where these kinds of results were first studied. Blum and Kannan, and then Blum, Luby, and Rubinfeld, initiated these kinds of questions in the context of what's called program checking. They were looking at a program as implicitly representing a large function: consider the truth table of the program on every n-bit string as a possible input — that's a 2-to-the-n-sized object — and you want to know whether it is the correct truth table. In that sense, it makes sense to say: I don't want to take time polynomial in 2 to the n; I want time polynomial in n. Those are the kinds of things they were looking at. Then Blum, Luby, and Rubinfeld gave very interesting extensions. These kinds of questions were also looked at in the context of what are called interactive proofs and probabilistically checkable proofs, by Babai, Fortnow, and Lund, around the same time — 1988, 1990, et cetera. And many interesting algorithms have been developed in the last 20 years for a large collection of graph-theoretic properties, for things like sorting and searching, for statistics and entropy computations, for high-dimensional computational geometry — many different arenas have now started seeing algorithms which run in time sublinear in the length of the input. A very interesting thing is that before almost all of this started happening, we were already seeing coding-theoretic results. In coding theory, there is a very happy marriage of relevance for this question. Imagine you have this stick, which has about 32 gigs of storage on it. If I wanted to read or rewrite everything on this particular USB stick, it would probably take my laptop about 10 minutes — I've never actually done this computation, but probably around 10 minutes. And you might want to say: look, suppose I want to find out whether all the information on this stick is intact. Is it correct, or are there too many errors in it? It makes a lot of sense to ask whether I can detect errors very quickly — I don't want to take 10 minutes to figure out whether this thing is roughly okay; I'd like to do it rapidly. Or, alternately, I want to read a particular file stored on this stick; I don't want to read the entire contents and decode them. These kinds of questions make a lot of sense. So let's ask: what happens when you mix sublinear-time algorithms and coding theory? Take the three basic tasks we considered — encoding, testing, and decoding — and let's see what happens to them. Encoding, it turns out, is not something you can reasonably expect to do in sublinear time. Why? The very nature of the question is such that you should not be able to do anything efficiently: if I want to change one bit of the message — the information that's stored, in one coordinate or any number of coordinates, it doesn't matter — the properties of the error-correcting code say that I must change at least, say, 10% of the stick. And 10% of the stick is not little-oh of the length of the stick. So I cannot get little-oh of n for encoding; that's pretty much ruled out by the definition of the problem. On the other hand, consider the two other questions. I give you the stick and ask: are there too many errors or not? That's a testing question, and it makes a lot of sense to do it in sublinear time. It can be done for many interesting codes — there was a large collection of initial results on this. And how about decoding: I want to recover some particular file on this stick, and I want to do it efficiently. Once again, there are many interesting codes for which this can be done. The codes that admit efficient testing are what are called locally testable codes in the literature, and they are extensively studied. There is an almost parallel body of work — with some initial overlap, but pretty soon a divergence — on a collection of codes called locally decodable codes, where you can actually recover the information efficiently if you know that the number of errors is small. Now, for various technical reasons, these two questions do not turn out to be the same.
In this talk today, I will be talking mostly about locally decodable codes. The codes that we will describe explicitly in the new results, we suspect, are also locally testable, but nobody has bothered to prove that yet — it seems to be a question of lesser interest. Certainly, what we'll be focusing on is the decoding aspect. For a little while, I'll also talk about locally testable codes; maybe there'll be one slide on why we started looking at these things. Okay, so in the rest of the talk, I'll give you the formal definition of a locally decodable code — it's a combination of the decoding question and sublinear-time algorithmics, and we'll put the two together and make sure the question is precise. I'll give you some background, some basic constructions, and describe a particular barrier that we were seeing in the past. And then I'll describe these recent constructions, which are very beautiful. I feel they're really remarkably surprising — they were very surprising to me when I first realized these things existed — but they are remarkably simple at the same time, which is a very nice combination: you get surprised by something which is very simple. So let's start with the definitions. A code C with its associated encoding function (I will keep switching back and forth between the encoding and the code itself) is what's called decodable from a delta fraction of errors with l(n) queries — or l(n) locality — if there exists a decoding algorithm D which does the following. It's given as input a coordinate that you want to recover; this will be a coordinate of the message: here is a message which was encoded, and I want to know its i-th coordinate. And it's given an oracle for a corrupted encoding of this information: w is an oracle, a function which maps the coordinates one through n to letters of the alphabet. You have oracle access to this function: you ask for w of j, and you get back some letter of the alphabet — this is what's written in the j-th coordinate of the string. The promise is that there is some message m whose encoding is at most delta-far from this string w. What the decoder should do is output the i-th coordinate of the message, m sub i. It's allowed to be probabilistic, but its error probability should be bounded away from one half — say at most one third, so with probability at least two thirds you get the right answer. And if you want, you can boost this up by repetition, as in the sketch below. The important parameter here is the locality: how many queries are you allowed? Only l queries to the oracle for w. So this is the concept of interest, and this is what we're going to be talking about today. Does this make sense?
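Schematically, the repetition trick looks like this; `decode_once` stands in for an arbitrary randomized decoder D with oracle access to w, and everything here is my own sketch of the standard argument, not code from the talk.

```python
# Boosting a 2/3-correct local decoder by majority vote over
# independent runs; the error probability drops exponentially in the
# number of repetitions (Chernoff bound).
import random
from collections import Counter

def amplify(decode_once, i, w_oracle, repetitions=15):
    votes = Counter(decode_once(i, w_oracle) for _ in range(repetitions))
    return votes.most_common(1)[0][0]

# Demo with a dummy decoder that is correct 2/3 of the time:
demo = lambda i, w: i if random.random() < 2 / 3 else -1
print(amplify(demo, 7, None))  # 7, with high probability
```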
All right, so what's the history? There are two really old papers, from around 1954, I believe, by Muller and by Reed. Muller defined the codes and then Reed gave the decoding algorithm; these are called Reed-Muller codes. And in spirit, those decoding algorithms were local decoding algorithms. Now, the fact that these were local decoding algorithms is very, very implicit in the papers — it's somewhere buried in the analysis, so we probably wouldn't call it a proper, well-written proof of local decodability, but the intuition is there. It's a simple enough property, a simple enough code, and a simple enough decoding algorithm that you can verify it satisfies all the expectations you would have. The modern era started with the work of Babai, Fortnow, Lund, and Szegedy around 1990, which was probably among the first sets of papers to start connecting computer science to coding theory significantly. I'm sorry, that should have been Levin, not Lund — Levin is, in fact, one of the principal people behind this marriage of the two theories of error-correcting codes and theoretical computer science. So, I'm sorry about that: that should have been Levin. Their constructions implied the definitions — but even while they gave constructions which clearly satisfied the axioms of locality and so on, the definitions remained very, very implicit. It took much longer before — in some work with Trevisan and Vadhan, and then in work of Katz and Trevisan, with two different motivations — we actually gave explicit definitions of locally decodable codes. The reason we came up with the definition was to show that any locally decodable code could be used to get certain complexity-theoretic effects. That's usually a good reason to define what a locally decodable code is: to be able to say, for every locally decodable code, you get something interesting. And Katz and Trevisan also had this kind of reason to define it: they showed that every locally decodable code has certain limitations on its rate and so on. So these are two reasons to introduce definitions — whenever you see a for-all quantifier, you need to know what it's quantifying over; you have to have a definition. These things happened around 2000. And in between, there were many constructions, by the way: between 1990 and 2000, there were many constructions of these objects, but nobody bothered to define them. So why were people working with these codes? Let me start with locally testable codes, very briefly. There are these objects we've been studying extensively called probabilistically checkable proofs, and these are intimately related to combinatorial optimization and to our ability to understand why many optimization problems cannot be solved well, even approximately. PCPs, it turns out, are complexity-theoretic analogs of locally testable codes: locally testable codes are combinatorial objects, and PCPs are their complexity-theoretic analogs. There's a very, very close relationship between them, but it's not formal — there is no theorem which says that the existence of a PCP implies the existence of an LTC with corresponding parameters, and no theorem in the other direction. But the two objects have been intimately connected: every time we want to construct a PCP, we look around for a good locally testable code, and vice versa — every time we find a PCP with some different parameters, we ask, could I also get a locally testable code out of this? The current state of the art here is pretty good, but not great. You can find codes which are testable with just a constant number of queries — the number of queries you make into these objects is a constant, independent of the message length, which is remarkable. And the penalty you pay is not linear length, which is what coding theorists would really, really love, but the best constructions are coming pretty close.
So k bits of message get mapped to k times polylog(k) bits of encoding — pretty strong results. This was based on some work with Ben-Sasson, and it comes together in the paper of Irit Dinur. Now, this is all I'm going to say about locally testable codes. The objects we're going to be interested in from now on are the locally decodable codes — the ones where you say: I want to recover some coordinate of the message, and I'd like to do that quickly. What do we know about them? Here are some complexity-theoretic and cryptographic motivations for studying locally decodable codes; this is what they've been used for in the literature so far. There is a result going back to, I think, observations of Levin, and then Russell Impagliazzo, which I formalized in some surveys, which says the following. Complexity theory often asks questions like: how can I create a function which is very hard? If I give you a general n-bit function which is hard, can I create a function which is Boolean and relatedly hard — about as hard, representing the essence of the hardness of the original? This study has been very important and useful in the foundations of cryptography; it leads to what are called hardcore predicates, and it turns out locally decodable codes are very useful in getting a very modular reduction between these two classes of questions. I won't define them, but I want to make a point which I'll get to shortly. They've also been used in what's called hardness amplification: I start with a function which I know to be hard on worst-case instances, and I'd like to get a function which is hard even on random inputs. How can I do it? Coding theory comes in useful here: sufficiently good locally decodable codes lead to constructions of these things. Another application of locally decodable codes has been in schemes for private information retrieval. I would like to store information in multiple databases which are not talking to each other, and recover some information from these databases. Say I want to do a patent search, and I want to find out what's known about the utility of aspirin against certain forms of cancer. Maybe I don't want to reveal the nature of the query to these patent search engines, because then they might figure out: oh, this might be an interesting direction, and if Microsoft is interested in it, maybe we should be too. So we would like to hide our queries, and private information retrieval schemes are designed to do that. They, again, turn out to have an intimate connection to locally decodable codes. So what's the point of all of these applications? They're all basically disabling results, not enabling results: each one says locally decodable codes can be used to create a system where you cannot do something. What about the more obvious application — I have this stick, it has a lot of information, and I'd like to recover information quickly? Other than the theoretical possibility that you could use these codes to do this, nobody has ever actually proposed a scheme and said, now I will try to use it in practice. This has not happened so far. Today's talk, I think, should change this.
It's really, I think, a game changer in that sense. And why is that? The reason is very simple. The best locally decodable codes that we knew till now with sublinear-time decoding had rate at most one half — 50%. You have some amount of information; you have to at least double it when you store it in order to get any local decodability. At rate 51%, we did not know of a single scheme. And in fact, when you think about this question of local decodability — that the message should be recoverable no matter where the errors are — you would really think: if I look at the first half of the codeword, or the first 10%, I should be able to recover bit number one; if I look at the second 10%, or anywhere else the errors aren't, I should still be able to recover this information. So the information about this one bit is spread all over the place, and you would think such a scheme must be inherently redundant. The constructions that we knew supported this theory. The best known locality with sublinear decoding — and the sublinear you could get was a genuinely sublinear amount, not n over log n or some such thing — was square root of n. But if you wanted to do better — say I want locality n to the one third — your rate dropped; you want n to the 0.1, your rate drops even more. And how did the rate behave? If you want locality n to the epsilon, the rate of the best known codes was epsilon to the one over epsilon. So if you want n to the 0.1 as your decoding complexity, that's a forbidding rate — this is like 10 to the minus 10. At this rate, you're not going to be using this code. So clearly, all these codes were inherently limited from the point of view of anything other than use in complexity theory — none of this was going to be useful in a practical storage device. Now, there are some provable lower bounds, but they are very weak. Katz and Trevisan showed that if you want to decode with L queries — where L could be any function of n, though for L greater than log n the bound says nothing interesting — then n needs to be at least k to the 1 plus 1 over L. So if you're thinking of L as some constant — I want to recover the information with 15 queries — then n has to be at least k to the 16 over 15. But that's the most that was known. It doesn't rule out anything when you say, I want to recover with n to the 0.1 queries; there are no lower bounds there. Even at log n queries, in principle, you could get linear length at very, very good rate. And like I mentioned earlier, in practical settings we're thinking of codes of rate 80%, 90%. Now, I don't necessarily insist that this is the only reasonable rate at which practice could work, but that's where things stand right now: for almost any storage device, if it's not using codes at this kind of rate, I suspect it's not considered usable. So, what are some basic constructions? Where did this rate barrier of one half come from? Let me tell you what these basic constructions are, and then move along to the new results. In this talk, I will switch from locally decodable codes to an even stronger concept, called self-correctable codes. What are self-correctable codes? Rather than saying I want to recover one coordinate of the message, I will say I want to recover one coordinate of the codeword.
Now, this is not a very big difference, but it makes things a lot simpler to explain, because it's much easier not to worry about how you represented the message — we just say: here is a set of codewords; what can we do with it? So self-correctable codes are a simpler concept to work with, and they imply the existence of an encoding function that yields a locally decodable code. It doesn't really change the question much; it makes the task potentially a little harder, but for all the constructions we will see, it's an easier task to describe. So this is the concept we are going to be looking at. Why is this a stronger object? If you give me an error-correcting code — say a linear one, so the set of codewords is a linear space — then there is a subset of coordinates which is linearly independent, and if you take this subset of coordinates to be your message locations, there is a linear map E from this set of coordinates to the full codeword. Recovering any coordinate of the codeword will then, in particular, also recover any coordinate of the message. What makes self-correction easier to talk about is that in many of these codes, figuring out which subset of coordinates is actually linearly independent takes some work — it's only linear algebra, but it's work we don't need to do and don't want to do, so we won't talk about it. So this is the concept we'll work with, and let me describe the most basic and interesting class of locally decodable codes. (These codes are also known to be locally testable, but we won't talk about that today.) These codes come from multivariate polynomials, and they use a very basic property of multivariate polynomials to prove that they are error-correcting codes, and a relatively elementary property to prove that they are locally decodable. How are these codes described? I have to pick three parameters. One is a finite field of size q — F sub q denotes a field of size q. Then I pick two integers, m and d: m is the number of variables and d is the degree parameter. The message space, the set of all messages, is in one-to-one correspondence with the set of all polynomials in m variables of degree at most d over this field of size q. Throughout the basic portion of the talk, I will be talking about polynomials of degree less than q; later we'll change that. Now, the encoding of a message is very simple. A message is a polynomial in m variables of degree at most d — and by degree I mean total degree: the degree of x times y squared is 3 — so there are no polynomials of degree more than d in this set. The encoding is the evaluation of this polynomial over the entire space. Notice we are working over a finite field, so there are only finitely many points: you evaluate the polynomial over all of F_q to the m. The resulting parameters of the code: what is the length? It's q to the m — you're evaluating the function everywhere. What is the alphabet? I forgot to mention: the alphabet is the finite field; every evaluation is some value in the field.
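Here is a minimal sketch of this evaluation encoding, over a prime field so that arithmetic is plain mod-q; the toy parameters q = 7, m = 2, d = 3 are my own choices.

```python
# Message = coefficient vector, one coefficient per monomial of total
# degree <= d; codeword = evaluations at all q^m points of F_q^m.
from itertools import product
from math import prod

q, m, d = 7, 2, 3

def monomials():
    """Exponent vectors of total degree <= d; there are C(m+d, d) of them."""
    return [e for e in product(range(d + 1), repeat=m) if sum(e) <= d]

def encode(coeffs):
    mons = monomials()
    assert len(coeffs) == len(mons)  # k = C(m+d, d)
    return [sum(c * prod(pow(x, ei, q) for x, ei in zip(pt, e))
                for c, e in zip(coeffs, mons)) % q
            for pt in product(range(q), repeat=m)]  # n = q^m coordinates

k, n = len(monomials()), q ** m
print(k, n)  # 10 49; distance >= 1 - d/q by Schwartz-Zippel
```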
And what is the dimension of the code — how big is the message space? Well, how many coefficients do you have in m variables of degree at most d? It's a very simple quantity to compute: the number of coefficients is exactly m plus d choose d. So k is m plus d choose d, n is q to the m, and the distance you get out of this is at least 1 minus d over q. This is what we call the Schwartz-Zippel lemma in our community — an age-old fact about polynomials: if I take a nonzero degree-d polynomial and evaluate it at a random point, it evaluates to 0 with probability at most d over q. That leads immediately to the statement that two different polynomials agree with probability at most d over q; they disagree on at least a 1 minus d over q fraction of the input space. So that's the distance. Now, you can also use d greater than q, with care. In fact, the original Reed-Muller paper did: the original Reed-Muller codes were obtained by setting q equal to 2 but d much larger. So we've analyzed all the classical parameters. The rate is whatever k over n comes out to be, and we'll compute it in explicit cases shortly. The distance is 1 minus d over q; if you want this quantity to be an absolute constant bounded away from 0, you pick d to be, say, (1 minus delta) times q — as long as d is bounded away from q, the distance is a constant. Now, what about locality? How do you do local decoding? Let's look at the problem statement. Here is the space F_q to the m. What is a codeword? A value in F_q written at each of the q to the m points of this space. What is a received word — the word w we usually talk about, which I'm now calling a function f? It's the value of the codeword, but corrupted in a few places, some small constant fraction of the places; say, pictorially, in this set of locations its value has been changed. Of course, the decoding algorithm does not know where things have been changed. The self-correction question is simple: I give you a point — say this particular point, which happens to lie in the corrupted set; it could have been anywhere — and I want to know the value of the codeword, not the received word, at this location. The reason I put it in the corrupted set is just to make sure you understand that I can't simply read it off from the received word; if I did, I'd get a corrupted value. An adversary chooses the set of errors, and an adversary chooses where I want the recovery to happen — these are for-all quantifiers, so I cannot assume that the location is error-free. Now, how do we decode? How do I find the value of the polynomial p at this point, given oracle access to the function f, which is usually equal to p, but not here? The idea is very simple: pick a random line passing through the point — a random line is a very well-defined, simple algebraic object — and look at the function restricted to that line. What happens to the function restricted to the line? It turns out to be a degree-d polynomial in one variable, the line's parameter. So I read the value of f everywhere on the line: that's q queries, and q is much smaller than q to the m, as long as m is greater than one (and m greater than one will become an important issue later). Now, if it's a random line, it shouldn't pass through the corrupted (orange) set too often. The adversary picks the corrupted set, but even so, it cannot force too many corrupted points onto too many lines; for most lines, you see very few corrupted points. So the question becomes: can I find the degree-d univariate polynomial which agrees with the value of f on most of this line? This is a classical question — the decoding of Reed-Solomon codes — and it can be solved very efficiently, in polynomial time. Once you solve it, you know the value of p on this entire line as an abstract polynomial; you just evaluate it at your point to get the value. And the locality is q. Recall n was q to the m, so the locality is n to the one over m, which is sublinear. So this is a very simple, basic idea in locally decodable codes, and it works very cleanly; the analysis is all very simple.
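Here is a sketch of that line-decoding step, in the error-free case and under my own simplifying assumptions (q prime, d + 1 at most q − 1, no corruption on the queried points); with errors, the Lagrange interpolation below would be replaced by a Reed-Solomon decoder, which I elide.

```python
# Recover the codeword value at `point` without ever reading f(point):
# query d+1 other points on a random line through it and interpolate
# the degree-<=d univariate restriction.
import random

def local_decode(f, point, q, d):
    direction = [random.randrange(1, q) for _ in point]  # nonzero direction
    ts = list(range(1, d + 2))  # d+1 nonzero parameter values on the line
    samples = [(t, f(tuple((a + t * b) % q for a, b in zip(point, direction))))
               for t in ts]
    # Lagrange evaluation at t = 0 of the unique degree-<=d interpolant.
    val = 0
    for t_i, y_i in samples:
        num = den = 1
        for t_j, _ in samples:
            if t_j != t_i:
                num = num * (-t_j) % q
                den = den * (t_i - t_j) % q
        val = (val + y_i * num * pow(den, -1, q)) % q
    return val

# Demo: the codeword of p(x, y) = x*y over F_7, corrupted at (2, 3).
f = lambda pt: 0 if pt == (2, 3) else (pt[0] * pt[1]) % 7
print(local_decode(f, (2, 3), q=7, d=2))  # 6, i.e. 2*3 mod 7
```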
Now, the question is: what kinds of parameters do we get? Here are some parameter settings from the literature. On one extreme, you can ask: what is the smallest locality I can get for a decoding algorithm? And the answer is locality two, which is provably the best you can do while getting anything interesting. You get 2-query locally decodable codes by setting the degree parameter d to one and picking a field of size two. As for m: I need enough degrees of freedom — I need m plus d choose d to be at least k — so just pick m equal to k. This leads to a value of n which is the field size to the power of the number of variables: two to the k. So k bits get mapped to two to the k bits. That's a huge loss in rate; however, you get 2-local decodability. Now, for a long time it was actually suspected that this kind of exponential dependence between n and k is inherent in any constant-query local decoding algorithm. Of course, we were not able to prove it: the lower bounds of Katz and Trevisan basically say that for 3-query locally decodable codes, n needs to be at least k squared or so — very weak, nothing close to exponential — but the exponential behavior was long suspected. Then there was a family of breakthrough results in the last five years or so which led to subexponential-length constant-query codes. I won't be talking about those today.
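As a concrete rendering of that two-query setting — degree one over the field of size two is exactly the Hadamard code — here is a toy implementation (mine, for illustration):

```python
# The encoding of a k-bit message m lists the parity <m, x> for every
# x in {0,1}^k; bit i is recovered from two positions, r and r XOR e_i.
import random

def hadamard_encode(msg):
    k = len(msg)
    return [sum(mi & ((x >> i) & 1) for i, mi in enumerate(msg)) % 2
            for x in range(2 ** k)]  # n = 2^k positions, indexed by x

def decode_bit(w, k, i):
    """Two queries: w[r] XOR w[r ^ e_i] = m_i, correct as long as
    neither of the two (random) queried positions is corrupted."""
    r = random.randrange(2 ** k)
    return w[r] ^ w[r ^ (1 << i)]

msg = [1, 0, 1, 1]
w = hadamard_encode(msg)
print([decode_bit(w, 4, i) for i in range(4)])  # [1, 0, 1, 1]
```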
How about if we switch from insisting on the best possible locality to just asking for some sublinear locality — how can I get that? Like I said, n is q to the m and the query complexity is q, so if you want anything sublinear — little-oh of n — you need m to be at least two. So let's take m equal to two. What do you get? Well, the degree should be bounded away from the field size, or else you have no distance in the code. So let's say the degree is (1 minus 2 delta) times q — I call it 2 delta because I want to correct a delta fraction of errors. If you do that, the parameters work out roughly as follows: n is q squared, and k is roughly ((1 minus 2 delta) times q) squared, divided by two. Ignore the delta — set it to zero for now if you want — so d and q are roughly the same, k is q squared over two, and n is q squared. So the rate you can get is at most one half — you don't even quite get one half, but you can get arbitrarily close — and you get a locality of q, the square root of n. That's it; we can't go beyond rate one half. And really, there was no idea out there for how to go beyond this very simple construction: we came up with lots of very sophisticated, nice codes, but not in this regime — the rate-one-half barrier remained intact. Of course, as I said, you can get other parameters by setting m equal to one over epsilon: you get locality n to the epsilon at rate epsilon to the one over epsilon, so things get worse rapidly. It's also interesting: at square root of n you can get rate close to one half, but what if I want cube root of n? The best rate is one sixth, et cetera — you decay like a factorial. And, like I said, the degree needs to be at most q and m at least two, so k is at most q squared over two while n is q squared: there's no way to get rate better than one half this way. The breakthrough result of Kopparty, Saraf, and Yekhanin is that you can now get essentially arbitrarily good rate. Pick your parameter alpha, and your rate will be one minus alpha — alpha can be as close to zero as you want. Pick your parameter beta, again as close to zero as you want, and your locality will be n to the beta. Depending on your choice of alpha and beta, the fraction of errors you can correct gets smaller and smaller, but not ridiculously small — I think delta is something like alpha times beta divided by eight, some fixed constant. And you have these codes for every one of these parameters: for any length n you choose, there are codes of length n with rate at least one minus alpha which are n-to-the-beta locally decodable from a delta fraction of errors. Very, very beautiful. Locality n to the 0.1 with rate 99% — you can get it. This is the setting for which, in the past, we were thinking of rate 10 to the minus 10, so it's definitely very, very strong. And one of the interesting things is this: so far, if somebody had said about locally decodable codes, give me an example choice of parameters that works, I couldn't think of anything that would convince anybody. Now, I think even the concrete parameters coming out of these codes are actually interesting.
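For contrast, here is the arithmetic behind the classical barrier, written out (my own transcription of the numbers stated above):

```latex
% Classical m-variate evaluation codes with degree d = (1 - 2\delta) q:
\[
  k = \binom{m+d}{m} \approx \frac{d^m}{m!}
    = \frac{(1-2\delta)^m q^m}{m!},
  \qquad
  n = q^m,
\]
\[
  \text{rate} \approx \frac{(1-2\delta)^m}{m!},
  \qquad
  \text{locality} = q = n^{1/m},
\]
% so m = 2 caps the rate at 1/2 and m = 3 at 1/6: the factorial decay.
% The new codes instead give rate 1 - alpha at locality n^beta, for any
% constants alpha, beta > 0.
```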
One of the parameter settings described in the paper roughly suggests the following. If I take my USB stick — it's like 32 gigs of information — and I decide to implement a locally decodable code on it, I can use a rate of about 80% or so; I think it's maybe 70% for the choices they describe. And what kind of speed-up do I get? Rather than reading the entire stick in order to recover from errors anywhere on it, you can do it roughly 15,000 times faster. That's a noticeable difference: rather than 10 minutes, I'll be doing it in milliseconds. So it's a remarkable improvement — numbers that actually make sense. This is the first time any code has attained this level of efficiency. So what are these codes? The basic idea is: we still work with multivariate polynomials, but we do something a little more sophisticated with them. Rather than the encoding being just the evaluations of the polynomial, I will also include the evaluations of its partial derivatives. Now, this sounds like a step backwards: I was already stuck at rate at most one half, and now I'm going to store even more information — how am I going to get a higher rate out of this? Well, I'm now going to let the degree grow. Previously I was handicapped by using degree at most the field size; now I'll go up to degree which is, say, twice the field size. And why is this okay? For example, if you look at a polynomial in two variables and ask: at what fraction of points can it both evaluate to zero and have both its partial derivatives — one with respect to x, one with respect to y — evaluate to zero? This fraction of points is at most the degree of the polynomial divided by two times q. Why? Because each such zero counts roughly like two zeros of the function — if you define the notion of multiplicity correctly, this is what you get. So you get to use polynomials of degree up to twice the field size, and that starts buying you something. We'll work out an example shortly. I should mention: what do we mean by partial derivatives? I won't give a formal definition yet — I might later, or I might not. (You should tell me when I should stop, by the way. You're not going to do that, okay? Somebody should.) For an m-variate polynomial, I can take i1 derivatives in the first variable, i2 derivatives in the second variable, and so on up to i sub m derivatives in the m-th variable — there is a way of taking such derivatives, which I won't spell out right now — and such a derivative is considered a derivative of order i1 plus dot dot dot plus i sub m. And we say a polynomial p vanishes with multiplicity s at a point (a1, ..., am) if p(a1, ..., am) is zero — which is to say its zeroth-order derivative vanishes — and all partial derivatives of order at most s minus one also vanish at this point. Given this notion, there is a multiplicity version of the Schwartz-Zippel lemma — which, by the way, I'm sure exists somewhere in the rich algebra literature, but we had never seen it applied in computer science, or even clearly articulated in the coding theory literature. It answers the question: what is the expected multiplicity of a polynomial at a random point of the space? Written out, the statement is below.
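```latex
% The multiplicity Schwartz-Zippel lemma as stated in the talk, in my
% notation: for any nonzero m-variate polynomial p of degree d,
\[
  \mathbb{E}_{x \sim \mathbb{F}_q^m}\!\left[\operatorname{mult}(p, x)\right]
  \;\le\; \frac{d}{q}.
\]
% Since mult(p, x) >= 1 whenever p(x) = 0, this strictly strengthens
% the classical statement Pr[p(x) = 0] <= d/q.
```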
This is a strict strengthening of the quantity that Schwartz and Zippel considered. They asked: how often is a polynomial zero? Now, at each point we count how many times it is zero — the multiplicity — and ask for the expectation of that quantity. And that expectation is at most the degree of the polynomial over q, for the appropriate notion of derivative. This should be familiar from your background in analysis, though the notion of derivative here is slightly different. Now, given that lemma, here is how one starts constructing a locally decodable error-correcting code. Take the bivariate example again. I will choose degree roughly 2 times (1 minus 2 delta) times q — the slide should have said twice — so roughly twice q, and I will use multiplicity two; the multiplicity parameter will influence the results in ways I'll describe shortly. This code is not going to be over the alphabet F_q, but over triples of that alphabet. It's a slight extension, but this kind of cheating is very legitimate and very useful in coding theory, and you can easily derive other interesting results from it. So when I ask for the value of the code at a particular coordinate, I get three values from F_q. What's the message? It's still a polynomial of degree at most d — and recall, d is roughly twice q now. The encoding is now something more: it's the evaluation of the polynomial p, of its partial derivative with respect to the first variable (which I denote p sub x), and of its partial derivative with respect to the second variable (denoted p sub y), all evaluated at (a, b). This triple associated with (a, b) is one coordinate of the code; for every choice of (a, b) I get another such evaluation, so that gives me a code of length q squared — that didn't change. Now, k is not quite d squared over two, which is what we used to have; it's going to be one third of d squared over two. Why? d squared over two is the number of coefficients — the number of values in F_q. If I measure it in symbols of the alphabet I'm working with — and the alphabet is F_q cubed — I have to divide by three; this three is the same as that three. So the dimension is one third of d squared over two. And now, d is roughly two q, so d squared over two is basically two q squared, and a third of that — two thirds q squared — is the k I get. That's rate two thirds. We're already over the one-half barrier.
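Here is a sketch of this encoding map in the bivariate, multiplicity-2 case. Representing the message as a dictionary of coefficients is my own choice; for first-order derivatives, the Hasse derivative coincides with the usual formal partial derivative, so plain partials are exactly right here.

```python
# Each codeword coordinate is the triple (p(a,b), p_x(a,b), p_y(a,b)).
# coeffs = {(i, j): c} means the monomial c * x^i * y^j; q is a prime.
def eval_with_derivs(coeffs, a, b, q):
    p = px = py = 0
    for (i, j), c in coeffs.items():
        p = (p + c * pow(a, i, q) * pow(b, j, q)) % q
        if i >= 1:  # formal partial derivative with respect to x
            px = (px + c * i * pow(a, i - 1, q) * pow(b, j, q)) % q
        if j >= 1:  # formal partial derivative with respect to y
            py = (py + c * j * pow(a, i, q) * pow(b, j - 1, q)) % q
    return (p, px, py)

def encode(coeffs, q):
    """Codeword of length q^2 over the alphabet F_q^3."""
    return [eval_with_derivs(coeffs, a, b, q)
            for a in range(q) for b in range(q)]

# Example: p(x, y) = x * y^2 over F_7; one coordinate of the encoding.
print(eval_with_derivs({(1, 2): 1}, 3, 2, 7))  # (5, 4, 5)
```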
And the question is: what about locality? Can we actually decode these codes locally? That's the question we want to answer — and once we do, we get sublinear locality at rate greater than one half. So, a few slides on this, and then we'll jump to the conclusions. Reconstructing this polynomial at a given point: I have this polynomial p, which was my message, and I want to evaluate it at (a, b). This is basically the same idea as before. Let's start by assuming there are no errors; I just want to recover the value of p at (a, b) from the values of f elsewhere, rather than from f at (a, b) itself. So what do you do? You still decode along lines: pick a random line of the form (alpha t plus a, beta t plus b), where t is a parameter ranging over the field. If I set t equal to zero, I get the point (a, b); other values of t take me to other points in the space. I define the function g of t, the restriction to this line, which is a polynomial of degree at most d. Looking at the value of f over all choices of t gives me q evaluations of this polynomial — but q evaluations are not sufficient, because d is much larger than q; d is like two times q. But I claim I also have the derivatives of g. Why? Well, g is a univariate function, so it has only one first derivative, and that derivative is really alpha times the x-derivative of p plus beta times the y-derivative of p. And notice that at all points of the space I also have the evaluations of p sub x and p sub y. So I just multiply them by alpha and beta, and I get the values of g prime, the derivative of g, at all points of the line. Now I have q evaluations of g and q more evaluations of g prime; this certainly is enough to recover a polynomial of degree roughly two q. So we can reconstruct the value of the function when everything is correct. What happens when I start having corruptions? Some fraction of the values change — not much. (The things in red are what changed relative to the previous slide.) So now I'm decoding with errors: I have a function f which is usually equal to p, but not always. Everything else remains the same, except I don't get the values of g everywhere on the line — only at most points of the line. That means recovering g is no longer an interpolation question; it's a decoding question, for the same kind of algebraic code with multiplicities, and the paper gives an algorithm to solve it. All right. But that is not all we needed to do. Why? Because it's not good enough to recover the value of the function everywhere; I also need to recover the values of its derivatives everywhere — that is what the encoding was. If I want to recover the codeword everywhere, I also have to recover all the derivatives, p sub x and p sub y. Now, one idea that doesn't work is to say: p sub x is just another polynomial of degree at most d, and we just recovered a polynomial of degree at most d. But we recovered that polynomial from its values and the values of its derivatives — and I don't have the derivatives of p sub x waiting for me. So I can't do that. But there's a better idea, which works out very nicely: on the line, I actually recovered g, and once you give me g, I can compute its derivative, so I recover g prime as well. And g prime gives me the evaluations of alpha times p sub x plus beta times p sub y at all the points of the line — that was the definition; that's how we used it before. So, if I have alpha times p sub x plus beta times p sub y at (a, b), I can repeat the idea: pick another random line through (a, b), and now I get alpha prime times p sub x plus beta prime times p sub y for a new pair (alpha prime, beta prime). If (alpha, beta) is linearly independent of (alpha prime, beta prime), then I have a two-by-two linear system, which I can solve to compute p sub x and p sub y at (a, b). So: two lines, and I get local decodability.
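That last step — solving for p_x and p_y from two lines — is just a 2-by-2 linear system over F_q. Here is a sketch (helper name mine, q assumed prime, and the two per-line values assumed already decoded):

```python
# From two lines through (a, b) with directions (alpha, beta) and
# (alpha2, beta2), the decoder knows
#   v1 = alpha  * p_x + beta  * p_y   and   v2 = alpha2 * p_x + beta2 * p_y
# at (a, b); solve for p_x, p_y by Cramer's rule over F_q.
def solve_partials(alpha, beta, v1, alpha2, beta2, v2, q):
    det = (alpha * beta2 - alpha2 * beta) % q
    if det == 0:
        raise ValueError("directions linearly dependent; pick a new line")
    inv = pow(det, -1, q)
    px = (v1 * beta2 - v2 * beta) * inv % q
    py = (alpha * v2 - alpha2 * v1) * inv % q
    return px, py

# Demo over F_7: p_x = 4, p_y = 5 at the point; directions (1,2), (3,1).
v1 = (1 * 4 + 2 * 5) % 7
v2 = (3 * 4 + 1 * 5) % 7
print(solve_partials(1, 2, v1, 3, 1, v2, 7))  # (4, 5)
```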
So now we are actually done: we have local decodability with roughly square root of n queries. What else can we hope for? This construction only got us a bit beyond rate one half — we got up to rate close to two thirds, but we didn't get to rate one. How do you get to rate one, and how do you get better locality? First, the locality: you want lower locality? Fine, increase the number of variables — this is what we always used to do, and you can still do it here; get your locality down to as little as you want. Now that you've fixed the locality, ask how much rate you want, and increase the multiplicity: take the second derivatives, the third derivatives, up to order-s derivatives, and use them all. The rate starts going up towards one. It's remarkable — you just choose these parameters freely and you get whatever you want. The paper actually has a bunch of very neat algebraic ideas to supplement all this, but I think this is a good place to stop with the paper. So we're done with the proof; let me give a couple of concluding thoughts, and then wrap up. Now, I never defined what the derivatives are. You have copies of these slides, and there you'll find out what they are — it's a very simple, nice concept, so I won't spell it out here. One very interesting thing: we've been working with derivatives of polynomials for a while now — about 10 years we've been staring at these things, trying to do different things with them analytically — and a realization that never came to me till now is that derivatives are not locally computable. Look: if derivatives were locally computable, there would be no point in giving you the derivatives of the polynomial in addition to its values, right? But on the other hand, think about calculus. How do you define the derivative? The value of the function here, minus the value of the function there, divided by something — that's a 2-local computation. But it doesn't work: it's not any particular point along the sequence that I care about, it's the limiting value, and the limit is not locally computable. And that is being used in a very positive way here: something that looks like bad news about derivatives is exactly the thing that makes them very, very useful. I find that very interesting. One thing that has been going on over the past 10 years or so is that we've seen increasing use of multiplicities of polynomials — derivatives and multiplicities — and what they tend to do, always, is allow you to work with higher-degree polynomials than we were previously able to. We would love to see more applications of this, and also to get a better understanding of why this is happening. I've been complaining about this for a while, and nobody has risen to the challenge of explaining to me why these are actually a good thing to be doing. Why was it a good thing to include the derivatives of the polynomial in these various computations?
And another very interesting theme, starting from the FOCS 2005 work, I think, of Parvaresh and Vardy, and then the work of Guruswami and Rudra, and so on: there have been increasing examples of very algebraic codes which are not, strictly speaking, linear codes. Linear codes always had a finite field as their alphabet; these codes all have, as their alphabet, vectors over a finite field. And they are turning out to be very useful algebraically — everything we did was algebra, and yet we are not working over a finite field as the alphabet. This is becoming an increasingly interesting theme; I'm impressed by it, and I'd love to understand what else we can do in this regime. Now, some questions — this has all been very positive: multiplicity codes are useful. Let me get back to the negatives. I would love to show, for example, that these codes are locally testable. And why? Among other things, I would love to see probabilistically checkable proofs with these parameters — proofs whose length is just a tiny bit more than the length of the best classical proof. I take a classical proof, extend it by a tiny amount, and get a probabilistically checkable proof which I can check with a sublinear number of queries. Even defining this question precisely is subtle — and since I didn't define PCPs, I won't be able to do that here — but I think there is a very nice definition, and we don't have a clue how to get constructions of that type. I think it would be a very nice thing to do. And finally: yes, there's no rate barrier anymore. So whether we're going to start seeing this in devices — that's an interesting question. Thank you. Oh, some intuition for what the Hasse derivatives are? Actually, one doesn't need intuition. You can just ask: what do I want from this derivative? Look at the Schwartz-Zippel lemma, or just look at this question: if I have a univariate polynomial, what should it mean to say some point a is a root of multiplicity s? It says (x minus a) to the power s divides the polynomial. You look at that expression, you stare at it in a couple of different ways, and it becomes pretty obvious what the derivative should be. That's roughly how it's done in our papers: it's a very simple algebraic expression, very naturally motivated, with properties which are nice and simple — nothing very subtle about it. One tricky thing about these Hasse derivatives — the way they differ from the classical derivatives — is this: classically, the second derivative of a function with respect to x is the first derivative of the first derivative. That does not happen with Hasse derivatives. And that's again a very, very good thing, because if you take the second classical derivative of any polynomial over GF(2 to the k), it's zero: the first derivative has only even-degree terms, and so the second derivative is zero. Hasse derivatives don't have this property. So yes, you can actually use second derivatives and third derivatives and do something interesting. That's where these things become very nice and useful.
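To make that last point concrete, here is a sketch of the Hasse derivative on univariate coefficient vectors — my own rendering of the standard definition (the i-th Hasse derivative maps x^n to C(n, i) x^(n-i)), not code from the paper:

```python
# Hasse derivative of a univariate polynomial over F_q (q prime):
# coeffs[n] is the coefficient of x^n.
from math import comb

def hasse_derivative(coeffs, i, q):
    """Coefficient vector of the i-th Hasse derivative over F_q."""
    return [comb(n, i) % q * c % q for n, c in enumerate(coeffs[i:], start=i)]

# Over GF(2): the 2nd classical derivative of any polynomial is 0, but
# the 2nd Hasse derivative of x^2 + x^3 is 1 + C(3,2)*x = 1 + x.
print(hasse_derivative([0, 0, 1, 1], 2, 2))  # [1, 1], i.e. 1 + x
```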