So thanks very much, Guy, for the introduction, and thank you all for being here, and a big thank you to the organizers for inviting me; this is a tremendous opportunity to get to speak to you guys. So I'm going to talk to you today about cryptography, local decoding, and distributed storage. So what is this talk about? The point of this talk is that I would like to advertise to you, the crypto community, some new notions of locality in error correcting codes. So in a bit more detail: locality is a notion in error correcting codes that has a rich history with cryptography going way, way back, so you might be familiar with locally decodable codes, which have tons of applications in crypto. However, several new notions of locality have emerged over the past five to ten years in coding theory, motivated primarily by distributed storage but also with some other motivations. And some of those have found applications in crypto or sort of crypto-adjacent areas, but I believe there should be more. And so the point of this talk is that I want to tell you about some of the work that's being done in this space, and I hope that it will find even more uses in cryptography. I should say this is a survey talk, so most of the results that I'm going to talk about are not mine, although I will talk about a little bit of my own work. Okay, so the outline: what's the plan? So first, I told you that this talk is about locality in error correcting codes, so I need to first tell you what that means; I'm also going to define error correcting codes, don't worry. And I'm going to start out with a little warm up, which is an example that you might be familiar with: locally decodable codes and private information retrieval. I'll spend just ten minutes or so on that. And then the bulk of the talk is going to be devoted to an example you might be less familiar with, which is regenerating codes. So this is one notion of locality that comes out of distributed storage. I'll set that up, I'll tell you what regenerating codes are, I'll tell you a few results and a neat fact about polynomial interpolation which I hope will have applications in crypto, and then I will wave my hands at a couple of crypto flavored applications of these. And time permitting, I'll give some pointers to some other notions of locality in cryptography, which means that at the end I have a giant slide full of references, and then I'll conclude. So that's the plan, great. Okay, so let's start: what is locality in error correcting codes? Well, okay, first: what are error correcting codes? I should do that first. So error correcting codes solve a communication problem between Alice and Bob. So here's Alice, here's Bob. And suppose that Alice has some message that she wants to send to Bob. So throughout this talk the message is going to have length k; I'm going to represent it with these blue boxes here. And the problem for Alice and Bob is that Alice's only way of communicating with Bob is through some noisy channel. So when she tries to send a message to Bob, it's going to get corrupted in some way, which I will elaborate on later. So what does she do? She encodes her message. So she's going to take her message x and encode it as a code word c of length n. Sorry, here sigma is just meant to be some arbitrary alphabet. So you could think of sigma being {0, 1} if you want, so these are just bit strings.
So Alice is going to take her message of length k and encode it as a code word of length n, where n is bigger than k. So you should think of this as adding redundancy to protect it from the noise that is about to happen. And then she's going to take c and send that through the channel instead. And the channel's going to do something bad to it. So for this talk, I'm going to think about two different sorts of bad things that could happen: errors and erasures. So by an error, what I mean is that the channel will maybe switch a symbol from one thing to something it oughtn't be, like flip a 0 to a 1 or a 1 to a 0, and Bob has no idea where these occur. It just sort of flips things around. An erasure is where the channel replaces the symbol with a big red question mark. So Bob knows where it is, he just doesn't know what was there before. So erasures are kind of gentler than errors. And throughout this talk, I'm going to be thinking about adversarial noisy channels. And here adversarial just means that the channel can introduce as many errors or erasures as it wants up to a particular budget, but it can do so in any way it wants, in any computationally unbounded way that it wants. So for example, we might say that the channel can adversarially introduce errors into up to 10% of the positions, or maybe just up to 17 erasures; it has some budget. And then what Bob does is Bob sees this corrupted code word, c twiddle, and decodes it. And his goal is to learn x, or in some cases something about x. So this is the setup. And our goals are twofold. One, we want to handle as powerful an adversary as possible. That means we want the noise budget to be as big as possible, or at least big enough for our applications. And two, we want to make n as small as possible given k, right? So Alice blows up her message of length k into a code word of length n, and we want this blow up to be as small as possible to minimize the overhead. So those are going to be the two goals. Okay, and also just a note: it's okay if the adversary learns x. There's no privacy or security requirement here. It's just that we want Bob to figure out what Alice meant to say, correctly. Great, okay, so just to make sure that we're all on the same page, let me just do a quick dumb example just to get the notation straight. So a dumb example is a repetition code. Suppose that Alice has some message x over the alphabet {0, 1} that she wants to send to Bob. One thing she could decide to do as an encoding map is just repeat each bit three times, right? So she'd send 0, 1, 0 to 0, 0, 0, 1, 1, 1, 0, 0, 0; that's her code word. And then she'd send that across the channel. Let's say for the sake of this example that the channel is allowed to introduce up to one error, and it chooses to introduce it here. So it's flipped this 0 to a 1, although Bob doesn't know that. We know that; Bob doesn't. And now it's Bob's job to decode and figure out what Alice meant to say. Fortunately, in this case, it's pretty easy. Bob knows that Alice is just going to repeat each bit three times. So he just looks at the code word in chunks of three, does a majority vote, and figures out what Alice meant to say. Great. Okay, so how does this do on the goals? So this can handle an adversary that can introduce one error, and n is equal to 3k.
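Just to pin down the notation, here's a minimal sketch of this repetition code in Python. This is my own illustration, not something from the slides; the function names are mine.

```python
# Minimal sketch of the 3x repetition code from the example.
# Encoding: repeat each message bit three times.
# Decoding: majority vote within each chunk of three,
# which corrects one flipped bit per chunk.

def encode(message):
    """Repeat each bit three times: [0,1,0] -> [0,0,0,1,1,1,0,0,0]."""
    return [bit for bit in message for _ in range(3)]

def decode(received):
    """Majority vote over each chunk of three symbols."""
    chunks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return [1 if sum(chunk) >= 2 else 0 for chunk in chunks]

codeword = encode([0, 1, 0])   # [0, 0, 0, 1, 1, 1, 0, 0, 0]
codeword[1] = 1                # the channel flips one bit
assert decode(codeword) == [0, 1, 0]
```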
So as you might have guessed from the title of the slide, which has the word dumb in it, this is not the best scheme you can have, not the best trade-off between n and k you can get for this particular adversary. There are better examples out there; we'll see one later. But for now, the point of this example is just to get the notation straight, so let's consider that done and move on. So, are there any questions about the setup at this point? Just error correcting codes? And if there are any questions at any point, please feel free to interrupt. Okay, so now that I've defined error correcting codes, let me tell you what locality in error correcting codes is. Okay, so locality: the picture starts out basically the same way. Alice has a message she wants to send to Bob. Some noisy channel is going to corrupt some stuff. But now the switch is that Bob does not actually care about recovering all of Alice's message. Instead, maybe he's just interested in a little piece of it. So maybe Bob just wants to recover one little symbol xi of Alice's message. So if this is an error correcting code like we saw on the previous slide, Bob could run the whole decoding algorithm, figure out all of x, and throw away all of it except for xi. That would work, but it would be very wasteful. He's getting all this stuff that he doesn't really need. And so the hope is that maybe Bob could somehow get away with doing less work than that, maybe by not even looking at all of the corrupted code word c twiddle. So maybe Bob just looks at a couple of locations, just a few bits from this corrupted code word, and tries to recover xi. So, informally, that's my informal definition of locality: for any i, Bob should be able to learn this symbol xi without looking at too much information from c twiddle. Okay, so that is not a very precise definition. And in fact, there are many, many ways to instantiate it. What kind of corruption does the channel get to introduce? How much corruption? How should we quantify "not much information" on Bob's end? And depending on how you instantiate it, how you answer these questions, you get a whole suite of different notions of locality in coding theory. So for the purposes of this talk, I'm just going to arrange them on this axis from powerful adversary to less powerful adversary. So on the powerful side, you have an adversary who's allowed to introduce a constant fraction of errors. Say 10% of the bits get flipped, something like that. On the other side, you have a less powerful adversary who can introduce, let's say, a constant number of erasures. So even just, say, one symbol gets erased. This is a much weaker adversary. So on the strong side, there's a notion of locality called locally decodable codes, which you might be familiar with. These have been around for a long time and they have sort of a rich history with crypto. So I'll talk about those first and explain one of these connections. On the other side of the spectrum, and this is what I'll talk about later, are regenerating codes. So regenerating codes are motivated by distributed storage. And they actually come along with a whole bunch of different notions of locality on this side of the spectrum, motivated by distributed storage and distributed computing. So they belong over here, and I'll talk about those second. That will be sort of the bulk of the talk. I should say there are also tons of other notions of locality that you could come up with.
I've only picked a sub-sample to put here on the slide. And probably I should have a few more axes to appropriately represent them. But I don't have time to talk about them all, so I'm just going to talk about these two highlighted ones. Okay, so let's get started with this example that you might already be familiar with, which is locally decodable codes and private information retrieval. Actually, just so I can get some context: who here has seen this before? Okay, many people but not everyone. All right, I'll go moderately slowly. Okay, so what is a locally decodable code? So a locally decodable code is one way of instantiating this picture. And the way I'm going to choose to instantiate it is that the adversary gets to introduce a small constant fraction of errors, let's say a 0.01 fraction of the symbols, which it can just change to something else as it likes. And Bob is allowed to look at t different positions of the corrupted code word. Some of those might be in error, some of those might not be; Bob has no idea. Bob's queries are allowed to be randomized. And the requirement is that Bob should be able to figure out xi with probability, let's say, at least two thirds over his randomized queries. Okay, so locally decodable codes, like I just said: the adversary introduces a constant fraction of errors, and Bob wants to recover a single symbol with high probability using just a few queries. Why might we want this? So there are tons of applications in theoretical computer science. But one application in crypto is private information retrieval. So let me sketch real quickly how this goes. So I'm going to introduce private information retrieval, though I probably don't need to for this audience. So what is private information retrieval? The basic setup is you've got a client here, who I'm still going to call Bob because he's wearing glasses and I only have the two stick figure images, I guess. And Bob wants to query a database for some element in the database, xi, say. One thing he could do is Bob could ask the database, hey, what is xi? And the database could send him back xi. So this is great, Bob gets the information he wants, but this is not at all private. If Bob is maybe a little embarrassed about his query, he doesn't want to reveal that he's really looking for xi, then this violates Bob's privacy, okay? One thing we could do to get around this is Bob could just ask the database for all of the things. The database would give Bob all of the things, and that's great: Bob gets what he wants, and it preserves privacy. The downside is that this takes tons of communication. So the solution, hopefully, would be something that looks like this. Bob sends some query, which takes not too many bits, to the database. The database sends Bob back some answer, which also takes not too many bits. And what we would like is that this scheme here is both private and correct. So private here means information theoretic privacy: the distribution of this query q should be independent of the identity of i, the thing that Bob wants. And correct just means that Bob learns the thing that he wants. And additionally, we'd like to use little communication, so this l1 plus l2 should not be too big. The problem with this, as I've drawn it on the slide, is that it's impossible. Basically, you need a linear amount of communication to make this happen. So this is no better than the scheme where the database just sends all the contents to Bob. So there's two ways around this.
One way is to introduce some cryptographic assumptions. That leads to a long and fruitful line of work that I'm not going to discuss here. Another way, if I still want information theoretic security, is to introduce t-server private information retrieval. So this was introduced by Chor, Goldreich, Kushilevitz, and Sudan in the late 90s. And basically the picture looks like this. Now, all of a sudden, Bob has three different databases that he could talk to. He's going to send queries to each of them, and they're going to send answers back. And we want the same sorts of guarantees. Information theoretic privacy means that each of these queries is itself uniformly random. They might be correlated with each other, but each database, as long as it doesn't talk to the other guys, learns nothing about what Bob wants. And then the answers should yield xi. And the important assumption here is that the databases do not communicate. And it turns out that this is possible. And the reason I'm talking about it now is that, for some of these settings, the best private information retrieval schemes we have come from locally decodable codes. So let me just really quickly give a cartoon of the connection. So a t-query locally decodable code gives a t-server private information retrieval scheme. How does this work? Let's suppose that here's Bob with his query, and here are these three copies of the database, and they don't communicate with each other. Each one of these copies of the database is going to, sort of in its head, encode the database: it's going to treat it as a message for a locally decodable code, and encode it as a code word. And now Bob, in his head, is going to think: well, what queries would I make to this locally decodable code if I wanted to recover xi? And then Bob is going to send those queries to the databases. The databases will find the appropriate symbols, here, here, and here, and send them back to Bob. And then Bob will do whatever the locally decodable code said to do, to get back the information that he's interested in. So I claim that if I had a good locally decodable code to begin with, then I get a good private information retrieval scheme on the other end. So first, why is it private? For it to be private, I need each of these queries to be individually uniformly random, so that each database learns nothing about the thing that Bob is interested in. And this privacy comes from the fact that the adversary in the locality-in-error-correcting-codes setup is kind of nasty: it can introduce a constant fraction of errors. And what that means is that if Bob's queries were not basically uniformly random, then the adversary could just mess Bob up. So for example, if Bob always queried this symbol, or queried this symbol a bit more than he was supposed to, the adversary would just always mess up that symbol and Bob would be hosed. So you can make this precise, but basically the fact that this adversary is very strong means that these queries must be uniformly random, which implies privacy. And then, why do we have low communication? Well, remember, the whole goal in the coding theory setting was to make n as small as possible compared to k. We don't want too much overhead. And how many bits does Bob have to send to each of these databases? Well, he has to give an index into this code word, which takes log n bits. So if n is really small, then this is also communication efficient.
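As an aside, the simplest concrete instantiation of this template is the classic two-server XOR trick, which you can view as the Hadamard code playing the role of the 2-query locally decodable code. Here's a minimal Python sketch; the scheme is well known, but the variable and function names are mine. Note that each query in this simple version is n bits, so it does not yet beat trivial communication; the codes with small n discussed in the talk are what bring the communication down.

```python
# Minimal sketch of the classic 2-server PIR XOR trick.
# Query 1 is a uniformly random 0/1 mask over the n positions;
# query 2 is the same mask with position i flipped. Each mask on
# its own is uniform, so neither (non-colluding) server learns i.
import secrets

def make_queries(n, i):
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1          # the two masks differ exactly at position i
    return q1, q2

def answer(x, q):
    """Each server XORs together the database bits its mask selects."""
    return sum(b for b, s in zip(x, q) if s) % 2

x = [1, 0, 1, 1, 0, 0, 1, 0]   # the database, as a list of bits
i = 5                          # the index Bob wants
q1, q2 = make_queries(len(x), i)
# The two answers agree on everything except x_i, so XOR recovers it.
assert answer(x, q1) ^ answer(x, q2) == x[i]
```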
Okay, so that's the basic connection. And this leads to the best known three-server PIR schemes, at least as far as I know, due to Yekhanin and Efremenko and much follow-up work. Okay, so that's basically all I'm going to say about locally decodable codes and private information retrieval, because, as we saw by the show of hands earlier, I suspect this is review for many of you. If you are interested in learning more, here's some further reading. The point of my talking about this is mostly both to give a reminder that locality has been really useful in crypto historically, and also to serve as a point of contrast for what's going to come next. Okay, so what's going to come next, which is the bulk of this talk, is coding for distributed storage. And maybe here I'll have some examples that you might not have seen. Okay, so the main difference between notions of locality like locally decodable codes, or locally correctable codes, or locally testable codes, or other things you might have heard of, these older notions, and the newer notions which are coming out of distributed storage, is that these newer notions have a much, much gentler adversary. So in particular, this very gentle adversary is going to introduce just a single erasure. So just one symbol is going to get erased. One small tweak from the previous slide: now Bob is going to be interested in a symbol of the code word instead of a symbol of the message. That doesn't make a huge, huge difference. The main difference here is that this adversary is much more gentle. So this is going to be the setup for the rest of the talk. One might ask, why should I study this adversary? This adversary seems almost too easy to be interesting, right? Just one erasure. We were talking about handling a constant fraction of errors; this is so much easier. What's the point? So as I said earlier, the main motivation comes from distributed storage. So let me back up for a little bit and talk about the setting there. So in distributed storage, the setting is as follows. Alice has her message, which I'm now going to call a file, that she wants to store across a distributed storage system, let's say n storage nodes. And the model here, in these big data warehouses, is that you have tons and tons of really cheap machines, and they are failing all the time, right? So Alice wants to protect her data against these possible failures. So what is she going to do? She's going to encode it with an error correcting code. And then she's going to take the symbols of her code word, n symbols, and she's going to ship them off to n different storage nodes, like that. And what is the point of this? The point is that even if a couple of these servers fail, or something bad happens to them, then no data is lost, right? That corresponds to a couple of erasures in the code word. We can decode the corrupted code word to recover the file. So no data is lost, even if a couple of nodes fail. So this is great. And in fact, this is what is used in most distributed storage systems, some sort of erasure coding. So I should say there's no locality yet in this picture. This is just a global statement.
Let me just take a moment to digress and give an example of a code that is actually used in these systems. So a popular choice for erasure coding in distributed storage, and also for coding in many, many other domains, are Reed-Solomon codes. So Reed-Solomon codes have been around since the 60s. They're used all over the place. And here is what they look like. So the alphabet sigma is a finite field F of size q, where q is bigger than n. And I'm going to pick a bunch of points, alpha 1 through alpha n, in F, to start. Okay, so now how do I define the code? I'm going to start with my message. My message has length k, so it's k elements of a finite field. I'm going to treat these k elements as a polynomial in the natural way, so my k elements become the k coefficients of my degree k minus 1 polynomial f. And then how do I encode this message? I just take this polynomial and I evaluate it at the n points that I chose ahead of time: f of alpha 1, f of alpha 2, up through f of alpha n. So these are Reed-Solomon codes. And Reed-Solomon codes are used all over the place. And one reason is that they have this really nice property that if I encode my message this way, I can correct any n minus k erasures. So let's quickly see why this is the case. So suppose that there are n minus k erasures. That means that there are k evaluation points left. So I have k evaluations of a degree k minus 1 polynomial, which is enough for me to do polynomial interpolation, right? Two points determine a line, three points determine a quadratic, k points determine a degree k minus 1 polynomial. So then I can do polynomial interpolation and I recover everything. So this is a really strong property, right? I definitely couldn't handle n minus k plus 1 erasures, because then I'd have less information left than I wanted to recover. So this is the best you could do. So a code which satisfies this is called maximum distance separable, or MDS. I'm gonna be throwing this acronym around a little bit as we go on. And what MDS means, just to keep in mind, is that it gets the optimal trade-off between the number of erasures you can handle, n, and k. Okay, so, Reed-Solomon codes, good?
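Here's a minimal Python sketch of this encode-then-interpolate story, over a small prime field to keep the arithmetic simple. The specific parameters, field, and function names are mine, just for illustration.

```python
# Minimal Reed-Solomon sketch over a prime field.
# The k message symbols are the coefficients of a degree k-1
# polynomial f; the codeword is (f(alpha_1), ..., f(alpha_n)).
# Any k surviving evaluations recover f by Lagrange interpolation.
P = 97                      # field size q > n, here a small prime

def rs_encode(message, alphas):
    """Evaluate f(x) = sum_j message[j] * x^j at each alpha."""
    return [sum(m * pow(a, j, P) for j, m in enumerate(message)) % P
            for a in alphas]

def interpolate_at(points, x0):
    """Value at x0 of the unique polynomial of degree < len(points)
    through the given (x, y) pairs, by Lagrange interpolation."""
    total = 0
    for xi, yi in points:
        li = 1
        for xj, _ in points:
            if xj != xi:
                # multiply by (x0 - xj) / (xi - xj) mod P
                li = li * (x0 - xj) * pow(xi - xj, P - 2, P) % P
        total = (total + yi * li) % P
    return total

k, n = 3, 6
alphas = [1, 2, 3, 4, 5, 6]
codeword = rs_encode([42, 7, 13], alphas)
# Erase any n - k = 3 symbols; the k survivors still determine f.
survivors = [(alphas[j], codeword[j]) for j in (0, 2, 5)]
assert interpolate_at(survivors, alphas[1]) == codeword[1]
```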
Okay, so back to distributed storage. So now, let's say we're using an MDS code. So most of these distributed storage systems do use MDS codes, because we don't want to lose data. I shouldn't say most; many do. And so they can correct n minus k erasures. But most of the time, it turns out, only one node fails at a time. So we do want to protect against this case that n minus k nodes might fail, but most of the time, only one fails, okay? And then we want to maintain the system, right? We want to set up a replacement node that can replace the node that has failed. And so here's my replacement node; I've suggestively named it Bob. And what does this replacement node want to do? Well, it wants to correct this one erasure. And it would like to do it with very little communication, because in this setting, the communication between these nodes is really expensive. And so here's our locality, and here's why it makes sense to study this very weak-seeming adversary who only introduces one erasure: because we'd really like to minimize this communication. So returning to this picture, again, this motivates this weaker adversary where the adversary introduces a single erasure. And our goals are: one, Bob should be able to learn ci without too much information, the same locality goal we had before. And additionally, we'd like our code to be an MDS code, meaning that it can fix any n minus k erasures, so that our system is safe from catastrophe in the worst case. So this is sort of the setup that we might want in distributed storage. So for the next little chunk of the talk, let's see how we might get this. So in order to investigate how we might get this, well, let's just try to build such a code. And so far in this talk, we have only seen one example. So clearly, that will be our starting point. So we know that Reed-Solomon codes satisfy this MDS property. And so a natural question is: okay, do they satisfy this locality thing too? Let's find out. Okay, so now the problem becomes low communication decoding of Reed-Solomon codes. So I promised you a fun polynomial interpolation puzzle at the beginning; this is it. So the problem now for Reed-Solomon codes is as follows. Bob wants to know f of alpha i for some i, for some polynomial f of degree k minus 1. Bob is not allowed to query this f of alpha i, but he is allowed to query other evaluation points. The question is, how many evaluation points does Bob need? So far this should not be that interesting a puzzle to you guys. Right, so the naive solution is: let's make k queries. Then we can, of course, recover the entire polynomial, and in particular this f of alpha i. Of course, the point of this talk is we want a better solution. Okay, so what's the better solution? Can we make k minus 1 queries? That's good, I'm seeing a sea of no's. Right, we cannot make k minus 1 queries: k minus 1 queries actually tell Bob absolutely nothing about f of alpha i. If I tell you two points on a quadratic and ask you for a third, that point could be anywhere. So Bob learns nothing. So this might lead us to believe that the better solution does not exist, that we're just kind of hosed in this attempt. But we can go forward, and there are sort of two different ways to go forward at this juncture, both of which have led to really rich and beautiful literatures in distributed storage. So one thing we could do is relax this MDS requirement. This gives rise to something called locally recoverable codes, also called locally repairable codes or local reconstruction codes, which fortunately all three have the same acronym, LRCs. So I don't have time to talk about everything in this space, but there's a really, really beautiful literature here. Here are a few references if you'd like to check it out, and I can answer questions later. The thing we're gonna do in this talk is the other option, which is relax the locality requirement. And this is gonna lead to something called regenerating codes. Okay, so, next part of the talk: regenerating codes. So let's return to this polynomial interpolation problem. So we decided that Bob could not do any better, in that he could not make fewer than k queries here. So what are we gonna do? Well, one observation is that our goal here was to get a decoding map that used few bits. It was the communication complexity that was expensive in this application. And the key insight of regenerating codes is that minimizing communication is not the same as minimizing query complexity, right? So we could hope to do a bit better. So this leads now to the definition of regenerating codes. This is due to Dimakis et al. in 2010. So regenerating codes are codes which are MDS codes.
So remember, that means they get this optimal trade-off between n and k and the number of erasures they can tolerate. And they also have the following locality property: for any i, Bob can solve this problem. If symbol ci gets erased and Bob wants to know it, then he can learn it by contacting possibly many nodes and downloading just a few bits from each, but the requirement is that the total amount that he downloads should be small. And for right now, "small" counts as significantly less than the naive thing. So the naive thing would be: contact k nodes, download everything, reconstruct the whole code word. That would be k symbols in my alphabet sigma, so k log sigma bits. So we'd like to do this with fewer than k log sigma bits. That's what a regenerating code is, sort of. Makes sense? Cool. Okay, and let me say the rules are that each node is allowed to do some local computation, which might depend on the identity of the failed node i, but they're not allowed to communicate with each other. So this isn't a formal definition, but hopefully the setup is pretty clear. Great. Okay, so regenerating codes, like I said, were introduced by Dimakis et al. in 2010. And the definition that I had on the previous slide is actually only a very, very special case of these codes. More generally, you could say, okay, maybe Bob also does have some locality constraints: he doesn't want to contact more than d nodes. Maybe the nodes can store more than one field element; you can play around with the size of the nodes. Maybe I don't want to allow any local computation. There are plenty of other generalizations you can put on this. And there have been long lines of work on all of these. And in many cases, it's now known what the best trade-offs between all the parameters are, and there are many, many beautiful constructions of these codes. So unfortunately, I don't have enough time in this talk to tell you all the cool things that have happened in this space. But I'm going to focus on just one line of work that I've been involved in a little bit, which is that Reed-Solomon codes themselves are actually regenerating codes. So that polynomial interpolation problem that we saw before actually can be solved with low communication. And this is going to lead to our neat polynomial interpolation fact, which I hope will have some more applications in crypto. Okay, so, punchline, spoiler alert: Reed-Solomon codes are actually regenerating codes. So that means that Bob can solve this polynomial interpolation problem using way fewer bits than he would if he were to just naively contact k different evaluation points and download them all. I find this very surprising. I was always taught that you need k evaluation points to determine the last one, and this is saying that, well, okay, you do need to contact at least k different evaluation points, but you can actually get away with much less information downloaded than you could otherwise. Okay, right, so we've already seen that they're MDS codes, so let me talk about this other thing and give you some quantitative results. This is like the only quantitative slide of the whole talk, so apologies, but it's just to have something concrete. So what are some quantitative results for this problem? Okay, so let's suppose that k is equal to 1 minus epsilon times n.
Okay, so the degree of my polynomial is like 1 minus epsilon times n, or rather the length of the message is 1 minus epsilon times n, so I only have a small constant fraction of blow-up. So just as a benchmark, plain old polynomial interpolation would mean I need to download k whole symbols, which would be k log F bits, or (1 minus epsilon) n log F bits. So here's a couple of results in this space. If F is equal to n, so the field size is equal to the number of evaluation points, which is a common setting for Reed-Solomon codes, then you can get away with downloading (n minus 1) log(1 over epsilon) bits. So basically, before, we were contacting almost all the nodes and downloading everything, a super-constant number of bits from each. And now we're still contacting basically all the nodes, but we're downloading only a constant number of bits from each node. And in particular, if epsilon is equal to a half, so I have a rate one half code, then I'm downloading just one bit from every symbol. And this is a result that's due to myself and Venkatesan Guruswami, which was later refined by Dau and Milenkovic. In a different parameter regime, let's say the field size is really, really big, much bigger than n, there's a beautiful result of Tamo, Ye, and Barg from last year, which shows that Bob can download log F over epsilon bits in total from all the other symbols. And recall that Bob wants log F bits to begin with; he only wants one field element. So this is only a constant factor off from the obvious information theoretic lower bound. This is, I think, pretty cool. And I should say both of these results are also provably optimal for any code in their respective parameter regimes, where there is some fine print on "any" that I am omitting. Cool. So what do these schemes look like? Are they nasty and complicated? No, they're actually quite simple. So here's a high level overview of what one of these schemes looks like. So let's focus on the case where k is equal to n over 2. So f is a degree k minus 1 polynomial, and the field size is n, which is 2 to the t, so a field of characteristic 2, right? And I promised you on the previous slide that Bob can solve this problem: Bob can figure out what f of alpha i is by downloading just a single bit from each of the other symbols. So what does it look like? I'm not gonna tell you the gory details of the scheme, but basically, at a high level, it looks like this. You can find some constants gamma ij in my big field, which I pick ahead of time so that they have some really nice algebraic properties; you can look at the paper for the details. And then what the nodes are gonna do is they're each gonna return a single bit to Bob, and that bit r ij has the form of the field trace of the constant gamma ij times whatever the node is holding, this f of alpha j. So if you haven't seen the field trace before, the only important thing here is that it spits out a single bit. So it takes in an element from the big field and it spits out a bit in F2. And it's an F2-linear function, so it's really easy to compute; think of it as like a dot product over F2 or something. So each of these nodes does this F2-linear thing and sends back a bit. And then what does Bob do? Bob takes all these bits that he receives, adds them up with some other predetermined constants, and gets f of alpha i. So I won't say too much more about the algebra of this, although it's not very difficult.
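For reference, here is the trace function just mentioned, together with the standard dual-basis identity that is the reason "adding up the bits with predetermined constants" can reassemble a field element. This is textbook finite field algebra, not the specific constants from the paper:

\[
\mathrm{Tr}(x) \;=\; x + x^{2} + x^{2^{2}} + \cdots + x^{2^{t-1}} \;\in\; \mathbb{F}_2,
\qquad
x \;=\; \sum_{u=1}^{t} \mathrm{Tr}(\zeta_u\, x)\,\mu_u,
\]

where \(\zeta_1, \dots, \zeta_t\) is any basis of \(\mathbb{F}_{2^t}\) over \(\mathbb{F}_2\) and \(\mu_1, \dots, \mu_t\) is its dual basis. In particular, Tr is F2-linear, and t well-chosen trace bits of x determine x by an F2-linear combination, which is exactly the shape of Bob's reconstruction step.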
Ask me later if you want like a ten minute overview. But all the operations involved are linear and pretty easy. Cool, okay, how am I doing on time? Perfect. All right, so, summary so far: what have we seen? So regenerating codes are MDS codes which have the property that Bob can recover any single erased symbol with far fewer than k log F bits. Cool. And we just saw this fun fact: you can do this with Reed-Solomon codes, which is kind of a cool polynomial interpolation fact. So then, just like when I was talking about locally decodable codes, you might ask, okay, why do we care? So I've kind of motivated this by distributed storage. Regenerating codes also turn out to be useful in distributed computing. But the point of this talk is that I'd like to sell you on these being useful in cryptography. So let me kind of wave my hands at a couple of applications in crypto that have popped up so far. Okay, so, regenerating codes: a few crypto flavored applications. So the first application I want to start with is secret sharing, communication efficient secret sharing. So let's recall Shamir's secret sharing scheme, which I probably don't need to define for this audience, but I'll do it anyway; there's also a small sketch of it below. So the setup in secret sharing is as follows. You've got some secret s, and you've got n parties, and you would like to come up with n shares, c1 through cn, of s that you can distribute among the n parties. And you'd like this to have the property that any k parties can recover the secret, so k of them working together can always get the secret, but no k minus 1 of them can learn anything about the secret. So that's the goal. And a classic solution to this is Shamir's scheme, which is based on polynomials. So the idea is we're going to choose some random polynomial f of degree k minus 1 with the property that f of zero is equal to s, so the secret is equal to f of zero. And then the scheme works as follows. The secret is f of zero, and we give different evaluation points to each of the parties. And now this has the properties that we said: any k parties working together can recover s, just by polynomial interpolation like we saw before. And no k minus 1 parties can learn anything about s, for exactly the same reason that you all were shaking your heads five minutes ago. Great.
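Here's the small sketch promised above: Shamir sharing and reconstruction in Python, reusing the same prime field and Lagrange interpolation idea as the Reed-Solomon sketch earlier. The parameters are mine, just for illustration.

```python
# Minimal sketch of Shamir's scheme: the secret is f(0) for a random
# degree k-1 polynomial f, party j holds the share f(j), and any k
# parties reconstruct f(0) by Lagrange interpolation at x = 0.
import secrets

P = 97   # a small prime field, for illustration

def share(secret, n, k):
    """f(0) = secret; the other k-1 coefficients are uniformly random."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    return {j: sum(c * pow(j, e, P) for e, c in enumerate(coeffs)) % P
            for j in range(1, n + 1)}

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any k shares {x: f(x)}."""
    total = 0
    for xi, yi in shares.items():
        li = 1
        for xj in shares:
            if xj != xi:
                # multiply by (0 - xj) / (xi - xj) mod P
                li = li * (-xj) * pow(xi - xj, P - 2, P) % P
        total = (total + yi * li) % P
    return total

shares = share(secret=42, n=6, k=3)
subset = {j: shares[j] for j in (2, 4, 5)}   # any 3 parties suffice
assert reconstruct(subset) == 42
```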
And so the application here is communication efficient secret sharing. So we have this secret sharing scheme, Shamir's scheme, which is basically a Reed-Solomon code. And we might ask the question: okay, what if all the parties are available and they want to reconstruct the secret? And let's suppose they're all honest. How much communication does it take to recover the secret? So now we can see that this is exactly the same problem that we just answered. The secret is now an erasure, and Bob wants to know what it is. And so the answer, as we just saw on the previous slide, may well be less than you might have thought. So Bob can contact each one of the parties, they just have to send him a few bits, or in some cases a single bit, and Bob will figure out what the secret is. So I guess Bob here is some centralized referee or something. So more generally, in this communication efficient secret sharing application, you can ask: okay, for a general z less than or equal to k, less than or equal to a, suppose I want the threshold property that any k can recover the secret and no z learn anything about it, and I want to know how much communication a trustworthy parties need to learn the secret. So maybe not all of them, but fewer than all of them, but more than k. So this has been studied, actually mostly in the electrical engineering community, as far as I'm aware. So here's some pointers to some further reading. So some of these works come up with codes which are not Reed-Solomon codes but are based on Reed-Solomon codes; some of them are Reed-Solomon codes; not all of them. That's a good question. Yeah, I think secret sharing is the cryptographic application that they have in mind. Exactly, yeah, the goal is secret sharing. And that's one of the reasons that I want to tell you guys about this work, because I feel like you should be able to take this and then say, okay, maybe my goal is now secure multi-party computation. What can you do? And to the best of my knowledge, not much work has been done there. So yeah, I think that's a really good question: what are the further cryptographic applications of this? And yeah, I think that'd be great. Yeah, I'll tell you one more application in a bit, but yeah, great. OK, so that was application one. Let me tell you application two, which is really a non-application, which is locally leakage-resilient secret sharing. So you can see this polynomial interpolation fact that I just told you as kind of a statement that Shamir's scheme is not secure against leakage. So what does that mean? So suppose that each party has a spy in their midst, and they're each just leaking a single bit to some bad guy. So this is Bob, but now he's bad. And he can learn the secret, even though he oughtn't. And so this was recently studied by Benhamouda et al. And what they showed is, so in this example that I told you, it was actually kind of important that the field have characteristic two. They showed that if you take a prime field and try to do this, you can't. And so that paper is appearing at this CRYPTO. I believe the talk is on Tuesday, so we should all go to that. It'll probably be pretty cool. OK, so that's a non-application. And now I want to talk about another positive application, which is low communication private information retrieval. So we saw private information retrieval before at the beginning of the talk. And now I'm going to talk about a low communication version. So I'm going to change the setup just a little bit. So in this new setup, we still have Bob, who wants to query something from a database. But instead of taking n copies of the database, which don't talk to each other, we're going to encode the database and distribute the symbols of the encoding over n servers that don't talk to each other. So it's a sort of coded PIR setup. And then the same thing happens: Bob sends messages to each of the different servers, gets replies back, and wants to figure out what xi is. So this has the advantage over maybe replication-based PIR that it takes less storage overhead, and it's also resilient to node failures. This is like a real distributed storage system, which might also support some sort of privacy guarantees. And for some applications, it makes sense to care about download bandwidth but not upload bandwidth. And just very briefly, the reason why we might care about download but not upload is that if we're thinking about this in a distributed storage context, we might think that we're trying to store big files. And the thing that I want privacy about is which file am I asking for.
And the way that these are traditionally implemented is that these files are kind of striped across the servers. And the effect of this is that if we're trying to retrieve big files instead of bits, the download cost is always going to swamp the upload cost no matter what. So we might as well just care about the download cost. So there are some settings where it makes sense to just care about download cost. OK, so now back to what the application actually is. So, coded PIR: the database is encoded and distributed over n non-communicating servers. This is nice because it takes less storage than the plain n-server PIR, and the system is also robust to failures. So it's a real distributed storage system. And if we don't care about upload bandwidth, then we can actually use these ideas from regenerating codes to achieve smaller download bandwidth than we would otherwise. I should note that coded PIR is also interesting even if you do care about upload bandwidth, and that's also been studied. But for this application, I'm going to focus on this setting where we don't care about upload bandwidth. OK, so how are we going to do this? So I'm going to show you how to get low communication coded PIR using the ideas that we've just seen. So let's recall what one of these schemes looks like. So each one of these nodes has to tell Bob a single bit. And what is that bit? That bit looks like the field trace of some constant gamma ij times some f of alpha j. And the first thing we might hope is: okay, Bob just sends gamma ij to the nodes, and they reply back. This obviously has no privacy, because gamma ij depends on i, which is the thing that Bob wants. But fortunately, it turns out that if you open up these schemes and look at them, it works fine if you multiply in any fixed sigma here, as long as it's the same across all the different nodes. So this kind of immediately gives you a coded PIR setup with Reed-Solomon codes. So we've got our database. We're going to interpolate some degree k minus 1 polynomial f through it, so f agrees with my database. I'm going to encode my database in a distributed way, like this. So I have n servers that don't communicate with each other, each of them holding one polynomial evaluation. Now Bob comes along and wants to know what f of alpha i is. So he wants to make some query into this database. And what he's going to do is first choose sigma at random, a sort of blinding factor, and send to each one of these nodes sigma times the appropriate constant gamma. And now this is uniformly random. It takes a lot of communication, but as I just said, in this setting we don't care so much about upload bandwidth. And then what Bob receives from each node is just a single bit, these r's, which he can then unscale appropriately. So this is an example of coded private information retrieval with low download bandwidth using these ideas. And again, you can generalize this, not just for this sort of cute Reed-Solomon code example.
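For intuition, and this is my gloss on why a common blinding factor works, not the formal argument from the papers: the blinded response of node j is

\[
r_j \;=\; \mathrm{Tr}\big((\sigma\,\gamma_{i,j})\, f(\alpha_j)\big) \;=\; \mathrm{Tr}\big(\gamma_{i,j}\,(\sigma f)(\alpha_j)\big),
\]

so the servers are effectively running the original repair scheme on the blinded polynomial sigma times f, which is again a polynomial of degree k minus 1. Bob's usual reconstruction then yields sigma times f of alpha i, which he can unscale by sigma inverse. And assuming the constants gamma ij are all nonzero, each individual query sigma times gamma ij, for a uniformly random nonzero sigma, is uniformly distributed independently of i, which is where the single-server privacy comes from.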
And you can also ask what happens if a small number of servers are allowed to collude. So I don't want to make the strong assumption that no two guys are allowed to talk to each other; maybe there's some coalition of, what letter have I not used yet, z, z servers that can collude. How can we do that? And there's been, again, some work on this, again mostly coming out of the electrical engineering community as far as I'm aware. And here's a few pointers to some works which study this. And again, they come up with codes, which are not Reed-Solomon codes, which can kind of get these optimal trade-offs, or near-optimal trade-offs, between the number of servers that are colluding, the amount of information Bob needs to download, and the storage overhead. Great. OK, so how am I on time? OK, cool. Yeah, so let me just take a second to mention a couple of other flavors of locality that have come up in cryptography. So I just touched on two, but there's tons more. And literally, I just thought about things until I ran out of room on the slide. So there's way, way more. And I'm sure that many of you know more applications of locality in coding theory than even I know. But a few prominent ones include batch codes, which sort of came out of the crypto community, and which are useful to speed up private information retrieval schemes. And actually, those batch code ideas have now become useful in distributed storage, kind of completing the circle. There's something called PIR codes, which are, I guess, mathematically similar to batch codes, and which can be used to turn any t-server PIR scheme into a coded PIR scheme with the same amount of communication but with much less storage overhead. There's oblivious locally decodable codes, where instead of asking for the queries to be actually uniformly random, you ask that they're difficult to distinguish from uniformly random under some cryptographic assumption. There's been a whole bunch of work on locally decodable codes, and actually all sorts of codes, against computationally bounded channels. So you assume that this adversary, instead of getting to do whatever the heck it wants as long as it introduces fewer than 10% errors, is computationally bounded, and you can try to see what you can do there. And there's a small body of work about things like secure locally repairable codes, where you imagine you have a data center that is trying to do these local repair operations, and you have some eavesdropper who's listening to the repairs going on and is trying to learn something that they oughtn't. And how do you protect against stuff like that? So those are just a few examples. I'm sure there are tons, tons more. But that's what came to mind when I was writing the slide. OK, so now, in my minute or two left, let me conclude. So I'm going to flash up this slide again. What was the point of this talk? The point of this talk is that I wanted to advertise to you guys some new notions of locality in error correcting codes. So to summarize: locality is a notion in error correcting codes that has a rich history in cryptography. We saw this example of locally decodable codes and private information retrieval. But several new notions of locality have emerged over the past five to ten years. Many of these come out of distributed storage. And the unifying theme of these new notions is that they have a much gentler adversary. So I talked about regenerating codes. Another prominent family is locally recoverable codes, which are also really cool and which you should check out. So these have found some applications in cryptography; I sort of waved my hands about low communication secret sharing and low communication private information retrieval. But I believe there should be more.
So like Tal was asking earlier, what is the bigger cryptographic goal of this? I don't know, and I hope you can tell me. So I think there should be tons more applications of these tools. And yeah, I hope that you can help me think of them. So with that, I will end, and happy to take any questions.