I'll be talking about error correction in a computationally bounded world. I'd like to start with standard error-correcting codes, which hopefully a lot of us are already familiar with. An error-correcting code is a pair of efficient algorithms, an encoding algorithm and a decoding algorithm, such that for every message M and every word C, if C differs from the encoding of M in not too many positions, then decoding C gives me back M. Another way of thinking about it: if I take some message M, encode it, change fewer than a ρ fraction of the positions of the encoding, and then decode, I get my original message. So I encode, add a few errors, decode, and recover M. Cool, those are standard error-correcting codes.

In these standard codes there are two parameters in tension. On the one hand, when we construct error-correcting codes we want the error rate ρ to be as high as possible: the higher the error rate, the more errors we can correct. On the other hand, we also want the data rate to be high. The data rate is the length of the message M divided by the length of the encoding of M, so it captures how much blow-up there is: when we encode M, we want the encoding not to be too much longer. So there's a tension: we want the error rate as high as possible and the data rate as high as possible, and in general we can't have both arbitrarily high; there's some trade-off.

Now, these Hamming codes have a very nice property: they're very versatile, by which I mean they make very minimal assumptions on the noise. If I encode a message and add worst-case noise, corrupting any up-to-ρ fraction of the positions, then when I decode I'll still get M. No matter what the errors are, as long as there are few of them, I'll recover my original message.

But this can also be a problem, in that the worst-case requirement may be overly stringent. Suppose, for example, that I know something about the errors coming in: say the errors my codeword experiences occur uniformly at random, so each entry gets corrupted with some probability, and the corruption is uniformly random. In that case I don't actually need the strong worst-case guarantee, and because I don't need it, I may be able to get a better trade-off between error rate and data rate. This is indeed the case. Over large alphabets, there's a better trade-off between error rate and data rate for random errors than for worst-case Hamming errors; over binary alphabets it's a little messier, but assuming random errors still gives better trade-offs. So assuming random errors buys a better trade-off than assuming worst-case errors. What about in between? Is there some way we can define a notion of weakly worst-case errors that still achieves better trade-offs than Hamming?
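To make the encode, corrupt, decode round trip concrete, here's a minimal Python sketch using a toy 5x repetition code; the code and its parameters are illustrative stand-ins with a deliberately poor rate, not the codes discussed in this talk.

```python
import random

# Toy stand-in: a 5x repetition code over bits. Real codes achieve far
# better trade-offs; this only illustrates the round-trip guarantee and
# the two parameters in tension (data rate vs. error rate).
REP = 5

def encode(msg_bits):
    # Each message bit is repeated REP times.
    return [b for b in msg_bits for _ in range(REP)]

def decode(codeword):
    # Majority vote within each block of REP symbols.
    return [int(sum(codeword[i:i + REP]) > REP // 2)
            for i in range(0, len(codeword), REP)]

msg = [random.randint(0, 1) for _ in range(16)]
c = encode(msg)

# Worst-case guarantee: any t = (REP - 1) // 2 bit flips are corrected,
# even if an adversary puts both flips in one block (3-vs-2 majority).
t = (REP - 1) // 2
for i in random.sample(range(len(c)), t):
    c[i] ^= 1

assert decode(c) == msg
print("data rate:", len(msg) / len(c))  # 0.2 -- the price of redundancy
```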
And from a cryptographic point of view, a very natural thing to do is, instead of looking at worst-case errors, to look at the worst-case errors a computationally bounded adversary can produce. This has actually been done before; these are called pseudo-Hamming codes. Roughly speaking, a pseudo-Hamming code is a code where the messages and the errors are chosen by some polynomial-time algorithm. So in the standard Hamming codes from before, the errors were worst case; here they're still worst case, but chosen by something computationally bounded. And in general, when you use error-correcting codes, whatever process created your message is likely computationally bounded, and similarly whatever process is adding errors is also likely computationally bounded. In those situations a pseudo-Hamming code is good enough; we don't need the true worst-case guarantee. And the hope is that pseudo-Hamming codes can then give us better trade-offs between data rate and error rate.

Our results are about these pseudo-Hamming codes, and here's our main result: for any rate R and any ρ small enough, there's a public-coin, stateless pseudo-Hamming code in the random oracle model. Before going into the details of this result and interpreting it, I want to give some background and context that will make it, and its relationship to what's known, easier to understand. Pseudo-Hamming codes were actually known before in the secret-key, stateful setting, meaning the encoder and decoder share a secret key, and the encoder maintains a state that updates with every encoding. This was known to be achievable assuming one-way functions, and in that setting the error-rate-to-data-rate trade-off matches the one for random errors. So if you assume your adversary is computationally bounded, assume one-way functions, a secret key, and the ability to maintain state, you can get trade-offs as good as for random errors. It was also proven that when ρ is bigger than a half, when more than half the entries can be corrupted, this model is necessary: being secret-key and being stateful are both required. Without a secret key, or without the ability to maintain state, these results are unachievable for ρ bigger than a half.

What we do is construct such pseudo-Hamming codes in the public-coin, stateless setting, and we achieve the same rates whenever ρ is less than a half. For ρ greater than a half there's the impossibility result, and we match the random-error trade-off for every ρ where no such impossibility result exists, that is, whenever fewer than half the entries are in error.

So far I've described our results at a high level; now I want to be more precise about what a pseudo-Hamming code exactly is. A pseudo-Hamming code is a triple of algorithms, setup, encoding, and decoding, such that every polynomial-time adversary wins the following game with only negligible probability. First, the challenger runs setup; in our case the setup output is just a uniformly random string, so you can think of it as shared public randomness. Then the adversary picks a message M, the challenger encodes it to a codeword C, and the adversary sends back some C′, which is C with some errors. The adversary wins if C′ doesn't have too many errors, at most a ρ fraction of the positions compared to C, and yet decoding C′ gives something different from M.
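Here's a minimal Python sketch of that game, just to pin down the quantifiers; the Setup/Enc/Dec interfaces and the adversary object are hypothetical placeholders, not the paper's actual algorithms.

```python
def rel_dist(c1, c2):
    # Relative Hamming distance between two equal-length words.
    return sum(a != b for a, b in zip(c1, c2)) / len(c1)

def adversary_wins(Setup, Enc, Dec, adversary, rho):
    """One run of the pseudo-Hamming game; a secure code makes this
    return True with only negligible probability over Setup's coins."""
    pp = Setup()                       # public coins: a uniform random string
    m = adversary.choose_message(pp)
    c = Enc(pp, m)                     # deterministic, so the adversary
                                       # could compute this on its own
    c_prime = adversary.corrupt(pp, c)
    # The adversary wins iff c' is within relative distance rho of c
    # and yet decodes to something other than m.
    return rel_dist(c, c_prime) <= rho and Dec(pp, c_prime) != m
```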
In our case the encoding is deterministic, so you can actually think of the adversary as computing C by themselves. To summarize the game: the adversary picks an M and then picks a C′ that must be close to C but decode to something else. Another way of thinking about it: the adversary picks some M, encodes it, adds errors, and wins if the result decodes to something that's not M. And here too we have the same two parameters in tension: the data rate, which is the length of the message divided by the length of the codeword, and the error rate, the fraction ρ of errors we wish to be able to correct.

Like I mentioned, these pseudo-Hamming codes were known to exist assuming a shared secret key and the ability to maintain state. So we can ask: why is it important to make them public-coin and stateless, which is what we do in our paper? One argument is that stateful codes can be bad for certain applications. For example, error-correcting codes are often used to store data, and stored data may experience corruptions over time. If you're storing a lot of data and using the same code everywhere, being stateful won't actually work. The problem is the guarantee that these known stateful constructions give: if you encode some message, update your state, and then encode a new message, seeing the encoding of the second message can give the adversary enough information to corrupt the first one. So these stateful codes are only good if you encode something and then encode nothing else until you're done with the first encoding, until it has been decoded and you no longer care about it. If you encode multiple things in parallel, an adversary can look at something encoded later and use it to corrupt something encoded earlier. That's one reason to care about stateless codes. As for the secret-key setting, one example is that when many parties need to communicate, secret-key codes may not be realistic: you'd either need every pair of parties to share a secret key, or one secret key shared among all of them, which of course has security issues. So stateless, public-coin codes can be better in some applications.

All right, cool. With that, let's go back to our main result and try to understand exactly what it says. The way I think about it is this: the red line here is the best possible Hamming-code trade-off for large alphabets, and the green line is the best possible trade-off between error rate and data rate for random errors. The blue line is what we achieve: we match the random-error trade-off up to ρ = 1/2, and past a half the impossibility result kicks in and nothing more can be done. So up to a half, we match random errors. We also have a similar result for binary alphabets, which you can see here.
Here we don't quite match random errors, but we still do better than the Hamming bounds. Great. In the next few minutes I want to outline our approach and how the proof works.

Our starting point is list-decodable codes. A list-decodable code is an error-correcting code where, when you encode a message, add errors, and then decode, instead of getting back just the message you started with, you get a polynomially sized list of candidate messages, such that at least one message in the list is the original one. So: take a message, encode it, add errors, decode, and the original message is somewhere in that list. Good list-decodable codes are known to be constructable, and that's where we start. In fact, the rates of these codes match the rates achievable for random errors, and that's basically what allows us to match random errors: we start with these list-decodable codes and change them in certain ways without affecting the rate.

All right, cool. So our approach is to start with a list-decodable code and modify it to also have pseudo-distance 0.99. Pseudo-distance is a concept we define for our proof; it might be interesting outside of our proof too. Pseudo-distance 0.99 means it's computationally hard to find two codewords that agree on at least a 1% fraction of their entries; in other words, it's hard to find two codewords that are close. We then prove that list-decodable codes with high pseudo-distance are also pseudo-Hamming codes. With this in mind, there are two steps: first, show why a list-decodable code with high pseudo-distance is pseudo-Hamming; second, show how to modify list-decodable codes to have high pseudo-distance.

Let's start with the first step: a list-decodable code with high pseudo-distance is pseudo-Hamming. Suppose we have a code that is list-decodable and has high pseudo-distance, here, for example, 0.99. We'll describe a decoding algorithm and then argue why the resulting code satisfies the pseudo-Hamming property. To decode a noisy received word y, we list-decode it to get a bunch of candidate messages, and then we pick the candidate whose encoding is closest to y.

So why is this pseudo-Hamming? Well, suppose it's not, meaning that when we decode y, we get some other message M′ from the list whose encoding is closer to y than the encoding of M. Then the encoding of M′ has to be within ρ of y. We also know the encoding of M is within ρ of y, because y was obtained by encoding M and adding at most a ρ fraction of errors. So the encodings of M and M′ are both close to y, and hence, by the triangle inequality, close to each other. And this violates pseudo-distance, because it gives us a way to find an M and an M′ whose encodings are close, right?
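In code, that decoder is just an argmin over the list. Here's a sketch, assuming a hypothetical base code object exposing `list_decode` and `encode`; the interface is illustrative.

```python
def rel_dist(x, y):
    # Relative Hamming distance between equal-length words.
    return sum(a != b for a, b in zip(x, y)) / len(x)

def decode(y, base):
    # List-decode the received word to a poly-size list of candidates,
    # then return the candidate whose encoding is closest to y. High
    # pseudo-distance is what makes this choice unambiguous against
    # efficient adversaries: they can't find a second nearby codeword.
    candidates = base.list_decode(y)
    return min(candidates, key=lambda m: rel_dist(base.encode(m), y))
```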
So to summarize: suppose the code is not pseudo-Hamming, so we take a message M, encode it, add errors, decode, and get some other M′. Then the encoding of M′ has to be similar to the encoding of M, so we've found two messages that map to similar codewords, which exactly contradicts the pseudo-distance guarantee.

All right, cool. So now that we understand this, the thing that's left is to construct codes that are list-decodable and have high pseudo-distance. We've shown that list-decodable plus high pseudo-distance gives a pseudo-Hamming code, so now we just want to construct these high pseudo-distance list-decodable codes. Let's see that in the next slide.

So how do we create good pseudo-distance? The idea is to start with a high-average-distance code, meaning that with high probability, two random codewords are far from each other. In fact, the optimally list-decodable codes have this property and are known to be constructable, so this is where we start. And this condition should feel pretty natural: what an error-correcting code does is spread its codewords out, and two random strings over a large alphabet differ in almost all their entries, agreeing on much fewer than 1% of positions. So an error-correcting code that's uniform enough, random enough in a sense, should satisfy this high-average-distance property.

The next step is to scramble the messages with a random permutation π: to encode M, we first apply π and then the base encoding. The idea is that if, in the first step, an adversary knew how to find two messages that encode to close codewords, this is now harder: any two messages they pick get mapped to essentially random codewords, which with high probability are far apart.

So the good news is that any two messages now map, with high probability, to codewords that are far apart. But there's still a problem: this doesn't yet give us high pseudo-distance. What's the problem with this approach? The codeword set is still the same as it was before. So an adversary can find two codewords that are close, decode them, and invert π, and they've found two messages that encode to close codewords. So there's still an attack, and the idea of that attack is to start from the codewords and invert back to find the messages that map to them.

The idea for fixing this is to prune the codeword set: we make most of the codewords no longer valid, so that if the adversary tries this attack of finding two close codewords and inverting, then with high probability at least one of the two codewords they found is not valid. Here's how we do it. We take a random hash H, and instead of encoding π(m) directly, we take m, append the hash of m, permute that, and encode the result. Now the idea is that if I take a random codeword and decode it, with high probability I won't get something of the form (m, H(m)): I'll get something essentially random, so the tail of the string won't be the hash of the head. So with high probability, a random string that used to be a valid codeword no longer counts. And one can show that for a code like this, it's genuinely hard to find two codewords that are close to each other, so it has high pseudo-distance.
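Putting the pieces together, the final encoding is Enc′(m) = Enc(π(m ‖ H(m))). Here's a toy, runnable sketch; the Feistel-based π, the SHA-256-based H, and the repetition stand-in for the base code are all my own illustrative assumptions (the paper models H as a random oracle and notes that π can be constructed unconditionally).

```python
import hashlib
import os

SEED = os.urandom(16)  # public coins shared by encoder and decoder

def H(m: bytes) -> bytes:
    # Hash tag appended to the message; modeled as a random oracle.
    return hashlib.sha256(b"tag" + m).digest()[:8]

def _round(key: bytes, half: bytes) -> bytes:
    return hashlib.sha256(key + half).digest()[:len(half)]

def pi(x: bytes) -> bytes:
    # Toy public permutation: a 4-round Feistel network keyed by the
    # public seed. Assumes even-length inputs of at most 64 bytes.
    l, r = x[:len(x) // 2], x[len(x) // 2:]
    for i in range(4):
        l, r = r, bytes(a ^ b for a, b in zip(l, _round(SEED + bytes([i]), r)))
    return l + r

def base_encode(x: bytes) -> bytes:
    # Toy stand-in for the list-decodable base encoding.
    return bytes(b for b in x for _ in range(3))

def encode(m: bytes) -> bytes:
    # Enc'(m) = Enc(pi(m || H(m))): tag, scramble, then encode. Decoding
    # a random codeword now almost never yields a string of the form
    # m || H(m), which is exactly the pruning that gives pseudo-distance.
    return base_encode(pi(m + H(m)))

print(encode(b"hello world!").hex())
```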
A few notes about this construction. First, this π: I said it's a random permutation, but it's actually something we can construct unconditionally. Second, notice that our encoding has the same asymptotic rate as the original encoding: appending the hash only made the message a little longer. The message started at k − √k symbols and we appended √k more, bringing it to k, so the rate goes from roughly (k − √k)/n to k/n, which is asymptotically the same. And if you want more security you can raise this √k, and if you want a bit more efficiency you can lower it; there's some freedom in choosing this function that we've set to √k here.

Cool, so that's the approach. To summarize: we start with list-decodable codes; we make sure they have high pseudo-distance, as we did on this slide, with this hash-permute-encode construction; and we show that a list-decodable code with high pseudo-distance is actually pseudo-Hamming.

Great, so the question that now shows up is: is our random oracle usage reasonable? Recall that earlier we had this hash function, and we showed that if it's a random oracle, the construction works; there's the question of whether that assumption is reasonable. We don't know how to instantiate a concrete hash function that works for us based on standard cryptographic assumptions. But we can show that a hash that is two-input correlation intractable for a certain relation R would be good enough. Now, it's known how to construct two-input correlation-intractable hashes for some relations, but not for ours. It's also worth noting that no implausibility result is known for this type of correlation intractability; at the same time, the current techniques for proving correlation intractability don't seem to apply to our specific relation R, which appears too complicated for currently known constructions.

As a reminder, if we assume we can maintain state and have secret keys, then it's possible to construct such pseudo-Hamming codes from the assumption of one-way functions alone. So my big open question is whether it's also possible to construct stateless, public-coin pseudo-Hamming codes from one-way functions or some similarly standard cryptographic assumption.