All right, so this is New Techniques for Obfuscating Conjunctions. This is joint work with James Bartusek, Tancrède Lepoint, and my advisor, Mark Zhandry. Our motivating scenario is a simple password check application: you're trying to write a program that accepts if its input is equal to some preset password. From a functionality standpoint, the pseudocode on this slide is totally fine, but it's pretty obvious that you would never want to do a password check this way. The problem is that if an adversary sees the implementation of your program, they learn your password in the clear, and you have no security whatsoever. The slightly smarter thing to do is to store the hash of your password: we take the SHA-256 hash of the password and store that instead, and the new program takes its input x, computes the hash of x, and compares it to the stored hash value.

What we just saw is actually a really simple example of program obfuscation for the limited class of point functions. We had an insecure implementation of a point function on the top right-hand side, and just by hashing we created a new program that is at least a heuristic obfuscation of this point function. Correctness is preserved, meaning the input-output behavior is essentially the same, but the new program has a more secure implementation, at least in some meaningful sense. There are many ways to define the security of obfuscation, but for today we're going to go with this distributional version of security. We work in a setting where the program itself is sampled from some fixed a priori distribution, and we want to say that given the obfuscation of this program, you shouldn't be able to learn anything about the underlying program. First, it might seem weird to sample the program from a distribution, but if you're picking your password with any entropy whatsoever, you are implicitly sampling your point function from some distribution. I also want to note that this definition, at least the way I've put it on this slide, doesn't actually make sense unless you restrict the distribution somewhat: you have to say that we only talk about distributions for which it's hard to find accepting inputs to these programs. So basically, you pick a password that's hard to guess.

So we have this really simple heuristic obfuscation for point functions, and you can ask: what other classes of functions can be obfuscated with really simple techniques? Hopefully we can get something a bit more expressive than functions that accept on a single input. What I'll be talking about today are programs that compute a conjunction over their input bits. This is a natural extension of the functionality that point functions give us. A point function specifies some string of 1s and 0s and accepts if and only if the input matches that string at every single position. The conjunctions I'll consider here are instead parameterized by a pattern, which is a string of 1s, 0s, and wildcards; the difference from point functions is that the wildcard character, denoted by a star, marks a position where I don't care what the input bit is. So these programs have some pattern hardcoded, and they accept, outputting 1, if and only if the input bit string matches the pattern at all the non-wildcard positions.
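To make both of these concrete, here is a minimal Python sketch of the hashed password check and of the plain, unobfuscated conjunction functionality that the rest of the talk is about. The function names and the example password are illustrative choices, not from the talk.

```python
import hashlib

def check_password_hashed(stored_hash: str, attempt: str) -> bool:
    """Heuristic point-function obfuscation: store SHA-256(password), and on
    input x accept iff SHA-256(x) equals the stored hash."""
    return hashlib.sha256(attempt.encode()).hexdigest() == stored_hash

def matches_conjunction(pattern: str, x: str) -> bool:
    """The (unobfuscated) conjunction functionality: accept iff x agrees with the
    pattern at every non-wildcard position; '*' marks a don't-care position."""
    return len(x) == len(pattern) and all(c == '*' or c == xi
                                          for c, xi in zip(pattern, x))

stored = hashlib.sha256(b"hunter2").hexdigest()
print(check_password_hashed(stored, "hunter2"))   # True
print(matches_conjunction("01*", "011"))          # True
print(matches_conjunction("01*", "111"))          # False
```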
Hopefully this functionality is clear; the rest of this talk will be devoted to obfuscating this class of programs. A number of works have considered this exact problem of obfuscating conjunctions, but today we'll be concerned with one work of Bishop et al. from Crypto 2018 [BKMPRS], which showed how to build an obfuscator for conjunctions using ordinary cryptographic groups, with no multilinearity whatsoever, and proved distributional security of their construction in the generic group model. This work is basically a direct follow-up: we revisit their techniques and see how far we can push them in a number of different regimes. We have one construction that's essentially a dual of theirs, where we achieve better efficiency and a slightly better analysis in the generic group model. Then we have two other constructions that are not group-based at all but are heavily inspired by their techniques: the first is a construction based on LPN, and the last is a construction that's purely information-theoretic, although for that one we do have to make some severe restrictions. So if you think it contradicts something you know, there's probably a restriction that prevents that. For today, we'll just talk about the LPN-based construction, to give you a flavor of our techniques.

While I did say LPN, to make things a little simpler for the talk I'm actually going to work over F_p, where p is some large prime, exponential in the pattern length n. The way the construction works is that I have a length-n pattern consisting of 0s, 1s, and wildcards, and I encode it as a 2n-dimensional vector e: e has twice the length of the pattern, and its entries are over F_p. The encoding scheme is pretty simple. Imagine breaking up e into n consecutive blocks, each consisting of two entries, with the i-th position of the pattern corresponding to the i-th block. This encoding is basically from BKMPRS, and it works as follows: if the i-th position of the pattern is a 0, then in the i-th block we put a uniformly random value in the top position; if the i-th position is a 1, we put a uniformly random value in the bottom position; and if the pattern has a wildcard there, we just put two 0s. If there's one thing to hold in your head for the rest of the talk, it's this correspondence; I'll come back to it a lot, but hopefully it's simple enough. Here I had the pattern 01*, and it gets encoded as a six-dimensional vector e, where e1 and e2 are uniformly random elements of F_p. The rest of the obfuscation is just to draw a random matrix B, of dimensions (n+1) by 2n, and to output B along with B times e. So it's super simple: I encode the pattern in e, I multiply it by a random matrix, and that's the whole program.

Given this program, how do we evaluate? Let me work through an example. Say I give you the obfuscation of the pattern on the previous slide, 01*, so you get to see B and B times e.
You don't know what pattern is encoded; you don't get to see the vector e, I'm just showing it to you here. Let's work through the evaluation for x = 011. This is an input that should be accepted by this program, because 011 matches the pattern 01*. What we're going to do is read off the bits of x to select n columns out of B. B is this width-2n matrix; we break it up into n blocks of two columns, and in the i-th block, reading from left to right, we look at x_i: if x_i is 0, I select the left column, and if x_i is 1, I select the right column. Hopefully this correspondence is clear: with x = 011, I'm selecting the columns B1, B4, and B6. That's the first step. The next step is to pick a uniformly random vector k that's orthogonal to all my selected columns. Such a vector k exists simply because these columns are (n+1)-dimensional and I'm picking only n of them, so there will be some k orthogonal to all of them. Then I just multiply k by Be, which gives me a scalar; if that scalar is 0, I accept, and if it's non-zero, I reject.

Why does this scheme work? Why should kBe come out to 0 if x matches the pattern? Well, kBe is a scalar, and by associativity I can also think of it as the inner product of the 2n-dimensional vector kB with the 2n-dimensional vector e. Now, e is a vector that has zeros in lots of positions, but in some positions it has these uniformly random e_i values. Since we're working over a large field, how can e have any noticeable chance of multiplying another vector and coming out to 0? Every single one of these uniformly random e_i's has to get zeroed out, and they get zeroed out if kB is 0 in the corresponding positions. When does that happen? It turns out that happens exactly when the input matches the pattern, because x selects columns of B and we pick k to be orthogonal to exactly those columns, so kB is 0 in the positions that are basically encoding my input x. And the way we set things up, the product goes to 0 if and only if x actually matches the pattern. Hopefully this evaluation procedure made sense, but even if it didn't, we're not going to come back to it; I just needed to convince you that this is actually a working program you can use. The only thing to remember is that we encode patterns like this: a 0 corresponds to a uniformly random value in the top position, a 1 corresponds to a uniformly random value in the bottom position, and a wildcard is two zeros.

For security, remember that we're talking about distributional versions of security, so we have to fix a distribution before we can talk about security. The distributions we consider are the ones from the prior work of BKMPRS, and they work as follows. To sample a pattern of length n, you pick alpha times n uniformly random positions among the n positions of the pattern, and in those positions you sample uniformly random 0/1 bits; everywhere else, in the remaining 1 minus alpha fraction of positions, you just put wildcards. So this is the distribution of patterns we're talking about.
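To pull the pieces together, here is a minimal Python sketch of the pattern distribution, the encoding, the obfuscation, and the evaluation procedure just described, working over F_p with plain modular arithmetic. The function names, the concrete prime, and the choice of a canonical (rather than uniformly random) orthogonal vector k are illustrative assumptions; only the encoding and the acceptance rule come from the talk.

```python
import random

def null_vector_mod_p(rows, p):
    """Return a nonzero vector v with <row, v> = 0 (mod p) for every row in `rows`.
    Assumes more unknowns than independent equations (true below: n selected
    columns, each of length n+1)."""
    M = [[x % p for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n_cols):
        piv = next((i for i in range(r, n_rows) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)                 # modular inverse (Python 3.8+)
        M[r] = [(x * inv) % p for x in M[r]]
        for i in range(n_rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    free = next(c for c in range(n_cols) if c not in pivots)  # a free column exists
    v = [0] * n_cols
    v[free] = 1
    for i, c in enumerate(pivots):
        v[c] = -M[i][free] % p
    return v

def sample_pattern(n, alpha):
    """BKMPRS-style distribution: alpha*n uniformly random fixed positions holding
    uniform bits, wildcards everywhere else."""
    fixed = set(random.sample(range(n), int(alpha * n)))
    return ''.join(random.choice('01') if i in fixed else '*' for i in range(n))

def encode_pattern(pattern, p):
    """Encode a pattern over {0,1,*} as a 2n-dimensional vector e over F_p:
    0 -> random value in the top slot, 1 -> random value in the bottom slot,
    * -> two zeros."""
    e = []
    for ch in pattern:
        if ch == '0':
            e += [random.randrange(1, p), 0]
        elif ch == '1':
            e += [0, random.randrange(1, p)]
        else:
            e += [0, 0]
    return e

def obfuscate(pattern, p):
    """The whole construction: a random (n+1) x 2n matrix B, output (B, B*e)."""
    n = len(pattern)
    e = encode_pattern(pattern, p)
    B = [[random.randrange(p) for _ in range(2 * n)] for _ in range(n + 1)]
    Be = [sum(b * x for b, x in zip(row, e)) % p for row in B]
    return B, Be

def evaluate(B, Be, x, p):
    """Select one column per block (left if x_i = 0, right if x_i = 1), find k
    orthogonal to the selected columns, accept iff k . (Be) = 0.
    (The talk picks a uniformly random such k; a fixed one suffices here.)"""
    n = len(x)
    selected = [[B[r][2 * i + x[i]] for r in range(n + 1)] for i in range(n)]
    k = null_vector_mod_p(selected, p)
    return sum(ki * bi for ki, bi in zip(k, Be)) % p == 0

p = 2**61 - 1                          # a large prime; the talk only needs p exponential in n
B, Be = obfuscate("01*", p)
print(evaluate(B, Be, [0, 1, 1], p))   # True:  011 matches 01*
print(evaluate(B, Be, [1, 1, 0], p))   # False (with overwhelming probability over B, e)
```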
The theorem statement is: if you sample the pattern from this distribution, encode it in the vector e, and give out B and B times e, then under the standard learning parity with noise assumption with constant noise rate alpha, the whole obfuscation (B, Be) looks uniformly random, and therefore hides all information about the underlying pattern. Let me clarify what I mean by LPN. I'm not going to start from plain LPN; to save a bit of time, I'll start from a variant of it. Let me specify how the error vector is sampled, because it's a little different from the standard Bernoulli noise. The vector e' is n-dimensional and is sampled to have exactly alpha times n random entries. Generally in LPN we have a variable number of non-zero entries, but here, in this exact LPN problem, I fix beforehand that e' has exactly alpha times n random entries, in uniformly random positions; those entries are independent uniform values from F_p, and everything else in e' is 0. With this distribution on e', the exact LPN assumption says that if I sample a uniformly random matrix H of dimensions (n minus n to the epsilon) by n and give you H together with either H times an e' sampled from this distribution or a uniformly random vector u, you can't distinguish the two. It turns out this assumption is polynomially equivalent to the standard LPN assumption, so when I say we get things from standard LPN, we actually start from this assumption, which prior work has shown to be equivalent to LPN.

At this point it might look like we're pretty close to done: this H and H times e' looks a lot like our obfuscation, so can we just apply LPN and say we have security? Well, not quite, because there's a pretty important mismatch between what exact LPN says and what we want to say. Exact LPN says that H and H times e', for e' sampled from the distribution on the top left-hand side, is indistinguishable from random. What we want to argue is that our obfuscation, B and B times e, for an e encoding one of our patterns, looks random. What's the difference between the top and the bottom? The important point is that the error vectors e' on the top are n-dimensional, with alpha times n non-zero entries in uniformly random positions; they can be anywhere they want. But if I'm encoding a pattern with my obfuscation scheme, I can't put the non-zero values just anywhere, because the way I encode my patterns as 2n-dimensional vectors is blockwise, with every two positions encoding one position of the pattern, and the way I set up this encoding, every single block has to have at least one zero entry. That's just how my pattern encodings into 2n-dimensional vectors are defined. So I have a mismatch between these two distributions: for the pattern distributions I'm talking about, encoding is equivalent to sampling a 2n-dimensional vector with exactly alpha times n non-zero entries, but conditioned on every single size-two block having at least one entry that's set to zero. Hopefully the challenge is clear here.
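Here is a small sketch contrasting the two error distributions; the sampler names and parameter choices are illustrative, but the block constraint they exhibit is exactly the mismatch just described.

```python
import random

def exact_lpn_error(n, alpha, p):
    """Exact-LPN error: n entries, exactly alpha*n of them nonzero, sitting in
    uniformly random positions, each a uniform nonzero value in F_p."""
    e = [0] * n
    for i in random.sample(range(n), int(alpha * n)):
        e[i] = random.randrange(1, p)
    return e

def structured_error(n, alpha, p):
    """The error our obfuscation actually produces: 2n entries, alpha*n of them
    nonzero, but every size-2 block (positions 2i, 2i+1) has at least one zero,
    because each block encodes one pattern position (0 = top, 1 = bottom, * = none)."""
    e = [0] * (2 * n)
    for i in random.sample(range(n), int(alpha * n)):   # the non-wildcard positions
        e[2 * i + random.randrange(2)] = random.randrange(1, p)
    return e

n, alpha, p = 8, 0.5, 2**61 - 1
e = structured_error(n, alpha, p)
assert all(e[2 * i] == 0 or e[2 * i + 1] == 0 for i in range(n))  # the block constraint
```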
We have to go from the unstructured error on the top to the structured error on the bottom. Before going on, I should mention that this problem of structured error in LPN has actually been studied before, but in the context of attacks, showing that this is a vulnerable setting of LPN. In particular, with the same setup as on the bottom half of the slide, if B had 2n minus n to the delta rows instead of just n plus 1, you could actually learn e by an Arora-Ge-style linearization attack. So in order to prove the security of our obfuscation, we actually had to prove something new about structured-error LPN. In particular, we had to prove that structured-error LPN is secure in parameter regimes that weren't previously known to be even trivially implied by LPN.

Here's how we do it. I start from the exact LPN assumption I specified, which says that the matrix H together with H times e' is indistinguishable from random, and I write H in terms of its columns h1 up to hn. The very first step is to literally just write down a matrix of twice the width of H. Here's how: I sample n uniformly random columns u1 up to un myself, and I interleave the columns I just sampled, these new u_i's, with the columns of H to make a wider matrix K. The interleaving works like this: K has twice the width of H, and I think of K as broken up into n blocks of two columns each. Going from left to right, in the i-th block I either put (h_i, u_i) or (u_i, h_i); I flip a coin to decide the ordering. So all I'm doing is taking my matrix H, sampling n more columns, and randomly interleaving them with the columns of H.

This seems like quite a silly thing to do: I'm given some LPN problem, and I'm just making my matrix wider. Why? The point is that by doing this, I can transform the unstructured error e' into the structured error e that I care about. By changing the matrix and making it wider like this, I'm implicitly injecting the structure I want into the error vector. Maybe this is a little tricky to see, so here's a proof by picture; it's a totally elementary observation. This H times e', the vector on the left-hand side, is exactly equal to the vector on the right-hand side, where I've now written the matrix as the wider matrix: making H wider and turning it into K is the exact same thing as padding e' with zeros to zero out the new columns I wrote down. So e is the vector with zeros in the positions corresponding to the gray u_i columns. Basically for free, we started with exact LPN and now we have structured error. So we have this matrix K, of dimension (n minus n to the epsilon) by 2n, and when I multiply it by a structured error, the result is indistinguishable from uniform; this is perfectly equivalent to the exact LPN problem. We haven't done anything super interesting yet, and we're not done, because this isn't what we wanted to show: what we wanted to show was that our obfuscation is indistinguishable from random.
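Before moving on, here is a sketch of that interleaving step, with a sanity check that the padded error e really satisfies Ke = He'. The helper names and toy dimensions are illustrative.

```python
import random

def matvec(M, v, p):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def interleave(H, p):
    """Widen an m x n matrix H into an m x 2n matrix K: pair each column h_i with
    a fresh uniformly random column u_i, flipping a coin for the order per block.
    Also return which slot of each block holds h_i."""
    m, n = len(H), len(H[0])
    K = [[] for _ in range(m)]
    h_slot = []
    for i in range(n):
        u = [random.randrange(p) for _ in range(m)]
        slot = random.randrange(2)                 # 0: (h_i, u_i), 1: (u_i, h_i)
        h_slot.append(slot)
        for r in range(m):
            pair = [0, 0]
            pair[slot] = H[r][i]
            pair[1 - slot] = u[r]
            K[r].extend(pair)
    return K, h_slot

def lift_error(e_prime, h_slot):
    """Pad e' with zeros in the slots of the fresh u_i columns, so that K e = H e'."""
    e = []
    for x, slot in zip(e_prime, h_slot):
        pair = [0, 0]
        pair[slot] = x
        e.extend(pair)
    return e

# sanity check: the structured error e satisfies K e = H e'
p, m, n = 2**61 - 1, 4, 6
H = [[random.randrange(p) for _ in range(n)] for _ in range(m)]
e_prime = [0] * n
for i in random.sample(range(n), 3):
    e_prime[i] = random.randrange(1, p)
K, h_slot = interleave(H, p)
e = lift_error(e_prime, h_slot)
assert matvec(K, e, p) == matvec(H, e_prime, p)
```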
In our obfuscation, the matrix multiplied by the structured error vector is (n+1) by 2n, so it has n plus 1 rows, while this matrix K has n minus n to the epsilon rows. So we have a challenge: we need to somehow add n to the epsilon plus 1 additional rows. That's a little tricky, because for every single row you write down, you also have to write down the inner product of that row with the error vector e, and if you don't know e, how are you going to write down that inner product? So we need to tack on n to the epsilon plus 1 additional rows, but it's unclear how.

There is one observation we can use: we actually have a bit of partial information about e. For every single row of K, we know its inner product with e, because it's written there for us: every entry of Ke is just the inner product of the corresponding row of K with e. We can use this partial information to our advantage. If I want to write down a row whose dot product with e I know, I can take a linear combination of the rows of K, and if I take the same linear combination of the entries of Ke, then I'll have the right dot product. So I can extend the matrix by tacking on random linear combinations of the previous rows of K.

This gets us pretty close: we now have a matrix of the right dimensions, and our error vector looks the way it should. But we're still not done, because this matrix doesn't actually look random. It's a matrix where the first n minus n to the epsilon rows are uniform and the last n to the epsilon plus 1 rows are just random linear combinations of the previous rows, and a freshly sampled random matrix would look nothing like this. To convince yourself it's not random, just look at the rank: the rank of this matrix is at most n minus n to the epsilon, whereas a random matrix would have rank n plus 1. So we somehow have to break up the linear correlations that prevent this matrix from looking random, and we have to do so in a way that preserves the fact that we know the dot product of every single row with e. Hopefully the challenge is clear: we have to make this matrix look random, but we delicately constructed it precisely so that we knew all the dot products with e, so it seems like it will be quite difficult to change it without knowing e itself.

We actually have one last piece of information that we haven't taken advantage of yet. I slightly lied when I said we don't know e: we actually know half of the entries of e. e is a 2n-dimensional vector where n of the entries we inserted ourselves as zeros; those are the entries corresponding to the gray columns of K. So we can use the fact that we know n of the positions hold zeros. What we're going to do is randomize out the bottom part of this matrix exactly in the columns corresponding to the zeros of e. I sample a matrix V of width 2n that fits right on top of this RK block at the bottom, with uniformly random columns in the positions where e is zero and zeros everywhere else. This way I can change RK into RK plus V without affecting the fact that I correctly compute the dot product of every single row with e, because V times e is zero.
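Here is a sketch of those two steps, appending random linear combinations of the rows and then masking them in the columns where e is known to be zero, with a small toy setup standing in for the K and e of the reduction. The helper names and dimensions are illustrative.

```python
import random

def matvec(M, v, p):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def extend_rows(K, Ke, t, p):
    """Append t rows that are random linear combinations (the rows of R) of the
    rows of K; applying R to Ke as well gives each new row's inner product with e."""
    R = [[random.randrange(p) for _ in range(len(K))] for _ in range(t)]
    RK = [[sum(R[i][j] * K[j][c] for j in range(len(K))) % p
           for c in range(len(K[0]))] for i in range(t)]
    return RK, matvec(R, Ke, p)            # R(Ke) = (RK)e

def mask(RK, zero_positions, p):
    """Add V: uniformly random entries in the columns where e is known to be
    zero, zero everywhere else. Since V e = 0, the inner products with e survive."""
    out = [row[:] for row in RK]
    for c in zero_positions:
        for row in out:
            row[c] = (row[c] + random.randrange(p)) % p
    return out

# toy setup mimicking the slide: every block of K has a "gray" u_i slot, and the
# structured error e is zero exactly in those slots
p, m, n = 2**61 - 1, 4, 6
u_slot = [random.randrange(2) for _ in range(n)]
zero_positions = [2 * i + u_slot[i] for i in range(n)]
K = [[random.randrange(p) for _ in range(2 * n)] for _ in range(m)]
e = [0] * (2 * n)
for i in random.sample(range(n), 3):
    e[2 * i + (1 - u_slot[i])] = random.randrange(1, p)

Ke = matvec(K, e, p)
RK, RKe = extend_rows(K, Ke, t=3, p=p)
bottom = mask(RK, zero_positions, p)
assert matvec(bottom, e, p) == RKe     # the appended, masked rows still pair correctly with e
```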
So the question is: are we done now? Is this matrix, K stacked on top of RK plus V, actually a random-looking matrix? This last step is a purely statistical argument, so there's no need to read all the text on this slide, but to at least convince you it's plausible that K stacked on RK plus V is statistically close to a uniformly random matrix: if you add up the entropy in K, R, and V, you'll see that for the right setting of parameters there's quite a bit more entropy there than in a uniformly random matrix of the same dimensions. So it's at least plausible that you could work through a leftover-hash-lemma-style argument and show that this is negligibly close to a uniformly random matrix, and indeed that's what we do in the paper: we carefully bound the collision probabilities of this matrix and show that it's very close to uniform. With that, the reduction is complete: we can say that our obfuscation, the uniformly random matrix B multiplied by the structured error vector e, is computationally indistinguishable from random.

To conclude: the way we obfuscate is that we take a pattern, write it as a 2n-dimensional vector, and literally just multiply by a random matrix. That's the whole construction, and security follows under standard constant-noise-rate LPN. We have two additional constructions in the paper, so if you want to read more about simple ways to obfuscate conjunctions: we have a generic group scheme that's very similar to what's done in BKMPRS, and we have another construction that I think is really cool, an information-theoretic construction that consists of a sequence of matrices, where evaluation is done by taking a subset sum of the matrices corresponding to your input and computing the determinant of that matrix. The link to our ePrint is right here, and these slides are on my website. Thanks.

We have plenty of time for questions. So, about the reduction from structured noise to standard noise: given the previous attacks of Arora-Ge, is there a gap between the parameters for which you can prove hardness and the parameters for which you can actually break it? Yeah, there's a huge range where we don't know the answer. Basically, if I go back to, well, if you look at this matrix here, we see that when K has n minus n to the epsilon rows, it's trivially equivalent to what exact LPN is saying, and with our statistical argument we show we can extend this all the way to n plus 1 rows. The Arora-Ge attack kicks in when the number of rows is 2n minus n to the delta. So there's a gap: if K had 2n rows, it's totally trivial, you can solve for e; Arora-Ge brings that down to 2n minus n to the delta. But then there's a gap between where the attack works, at 2n minus n to the delta, and what we can prove, which is actually a bit better than n plus 1: we can go all the way up to n plus n to the delta. We don't know how to close the gap in between.
So that's an interesting open question. Any others? Isn't the version you're using called knapsack LPN or something? There are a lot of different names for it; I think in one paper it's called knapsack LPN. I thought exact LPN was just standard LPN where the noise distribution fixes the number of... Yeah, it's just the dual, or syndrome, version of standard LPN, but I've changed the noise distribution from Bernoulli to exact. I see; it's called knapsack somewhere, sometimes dual, sometimes syndrome. I think knapsack is just the dual variant of standard LPN, but you're probably using the exact knapsack LPN version. Exactly, yeah. And about the parameters: you have this n to the epsilon loss in the number of rows. How much can epsilon be? I thought it was something like... Epsilon can be any constant between zero and one; it just corresponds to the fact that in the primal view, the tall A matrix has width n to the epsilon and height n, and the height should be polynomial in the width. Somehow I vaguely remember it had to be less than one half; I had a vague recollection that it's between zero and one half, but any constant between zero and one might be okay. Maybe let's talk about it offline. So maybe one last question. You probably said it, but what is the distribution of patterns that you can allow in the end? It's a string with wildcards. Oh, the distribution on patterns? Yeah. It's this distribution here: a uniform distribution over patterns with a fixed number of wildcards. And the wildcards are at random positions, or fixed positions? Uniformly random over all patterns you can possibly write down with that number of wildcards. So if you think this distribution is extremely contrived, I agree with you; we do have other constructions in the paper that don't need such contrived distributions. In particular, the information-theoretic construction doesn't have this strong uniformity condition. Okay, so could this be used for password hashing, where you want to accept a password even when the person mistyped some of the characters? Yeah, so if you want to, I would discourage using this exact scheme for password hashing, because the distributions have to be so random. But for the generic group construction, we actually show that you can handle a fairly wide class of distributions, and there's another work, of Beullens and Wee, that gets an even wider class of distributions. In our information-theoretic construction, we just require that the non-wildcard bits themselves have enough entropy, and we don't need any entropy on where the wildcards are. I think for applications that's what you want: no entropy required on the locations of the wildcards. Okay, thanks. Yeah, thank you. Thank the speaker, and come to the last talk.