Okay, so the next talk is on permuted puzzles and cryptographic hardness, and Justin will give the talk.

Hi, I'm Justin, and I will tell you about permuted puzzles and cryptographic hardness later. Right now I'm going to talk about private information retrieval, which is the main reason that anybody should care about permuted puzzles. It's also something I guess you've already heard about, but just to remind you: in private information retrieval, we have some database on a server, and a client that wants to access this database without revealing which part of the database he's looking at. The two basic requirements are that the client should actually get the bit of the database that he's interested in, and that the curious server should learn nothing about which bit the client is accessing.

Okay, so the traditional focus in private information retrieval has been on minimizing the communication between the client and the server, specifically trying to do better than the trivial solution where the server sends everything to the client. By now we know how to get polylogarithmic overhead, or maybe even O(log n), I don't know. But in this talk we're going to focus on something different, which is the amount of computation done by the server. There is a trivial linear lower bound on the amount of work the server needs to do, because if it doesn't touch some bit of the database, then it knows the client is not looking at that bit. Still, we're going to try to do better than that by allowing the server to do some preprocessing; that's what we would hope for. This notion of a private information retrieval scheme where not just the client but also the server runs in sublinear time was termed doubly efficient private information retrieval at TCC two years ago. So the question is: does doubly efficient private information retrieval exist? I have no idea.
I don't have any candidates, and I don't know of any impossibility results. However, there is a useful relaxation of doubly efficient PIR, which I guess you can think of as a secret-key version of PIR. In this version there's a secret key, and you need the secret key in order to preprocess the database or to make queries to it, and privacy is guaranteed against an adversary that does not have the secret key. Intuitively, this corresponds to a setting where the database is actually owned and controlled by the client, but the client is storing his own database on some snoopy server. This notion of a secret-key private information retrieval scheme is actually sufficient for some applications, like homomorphic encryption for RAM programs.

Okay, now the next question is: does secret-key doubly efficient private information retrieval exist? I also don't really know, but I have somewhat more reason to think that it might. There's basically one candidate construction, which was in those same TCC papers in 2017, and within that there's sort of one fuzzy construction; there are some ways you can play with parameters and vary it a little bit, but it's fundamentally an ad hoc construction, not based on any standard cryptographic assumption. And it seems to me very hard to construct such a scheme based on a standard assumption, although I don't have any formal evidence to justify this.

So I want to show you briefly what the candidate scheme looks like, even though it's not really the focus of this talk. The secret key involves a pseudorandom permutation and also an encryption key, although mostly we'll forget about the encryption key.
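Schematically, the syntax of such a secret-key scheme might look like the following toy sketch. The function names are my own, not from any paper, and the dummy "preprocessing", which just stores the database under a secret permutation of positions, is certainly not a secure instantiation; it only illustrates that the same key is needed both to preprocess and to query, while the server answers with a single probe and no key.

```python
# Hypothetical secret-key DEPIR interface (names are mine). The dummy
# instantiation permutes the database under a secret permutation --
# NOT secure on its own, it only shows the syntax.
import random

def keygen(n):
    key = list(range(n))
    random.shuffle(key)              # secret permutation of positions
    return key

def process(key, db):                # client-side preprocessing with the key
    out = [None] * len(db)
    for i, bit in enumerate(db):
        out[key[i]] = bit
    return out

def query(key, i):                   # which (permuted) position to ask for
    return key[i]

def answer(processed_db, q):         # keyless server: one probe, sublinear work
    return processed_db[q]

db = [0, 1, 1, 0, 1, 0, 1, 1]
key = keygen(len(db))
pdb = process(key, db)
print(all(answer(pdb, query(key, i)) == db[i] for i in range(len(db))))  # True
```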
So what you do is take your database and encode it with a locally decodable code; specifically, we consider the Reed-Muller code, but you don't need to know exactly what that is. We encode the database, then we randomly permute all the entries of the encoding under this permutation π, and we encrypt each entry.

Now say you want to look up an entry of this database privately. Here's the scheme we proposed; I'm being a little bit vague. First, you sample a low-degree curve, well, not too low-degree, say the degree is a security parameter, through the point you're interested in in the Reed-Muller encoding of the database. Then you take some points on this curve, permute these points yourself, and ask the server to tell you the values at the permuted points. Finally, you decrypt the responses you receive and use polynomial interpolation, which is just the local decoding procedure guaranteed by the code; you don't need to worry too much about that if you're not familiar with Reed-Muller. Does the scheme make sense to everybody? Okay.

So yeah, that's roughly the scheme. You can play with it a little bit by varying how you choose the points on the curve, and you can maybe throw in some noise too, but this is basically the only scheme we have.

To be a little more formal about what we want out of this scheme, I'll tell you what the security game is. Again, you have a challenger and an adversary. You sample your random key, and the adversary picks two different sequences of queries that it might want to make. If you want, you can think of these as being adaptive, but for now, just for simplicity,
let's say they're all chosen at once. The challenger just makes the PIR queries: it samples the curves that go through the specified points for one of the two query sequences, permutes all these points, and the adversary then tries to guess which of the two query sequences the challenger chose.

Okay. So if this were to be secure, where would the computational hardness be coming from? The hope is that the secret permutation is somehow adding computational hardness, because we can show that if you don't have this permutation, if you just use the identity, then the scheme is completely broken. So for the rest of the talk, I'm going to look at this phenomenon a little bit more. Does adding a secret permutation ever actually create computational hardness? And, more specifically for our candidate construction, is this assumption plausible?

For the first question, does a secret permutation ever create computational hardness, we formalize the question a little better; that's what these permuted puzzles are all about, if you've been waiting, and this builds on one of those TCC papers from 2017. We do show that yes, there are actually examples: you can show that permutations create computational hardness from standard cryptographic assumptions. We can't show that the private information retrieval scheme is secure from standard cryptographic assumptions, just that this general phenomenon can happen. We show this based on random oracles, or a DDH assumption, or, as Fermi Ma showed us, a very clean example based on LPN. Okay, and in this talk I'm not going to talk about the DDH construction.
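As a concrete aside, the curve-sampling and interpolation steps of the candidate scheme from a few slides back can be sketched as follows. This is a toy reconstruction under my own assumptions: a tiny prime field, a fixed bivariate polynomial standing in for the Reed-Muller encoding, and the permutation and per-entry encryption layers omitted entirely. It is not the actual construction.

```python
# Toy sketch: query a low-degree encoding along a random curve through the
# target point, then locally decode by Lagrange interpolation at t = 0.
# Field size, degrees, and the polynomial F are illustrative assumptions.
import random

P = 101                      # small prime field F_p
DEG_F = 2                    # total degree of the encoding polynomial F
DEG_CURVE = 2                # "security parameter": degree of the query curve

def F(x, y):                 # stand-in for the Reed-Muller encoding
    return (3 * x * x + 5 * x * y + 7 * y + 2) % P

def curve_through(pt, deg):
    """Random degree-`deg` curve gamma with gamma(0) = pt, component-wise."""
    cx = [pt[0]] + [random.randrange(P) for _ in range(deg)]
    cy = [pt[1]] + [random.randrange(P) for _ in range(deg)]
    def gamma(t):
        return (sum(c * pow(t, i, P) for i, c in enumerate(cx)) % P,
                sum(c * pow(t, i, P) for i, c in enumerate(cy)) % P)
    return gamma

def lagrange_at_zero(pts):
    """Interpolate (t, v) pairs and evaluate the polynomial at t = 0."""
    total = 0
    for i, (ti, vi) in enumerate(pts):
        num, den = 1, 1
        for j, (tj, _) in enumerate(pts):
            if i != j:
                num = num * (-tj) % P
                den = den * (ti - tj) % P
        total = (total + vi * num * pow(den, P - 2, P)) % P
    return total

target = (4, 9)
gamma = curve_through(target, DEG_CURVE)
# F restricted to the curve has degree <= DEG_F * DEG_CURVE,
# so that many points plus one suffice to interpolate.
ts = random.sample(range(1, P), DEG_F * DEG_CURVE + 1)
answers = [(t, F(*gamma(t))) for t in ts]       # what the server would return
print(lagrange_at_zero(answers) == F(*target))  # True
```

In the real scheme the client would additionally permute the queried points under π and decrypt the server's answers before interpolating.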
I'm just going to do everything in idealized models So on the second contribution is looking at whether or not like actual secret key DAPR schemes are plausible and We still don't really know but we do rule out one very broad class of attacks, which we sort of formalize So in terms of the abstraction of our pre-permuted puzzle So in secret key DAPR we have this Distribution we can think of it as a set of points You can think of that as being a binary string where ones indicate where you have a point in general We'll think about having just a bunch of distributions on n-bit strings And we wanted to say that when you permute all these points then Somehow stuff became indistinguishable, right? So in the and by permuting I just mean that you're taking the bits of x and you're just rearranging them according to one permutation You reuse the same permutation in every sample And we wanted to say that these are These like these permuted distributions are indistinguishable and like pretty standard sense You pick two different sequences of these distributions the challenger picks a random one of these sequences and picks around the permutation And now it'll answer according to the sort of permuted distributions Right and in this talk I'm just going to focus on K equals to where there's two choices of distributions or other than like n different choices of distributions just for simplicity because that seems to capture the phenomenon of the computational hardness so That's that's like what a permuted puzzle is you have some distributions They're easy to distinguish you add a random permutation and now they're hard to distinguish so it's Not too hard to construct these in the random oracle model I'll just show you the construction and assert that it's obviously a good permuted puzzle So you have a random oracle Just random function that takes inputs and just end bit outputs So in each sample I'm every time a sample I'm going to sample basically new random oracles You can 
derive new random oracles by choosing a seed and so on, it's pretty standard. So you derive n new oracles. The first distribution is going to be a description of these n oracles, an input, and the result of applying the oracles to the input in order. The second distribution is going to be the same thing, except that at the end, instead of the real output, the distribution just contains a uniformly random string.

Okay, just so we're on the same page: it's very easy to distinguish these two distributions, right? You just apply the hash functions in order and check that at the end you get the claimed value. But if you don't get s_1 through s_n in order, if you just get them in some random order, my claim is that, obviously, in the random oracle model, you can't tell whether y is the result of iteratively applying these hash functions in the correct order. I'll just say that's obviously true, and if you want a formal proof you can look at the paper.

All right, I'll show you one more construction of a hard permuted puzzle, just to give you a taste of what things look like; this one is in the generic group model.
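The iterated-hashing puzzle just described can be sketched like this, with a keyed hash standing in for the random oracles. This only demonstrates the trivial distinguisher in the unpermuted case; the chain length and seed sizes are illustrative assumptions.

```python
# Toy sketch of the random-oracle permuted puzzle: a chain of n fresh
# "oracles" applied to x, versus a uniformly random final string.
import hashlib
import os

N = 8  # number of oracles in the chain (illustrative)

def oracle(seed, v):
    """One random oracle H_seed, modeled by a keyed hash."""
    return hashlib.sha256(seed + v).digest()

def sample(real=True):
    seeds = [os.urandom(16) for _ in range(N)]
    x = os.urandom(32)
    y = x
    for s in seeds:                  # y = H_n(...H_2(H_1(x))...)
        y = oracle(s, y)
    if not real:
        y = os.urandom(32)           # second distribution: random string
    return seeds, x, y               # unpermuted: seeds given in chain order

def distinguish(seeds, x, y):
    """Trivial distinguisher when the oracle order is known."""
    z = x
    for s in seeds:
        z = oracle(s, z)
    return z == y

s1, x1, y1 = sample(real=True)
print(distinguish(s1, x1, y1))       # True
s2, x2, y2 = sample(real=False)
print(distinguish(s2, x2, y2))       # False (except with probability 2^-256)
# In the permuted puzzle, the adversary instead receives the seeds in a
# secret shuffled order, so it cannot recompute the chain.
```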
I'm going to argue that it's secure in the generic group model, but for the same construction you can actually show that it's a hard permuted puzzle under the DDH assumption. So you have your group, and u is a random vector over Z_p, where p is the order of your group. Now your two distributions are just two vectors of elements in the exponent: in the first case, the vector in the exponent is orthogonal to u, and in the second case, the entries in the exponent have no relation to u whatsoever. Again, these are two distributions that you can easily distinguish, just by doing a dot product in the exponent: you check whether the dot product with u in the exponent is equal to zero, just linear operations. Sorry, I'm behind by one on my slide count.

Okay, so on the other hand, if you permute these two vectors, the claim is that you just can't do this simple inner product test anymore. And in the generic group model you can actually show that these sorts of inner products in the exponent are the only thing you can do to try to distinguish a vector of encoded group elements from random. So as long as u is not, say, a constant vector or something close to one, there will be no fixed vector whose inner product with the permuted samples is zero with any noticeable probability.

Okay, so now we know that there are hard permuted puzzles; in principle, at least, permutations can make easily distinguishable distributions become indistinguishable. So the next thing I wanted to talk about was how to analyze permuted puzzles in general: how to think about whether or not a specific puzzle should be computationally hard. In particular, this is going to be motivated by our secret-key DEPIR construction.

Okay, so the idea is, and I don't have a formal statement or proof, that when you have these distributions on n-bit strings and you apply a random permutation to each of the strings, that kills a lot of the structure of the ways in which
these strings could be non-random. I will mention that it is important that I'm talking about distinguishing these strings from random: if you talk about general distinguishability of two arbitrary distributions, I can construct kind of complicated examples that I don't really know how to rule out.

So, does anybody have any good ideas for what kind of structure you might look at? No? Okay. One thing you can try to look at in a distribution after it's been permuted is the number of bits that are one, because that's preserved by a permutation. So if a distribution has a Hamming weight that's distributed differently from that of a uniformly random string, then of course you'll be able to distinguish it, even with a permutation. Another thing you might try is to look for weird correlations between bits: maybe in one distribution there's some pair of bits that are never both one, or some small set of bits that is jointly distributed differently from uniformly random. You can look for these bits by trying all small subsets, and you can do this even if there is a random permutation. Okay, one other thing, and this is the last one, I promise, that I can think of for distinguishing permuted strings from random, is to look for linear structure. For instance, say that all your x's are contained in some low-dimensional subspace. You don't necessarily know which one, just that there's some subspace containing them, and you can check this by doing linear algebra, Gaussian elimination or something. This is also preserved if all of the samples are permuted by the same permutation. And I don't really have any other fundamentally different algorithmic ideas for how you can distinguish these sorts of permuted puzzle thingies from random strings.

And for the rest of the talk, I'm going to focus on
a generalization of these ideas, one and two, called statistical query algorithms. So, what's a statistical query algorithm? Oh, everything appeared all at once, okay, so I guess you can just look at the slide while I talk. Given a bunch of samples from a distribution, you're trying to tell whether or not it's uniformly random. A statistical query algorithm limits the algorithm in a specific way: the algorithm doesn't get to look at the samples directly. Instead, it gets essentially one bit per sample: for each sample, it specifies a function, and it gets that function applied to that sample. It gets as many samples as it wants in this form, one bit per sample, and at the end it tries to use these one-bit-per-sample queries to guess whether the distribution is random or not. This is an idealized model of learning that was introduced by Kearns to study PAC learning that's robust against noise, and we kind of thought it was a neat model to study for the distribution-distinguishing problems that arise in DEPIR.

Okay, so this sounds like a really silly model: if you're only looking at one bit of each sample, what can you possibly do?
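The interface just described can be sketched concretely. The hidden distribution and the 0.25 threshold below are my own toy choices; note that this particular query (estimating the marginal of bit 0) is exactly the kind of small-subset test from a moment ago, and a secret bit permutation would defeat it.

```python
# Minimal sketch of the statistical-query interface: the algorithm never
# sees a sample directly, only f(sample) for a predicate f it chooses.
import random

class SQOracle:
    def __init__(self, sampler):
        self._sampler = sampler            # hidden distribution

    def query(self, f):
        """Return one bit: f applied to a fresh hidden sample."""
        return f(self._sampler())

def uniform(n):
    return [random.randrange(2) for _ in range(n)]

def biased(n):                             # toy non-uniform distribution: bit 0 stuck at 0
    x = uniform(n)
    x[0] = 0
    return x

def looks_non_uniform(oracle, queries=2000):
    # Estimate Pr[bit 0 = 1] from one-bit queries; uniform gives about 1/2.
    ones = sum(oracle.query(lambda x: x[0]) for _ in range(queries))
    return ones / queries < 0.25

n = 32
print(looks_non_uniform(SQOracle(lambda: uniform(n))))   # False
print(looks_non_uniform(SQOracle(lambda: biased(n))))    # True
```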
Actually, quite a lot. In this whole field of noise-robust PAC learning, basically every single algorithm is a statistical query algorithm. There were some algorithms from before statistical queries were introduced, and those were later found to actually be equivalent to statistical query algorithms. There is one exception, the well-known BKW algorithm for learning parity with noise when the secret is very, very short, just very slightly super-logarithmic. And like I said, we're studying statistical query algorithms in a slightly different setting than the one they were originally considered in, but we're still able to show lower bounds, which is, I guess, part of why we consider them.

Okay, so what can't statistical query algorithms do? One thing they can't do, and you might laugh at this, is Gaussian elimination: it's known that you can't learn parity with a statistical query algorithm. That should make sense, because there's this tight correlation between statistical query algorithms and noise-robust learning, and we hope that we can't learn parity with noise. And in this work, we show that statistical query algorithms also cannot break even a toy problem that's similar to DEPIR but presumably easier to break.

So what is this toy problem, why is it different from the actual construction, and why can't statistical query algorithms break it? It looks kind of similar to the setup you saw ten minutes ago, if you remember that at all, except that instead of parametric curves, which can intersect themselves and all that, we're just looking at graphs of polynomials. These are just simpler to analyze.
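As an aside, the Gaussian elimination test from earlier, the third structural idea and exactly the kind of computation a statistical query algorithm cannot simulate, might look like the following sketch. Parameters and the demo distribution are illustrative assumptions.

```python
# Sketch: detect linear structure with Gaussian elimination over GF(2).
# Bit permutations preserve rank, so this catches a permuted
# low-dimensional distribution that per-bit statistics would miss.
import random

def rank_gf2(samples, n):
    """Rank of a set of n-bit strings over GF(2)."""
    rows = [int(x, 2) for x in samples]
    rank = 0
    for bit in reversed(range(n)):
        pivot = next((r for r in rows if (r >> bit) & 1), 0)
        if pivot:
            rank += 1
            rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
    return rank

n, dim = 16, 3
basis = [random.getrandbits(n) | 1 for _ in range(dim)]   # nonzero basis vectors
perm = random.sample(range(n), n)

def permute_bits(v):                       # the same secret permutation every sample
    bits = format(v, f'0{n}b')
    return ''.join(bits[p] for p in perm)

def span_sample():                         # random vector in span(basis)
    v = 0
    for b in basis:
        if random.getrandbits(1):
            v ^= b
    return v

structured = [permute_bits(span_sample()) for _ in range(50)]
unstructured = [format(random.getrandbits(n), f'0{n}b') for _ in range(50)]
print(rank_gf2(structured, n) <= dim)      # True: the permutation preserved the span
print(rank_gf2(unstructured, n) == n)      # True with overwhelming probability
```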
That's the main reason they're part of the toy problem: simpler to think about, for sure. And we want to say that a low-degree polynomial, where again the degree is a security parameter, versus a random function, these distributions become indistinguishable when you add a random permutation.

Okay, so I'll just try to sketch why this is true, and then I'll be almost done. Suppose you have a function f that's going to give you your one bit about some sample. Also, I'm going to make your life easier as the attacker: instead of a uniformly random permutation, we'll just look at a restricted class of permutations, where you permute the entries within each column independently and then you permute the columns. To show the lower bound, I want to show that your statistical query cannot give you any information. To do that, I want to say that the probability that your query returns a one doesn't really depend on the secret permutation, or on which bit the challenger chose. So, with very high probability over the choice of the permutation, this probability is very close to some fixed value, which means you can simulate the result of the query by yourself.

Okay, so I want to explain why this is true, and I'm just going to show that for the distribution on the left, the variance of this probability is negligible. So the variance is equal to this quantity, right?
This is just how variance is defined. But if you look at this quantity, it's like saying you take a random permutation π, and then you take an x and another x′, and you ask: what's the probability that f(π(x)) = 1 and f(π(x′)) = 1? If you stare at the expression enough, that's what it is. And the second term is the same thing, except that you take a random permutation π and an independent random permutation π′, with an x and an x′. Just stare at those expressions enough, and it turns out that what you need to bound is the statistical distance between two distributions: one where you take two samples and apply the same permutation to both, and one where you take two samples and apply independent permutations.

And the difference between these distributions is just related to the size of the intersection. When you apply the same permutation to two different sets, the resulting distribution is uniformly random conditioned on those two sets having whatever overlap they had. Whereas with two independent permutations, there's just some other distribution of how many points they overlap on, and beyond that, the distribution is also uniformly random. So what we need to prove at the end of the day, to finish this bound, is that when you take two degree-λ polynomials, look at their graphs, and count the points on which the graphs agree, the distribution of the number of agreements is roughly the same as for two random functions. There are some conditions on the parameters, like the field size versus the degree, to make this work, but that's the lemma at the bottom of this statistical query lower bound.

All right, that's all the technical stuff. So for future directions: I said that there are these different types of
algorithms that you can use to try to break permuted puzzles, and I addressed two of them and said that those don't work. So what about the third one? We want to address that, and actually, in some work in progress, we show that this toy problem, which was the easiest thing that we didn't know how to break before, we can actually break with a sort of linearization attack. We don't know how to break the actual full schemes, and we do want to figure out what's going on with the full schemes: maybe we can attack them, maybe we can reduce them to standard assumptions. Maybe we can formulate some sort of idealized model saying that these really are the only types of attacks you can do on permuted puzzles, and show that those attacks don't work. I sort of feel like there should be some reason that you can't reduce to standard assumptions, but I don't know how to say anything formal, so it would be cool if we could do that. And finally, I still want to have a non-secret-key doubly efficient PIR scheme, a keyless scheme. I have no idea; it'd be really cool if you could figure out whether those exist. That's it. Thank you.

Questions for Justin?

Thanks. So, one thing that you could consider, maybe you did already, is to not just ask about the points that are on the curve, but also add some dummy points. Intuitively this is sort of like adding noise, which seems to be useful in many settings. So do you think this would be helpful? In particular, you have an attack on this toy problem; does your attack also work on a noisy version of the toy problem?

Yeah, it's a great question. We're definitely thinking about the effect of noise. We don't have anything definitive to say at the moment, but our attack doesn't work when you add noise.

Okay, any other questions? Have you thought about other applications of these assumptions? Like, are they useful for building other things?
I have no other applications that I am aware of. Thanks.

Okay, let's thank Justin again.