Welcome to the talk about secret shared shuffle. I am Oxana, and this is joint work with Esha Ghosh and Melissa Chase. What is a secret shared shuffle? It is a protocol which allows two parties to obtain secret shares of a shuffled database, where the permutation of the shuffle is hidden. More concretely, assume Bob is the owner of a database X. After the parties run the protocol, they should hold shares of X permuted, or shares of X sub pi. For instance, if we denote Bob's shares by B, Alice should hold B plus X sub pi, where pi is a secret random permutation. In particular, Alice's first element would be B1 plus X sub pi of 1, which means that the first shares of Alice and Bob together add up to the element of X at position pi of 1. Note that this primitive incorporates two tasks at the same time: the task of permuting and the task of secret sharing. It is essential to require both at once. For instance, if we just ask the protocol to return the shuffled database to Bob, Bob can easily derive the permutation by matching the elements against his original database. Thus the protocol, besides shuffling, should also hide the output in some way, and in our primitive this hiding is achieved by secret sharing the output. Note that this primitive can also be defined for the case where the database X was itself secret shared to begin with. In addition, we consider a variant of this protocol, which we call permute and share, where the permutation is an input of one of the parties. As usual, we require that the parties do not learn anything beyond their outputs; in particular, Bob shouldn't learn pi and Alice shouldn't learn X. It is easy to construct secret shared shuffle by running this protocol twice, where in the first execution Alice provides her permutation and in the second execution Bob provides his. This way the resulting permutation will be the composition of the two, and thus hidden from each party. 
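To make the target notion concrete, here is a minimal sketch of the ideal functionality, not the protocol itself: sample a hidden random permutation and hand each party one additive share of the permuted database. The function name `share_shuffle_ideal` and the choice of a 32-bit additive group are illustrative assumptions, not from the talk.

```python
import secrets

def share_shuffle_ideal(x, modulus=2**32):
    """Ideal functionality (toy model): sample a random permutation pi and
    output additive shares of the permuted database x."""
    n = len(x)
    pi = list(range(n))
    # Fisher-Yates shuffle driven by a cryptographic RNG
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        pi[i], pi[j] = pi[j], pi[i]
    bob = [secrets.randbelow(modulus) for _ in range(n)]       # Bob's shares B_i
    alice = [(x[pi[i]] - bob[i]) % modulus for i in range(n)]  # Alice's shares
    return pi, alice, bob

x = [10, 20, 30, 40]
pi, alice, bob = share_shuffle_ideal(x)
# The shares at position i reconstruct x[pi[i]], but neither share list
# alone reveals pi:
assert all((alice[i] + bob[i]) % 2**32 == x[pi[i]] for i in range(len(x)))
```

The point of the toy is the assertion at the end: position i of the two share vectors jointly determines x at position pi(i), while each vector on its own is uniformly random.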
While secret shared shuffle is useful in its own right, this work originated as part of a larger project on doing secure machine learning on a private set intersection, and secret shared shuffle is one of the building blocks in the full protocol. In fact, it can be used in any secure computation on a set intersection, as I'm going to describe next. So what is this problem of two-party computation on a set intersection? In this problem there are two parties, each party holds a database, and they would like to securely compute a function on the intersection of the databases without revealing the intersection itself. The standard example of such a two-party computation is estimating the efficiency of online advertisement, which requires intersecting the data of the merchant and the ad supplier. Ideally, we would like to run any algorithm for private set intersection, obtain the result in encrypted form, and then run your favorite two-party computation on it. The reality, however, is that efficient private set intersection protocols do not output an encrypted intersection as we would like. Instead, they output, for each element, an encrypted indicator bit of whether that element belongs to the intersection or not. It is possible to run two-party computation on the whole dataset and just filter out the wrong elements within the two-party computation, but this would mean a slowdown. Ideally, we would like to filter out the wrong elements first and only then run the secure computation on the intersection only. A natural way to perform such filtering is to randomly shuffle the elements, then reveal the indicator bits in the clear, and then discard all elements whose bit is 0. Intuitively, no one learns which elements are in the intersection because the permutation is hidden. Again, note that it is very important that elements are re-randomized when they are shuffled, because otherwise the parties can link new and old elements together and learn the intersection. 
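The filtering step just described can be sketched in a few lines. This toy runs the shuffle in the clear, which is only an illustration of the logic; in the actual setting the shuffle happens on secret shares, so neither party sees the mapping between old and new positions. The helper name `filter_by_shuffle` is hypothetical.

```python
import secrets

def filter_by_shuffle(elements, bits):
    """Toy illustration: shuffle (element, indicator bit) pairs under a
    hidden random permutation, reveal only the bits, and discard entries
    whose bit is 0."""
    n = len(elements)
    order = list(range(n))
    # Fisher-Yates shuffle with a cryptographic RNG
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        order[i], order[j] = order[j], order[i]
    shuffled = [(elements[k], bits[k]) for k in order]
    return [e for (e, b) in shuffled if b == 1]

# bit 1 iff the element is in the intersection
kept = filter_by_shuffle(["a", "b", "c", "d"], [1, 0, 1, 0])
assert sorted(kept) == ["a", "c"]
```

Because the revealed bits refer to shuffled positions, seeing which positions survive tells an observer nothing about which original elements were in the intersection, provided the elements themselves were re-randomized.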
And our primitive, secret shared shuffle, is designed exactly for that and allows us to perform such filtering. Finally, let me note that each element of the database can be relatively long. For example, in our example with machine learning, each element can come with a payload such as a feature vector, and during the shuffle this payload has to be moved around together with the key. Secret shared shuffle is a special case of generic two-party computation, and therefore it can be implemented using standard solutions, but in this work we focus on efficient implementations. One simple way to implement it efficiently is to use additively homomorphic encryption, where, roughly speaking, Bob encrypts his database and sends it to Alice, and Alice permutes it as needed, adds random shares homomorphically, and sends them back. This protocol is very simple, it has linear computation and communication complexity, and it only takes two rounds. But the downside is that it uses expensive public key operations, and in the end it is much less efficient than the asymptotically worse symmetric-key approach based on shuffling networks. However, that approach has its own downsides: a logarithmic number of rounds, and a communication cost which is proportional to N log N times the size of each element in the database. Intuitively, this is because shuffling networks consist of N log N swaps, and the parties have to feed the full element together with its payload into each swap, hence the complexity. We design an even faster algorithm for secret shared shuffle: it is also based on symmetric crypto, but it is more communication efficient than the network-based approach, both in the number of rounds and in the amount of bits sent. If I were to state the idea in one sentence: we find a way to reduce secret shared shuffle to its random, or pseudorandom, variant, and then it turns out that the pseudorandom variant actually admits a communication-efficient solution. 
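The homomorphic-encryption baseline just mentioned can be sketched with textbook Paillier encryption. The parameters below are toy-sized and completely insecure; they only serve to show the two-round structure: Bob encrypts, Alice permutes and homomorphically subtracts her random shares, Bob decrypts his shares.

```python
import secrets
from math import gcd, lcm

# Toy textbook Paillier with tiny primes (insecure, illustration only)
p, q = 101, 113
n = p * q
n2 = n * n
lam = lcm(p - 1, q - 1)
mu = pow(lam, -1, n)          # valid since g = n + 1

def enc(m):
    r = secrets.randbelow(n - 2) + 1
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 1
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

# Round 1: Bob encrypts his database and sends it to Alice
x = [11, 22, 33, 44]
ctxts = [enc(v) for v in x]

# Round 2: Alice permutes ciphertexts and homomorphically subtracts
# her random shares (multiplying ciphertexts adds plaintexts)
pi = [2, 0, 3, 1]
alice_shares = [secrets.randbelow(n) for _ in x]
to_bob = [(ctxts[pi[i]] * enc(n - alice_shares[i])) % n2 for i in range(len(x))]

# Bob decrypts to obtain his shares; together the shares give x permuted by pi
bob_shares = [dec(c) for c in to_bob]
assert all((alice_shares[i] + bob_shares[i]) % n == x[pi[i]] for i in range(4))
```

The linear cost and two rounds are visible here, as is the downside: every element requires modular exponentiations, which is exactly the public-key overhead the talk wants to avoid.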
In particular, our communication is log T times less than in the previous approach, where T is a special parameter whose best value we determine experimentally. In the end we achieve a speedup varying around 3x to 12x, and the bigger each element is and the worse the network is, the better the advantage. All the protocols I discuss in this talk will be semi-honest. Here is the plan for the rest of the talk. Recall that secret shared shuffle can be constructed from two runs of the permute and share protocol, which does the same thing except that one party controls the permutation. To build this protocol, we will build what we call a share translation protocol, which is exactly that pseudorandom variant of permute and share I was talking about before. We will show how to build it from an oblivious punctured vector, which can in turn be built from puncturable PRFs. So let's try to build permute and share, and just a reminder that we want to secret share Bob's database under Alice's permutation. So what do we do? Let's start with something. In the end, Alice will need to hold shares of X. So let's make Bob pick random A's and send shares of X plus A to Alice. Alice, who knows the permutation pi, can locally rearrange the shares such that X becomes permuted properly. For this to be a secret sharing of permuted X, Bob needs to hold permuted A's, such that, for instance, the first shares together add up to X3. The problem is that Bob knows the A's but doesn't know in which order to rearrange them, because he doesn't know the permutation. But even worse, even if Bob could somehow obtain the permuted A's, he could also link them to the unpermuted A's and therefore learn the permutation. Intuitively, the problem here stems from the fact that the A's are used in two different roles: as masks encrypting the database, and as final shares. And the solution is to decouple them. We will have random A's, which are masks for encrypting X, and random B's, which are the final shares of Bob. 
And at least for now, these two values will be independent. If Bob's shares are the B's, then we want Alice's shares to be B plus X of pi. How can Alice learn these values? For instance, as her first share she needs X3 plus B1, but she only knows X3 plus A3. So we need some mechanism for Alice to be able to convert X3 plus A3 into X3 plus B1, or, more generically, to move from shares under A, which is the vector X plus A, to shares under permuted B, which is X plus pi inverse of B. If Alice somehow knew the difference between the two, she could easily move from one to the other. We call this difference a share translation, since it allows her to translate between two types of shares, and denote it by delta. For instance, in our example of X3 plus A3 and X3 plus B1, delta will be the difference, which is A3 plus B1. Thus our idea is to devise an efficient protocol which allows Alice to learn these deltas. Then Alice can add the delta vector to the X plus A vector, the A's will cancel out, and Alice will get X plus permuted B, which she can then permute to get permuted X plus B. So once we describe how to build the share translation protocol, we will be done. A little more formally, here is what the share translation functionality looks like. The only input is the permutation pi, which comes from Alice. The functionality internally samples two sets of random or pseudorandom values, A and B, and then Bob learns both the A's and the B's in the correct order, and Alice learns the deltas. Each delta is an A plus a permuted B; that's what she needed in the previous protocol. Note that we made a small modification here: in the previous protocol Bob chose the A's and B's, but now they are being chosen by the functionality. This is not a big deal for Bob, he can still use them as before, but it will help us a lot, because they can be generated in some correlated way which can be very efficient. Note that this protocol is essentially a permute and share, but for a random database. 
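The conversion just described can be sketched over XOR shares, where "plus" is XOR and the cancellation is automatic. This toy plays the ideal share translation functionality itself (sampling a and b and handing out deltas); the function name and 32-bit word size are assumptions for illustration.

```python
import secrets

def xor_share_translation_demo(x, pi):
    """Toy share translation over XOR shares. The 'functionality' samples
    random a, b; Bob gets (a, b), Alice gets delta_i = a_i ^ b[pi_inv(i)]."""
    n = len(x)
    pi_inv = [0] * n
    for dst, src in enumerate(pi):
        pi_inv[src] = dst
    a = [secrets.randbits(32) for _ in range(n)]
    b = [secrets.randbits(32) for _ in range(n)]
    delta = [a[i] ^ b[pi_inv[i]] for i in range(n)]

    # Bob sends the masked database x ^ a
    masked = [x[i] ^ a[i] for i in range(n)]
    # Alice adds the deltas: the a's cancel, leaving x_i ^ b[pi_inv(i)]
    mid = [masked[i] ^ delta[i] for i in range(n)]
    # Alice permutes by pi: position j now holds x[pi(j)] ^ b[j]
    alice = [mid[pi[j]] for j in range(n)]
    return alice, b

x = [5, 6, 7, 8]
pi = [2, 0, 3, 1]
alice, bob = xor_share_translation_demo(x, pi)
assert all(alice[j] ^ bob[j] == x[pi[j]] for j in range(4))
```

The final assertion checks exactly the target: Alice's share at position j XORed with Bob's b_j yields x at position pi(j).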
Indeed, if we think of A as a random database which is out of Bob's control, because it was chosen by the functionality, then we can view it like this: Alice and Bob obtain B and permuted A plus B, which are secret shares of permuted A. So this is essentially a pseudorandom variant of the same problem. Let me describe the protocol. Let's assume we have a way for Alice and Bob to agree on a matrix of random values such that Bob learns the whole matrix, but Alice doesn't learn some elements. Concretely, she doesn't learn the elements sitting in positions corresponding to her permutation, that is, at coordinates (i, pi inverse of i). In particular, in every row and in every column there will be exactly one element unknown to Alice. Assuming we can do this, the share translation protocol can be implemented as follows. Bob will set each A i to be the sum of the elements in row i. Then he will set each B j to be the sum of the elements in column j. If we look at the formula for delta i, Alice should set her delta i to be A i, which is the sum of row i, plus B sub pi inverse of i, which is the corresponding column. For instance, back to our example, delta 3 is A3 plus B1, which means that Alice should add together the elements which are colored green on the picture. Note that Alice actually cannot learn A3, and she also cannot learn B1, because she is missing exactly one element from each, but she still can compute the sum, because the missing element in the bottom left corner would participate in the sum twice and cancel out. In some sense, the missing elements are exactly in the right positions so that they still allow Alice to compute the deltas. Note that Alice cannot compute any wrong sum: for instance, she cannot compute A3 plus B2, which would be the sum of the third row and the second column, because she is missing two elements for that. Note that this protocol so far has computation complexity proportional to the database size squared, because of the matrix. This is too much, and we will take care of it later. 
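The row-and-column construction can be checked in code. This sketch works over XOR and verifies the cancellation argument: Alice's delta, computed while skipping the one hidden entry per row and per column, still equals a_i XOR b at pi inverse of i. The function name is hypothetical.

```python
import secrets
from functools import reduce

def matrix_share_translation(pi, width=32):
    """Sketch of the matrix construction: Bob knows the full n x n random
    matrix; Alice misses exactly entry (i, pi_inv(i)) in each row i."""
    n = len(pi)
    pi_inv = [0] * n
    for dst, src in enumerate(pi):
        pi_inv[src] = dst
    M = [[secrets.randbits(width) for _ in range(n)] for _ in range(n)]
    a = [reduce(lambda u, v: u ^ v, row) for row in M]              # row sums
    b = [reduce(lambda u, v: u ^ v, [M[i][j] for i in range(n)])    # column sums
         for j in range(n)]
    # Alice computes delta_i = a_i ^ b[pi_inv(i)] from the entries she knows:
    # the hidden entry M[i][pi_inv(i)] would appear in both the row and the
    # column, so it cancels out of the XOR anyway.
    delta = []
    for i in range(n):
        col = pi_inv[i]
        d = 0
        for k in range(n):
            if k != col:
                d ^= M[i][k]      # row i, skipping the hidden entry
            if k != i:
                d ^= M[k][col]    # column pi_inv(i), skipping it again
        delta.append(d)
    assert all(delta[i] == a[i] ^ b[pi_inv[i]] for i in range(n))
    return a, b, delta
```

Note also why a "wrong" sum is out of reach: combining row i with a column other than pi inverse of i leaves two hidden entries in the XOR, which do not cancel.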
So now it remains to describe how to generate such a matrix, and we will generate it row by row. To generate each row, we will need the following protocol, which we call oblivious punctured vector, and which is very close to (n minus 1)-out-of-n OT. Alice inputs some position j, and the functionality generates a vector of pseudorandom values and sends the whole vector to Bob, but only a partial vector to Alice. In particular, Alice doesn't learn what's at position j. We want the natural security requirements: Bob doesn't learn the position j, and Alice doesn't learn the value at position j. Note that if we don't care about communication and don't care about hiding j, then there is a naive solution: Bob generates the whole vector and sends all but one element to Alice. Instead, we are going to present a solution which achieves logarithmic communication complexity, based on GGM PRFs. Our solution was inspired by a construction from Doerner and Shelat, and a very similar construction appears in recent works on optimizing OT and vector OLE. So let me describe it. Let's first achieve logarithmic communication without hiding j. Bob should pick a random root s and build the GGM PRF tree out of it. The leaves of this tree will be our vector. Then, to transfer all but one value to Alice, Bob will send her the green nodes, as in the picture. Alice can expand each of the green nodes into a subtree and thus learn all the leaves except for leaf number j. Now let's modify this to hide j. Let's look at the second layer of the tree. Previously Bob sent Alice s1, but now he shouldn't learn which way the black path goes, and whether Alice learns s0 or s1. This is easily achievable by letting Bob send a 1-out-of-2 oblivious transfer of s0 and s1. Now let's look at the next layer. We could let Bob send all four values using oblivious transfer, and that would be secure, but at the last layer Bob would have to send on the order of n values, which is not succinct anymore. Here is where the idea from Doerner and Shelat comes in. 
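Before the hiding of j, the plain GGM expansion and the sibling-based transfer described above can be sketched as follows. SHA-256 stands in for the length-doubling PRG, and all function names are hypothetical; Bob sends log n siblings, from which Alice rebuilds every leaf except leaf j.

```python
import hashlib

def prg(seed: bytes):
    # length-doubling PRG from SHA-256 (toy stand-in): seed -> (left, right)
    return (hashlib.sha256(seed + b"0").digest(),
            hashlib.sha256(seed + b"1").digest())

def ggm_leaves(seed: bytes, depth: int):
    # expand a GGM tree; the 2^depth leaves form the pseudorandom vector
    layer = [seed]
    for _ in range(depth):
        layer = [child for s in layer for child in prg(s)]
    return layer

def punctured_nodes(root: bytes, j: int, depth: int):
    # the off-path sibling at every level of the path to leaf j: this is
    # what Bob transfers (log n values), though j is not hidden yet
    sibs, s = [], root
    for d in range(depth - 1, -1, -1):
        bit = (j >> d) & 1
        left, right = prg(s)
        sibs.append(right if bit == 0 else left)
        s = left if bit == 0 else right
    return sibs

def reconstruct(sibs, j: int, depth: int):
    # Alice expands each sibling into its subtree; leaf j stays unknown
    leaves = [None] * (1 << depth)
    for level, s in enumerate(sibs):
        d = depth - 1 - level          # height of this sibling's subtree
        prefix = (j >> d) ^ 1          # sibling's index at its layer
        for k, leaf in enumerate(ggm_leaves(s, d)):
            leaves[(prefix << d) | k] = leaf
    return leaves
```

For example, with depth 3 and j = 5, `reconstruct(punctured_nodes(root, 5, 3), 5, 3)` matches `ggm_leaves(root, 3)` everywhere except index 5, which stays `None`.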
At the second layer, Bob sends an OT of the two roots as before, but at the next layer, Bob will send an OT of, again, two values, as follows. The first OT value will be the sum of the left children, as denoted by the red arc. Note that Alice already knows s1, and therefore she knows all values in that subtree, including s10. If Alice pulls the first value from the OT, then she can remove s10 from the sum and therefore learn s00. The second OT value will be the sum of the right children. This sum trick allows Bob to send only two values, but still allows Alice to recover the root she needs. The same happens at every layer: the first OT value contains the sum of all left children, and the second one contains the sum of all right children. Thus the total communication is log n per vector, or n log n for the whole matrix. Let me summarize what our protocol for permute and share does so far. First, the parties generate the matrix which Alice only partially knows. For this, Bob generates n seeds of PRFs, expands n trees, and fills the matrix with the n squared leaves. He also sends n log n 1-out-of-2 OTs to help Alice recover all values except for the hidden ones. In parallel with that, Bob sends Alice the database X masked with the A's, and remember that each A is just the sum of the corresponding row. Then the parties set the final shares as follows. Bob just sets them to be the B's, where the B's are the sums of the columns. Alice computes the share translation deltas by adding up a certain row and a certain column, and conveniently the missing element does not participate in the sum, so Alice doesn't need it. Then she adds the deltas to X plus A and gets X plus pi inverse of B, and then she can permute them to get pi of X plus B. That's the protocol so far. It's pretty simple and it has low communication; however, the computation complexity is high, proportional to n squared. Here is where our second idea comes in. We are going to exploit a special structure of the Beneš permutation network. 
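Returning to the tree expansion for a moment, the per-layer sum trick can be sketched end to end. This toy again uses SHA-256 as the PRG and simply hands Alice the OT message she would have chosen; all helper names are hypothetical. At each layer she cancels the children of the nodes she already knows from the XOR of all same-side children, recovering the one sibling she is missing.

```python
import hashlib
from functools import reduce

def prg(seed):
    # length-doubling PRG from SHA-256 (toy stand-in)
    return (hashlib.sha256(seed + b"0").digest(),
            hashlib.sha256(seed + b"1").digest())

def xor(vals):
    # XOR a list of equal-length byte strings
    return reduce(lambda u, v: bytes(a ^ b for a, b in zip(u, v)), vals)

def ot_messages(layer):
    # Bob's two OT messages for one layer: XOR of all left children,
    # and XOR of all right children
    kids = [prg(s) for s in layer]
    return xor([l for l, _ in kids]), xor([r for _, r in kids])

def recover_sibling(known_nodes, msg, side):
    # Alice cancels the children of the nodes she knows from the OT
    # message, leaving the off-path child of the single unknown node
    return xor([msg] + [prg(s)[side] for s in known_nodes])

# Demo: depth-3 tree, puncture leaf j = 5
root = hashlib.sha256(b"root").digest()
depth, j = 3, 5
tree = [[root]]                      # Bob's full tree, layer by layer
for _ in range(depth):
    tree.append([c for s in tree[-1] for c in prg(s)])

bits = [(j >> (depth - 1 - l)) & 1 for l in range(depth)]  # path to leaf j
# Layer 1: a plain 1-out-of-2 OT of the two children of the root
known, path_idx = {1 - bits[0]: tree[1][1 - bits[0]]}, bits[0]
# Remaining layers: the sum trick, one OT per layer
for l in range(1, depth):
    msg = ot_messages(tree[l])[1 - bits[l]]
    missing = recover_sibling(list(known.values()), msg, 1 - bits[l])
    nxt = {}
    for idx, s in known.items():
        nxt[2 * idx], nxt[2 * idx + 1] = prg(s)
    nxt[2 * path_idx + 1 - bits[l]] = missing
    known, path_idx = nxt, 2 * path_idx + bits[l]

# Alice ends up with every leaf except leaf j
assert path_idx == j and j not in known
assert all(known[k] == tree[depth][k] for k in range(2 ** depth) if k != j)
```

The communication cost is visible in the structure: one OT of two values per layer, so log n OTs per vector, while Bob's computation is a full tree expansion.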
There is a certain way to cut this network into layers such that in each layer, the permutation of that layer turns out to be a collection of several disjoint smaller permutations, each acting on t elements, for some parameter t. So our idea is to run the permute and share protocol which I described before on each small permutation of size t separately. The final protocol's running time depends on t in a somewhat convoluted way, but the idea is that with extreme values of t, either too large or too small, the protocol is somewhat slow, and there is a sweet spot in between where its complexity is optimal. For instance, when t is equal to n, we essentially run permute and share on the whole database and therefore get n squared complexity, which is bad. On the other extreme, we have t equal to 2, which means that we use our whole protocol just to implement a single swap, which is not good either. For comparison, shuffling-network-based approaches implement swaps simply using a single OT, and we are instead using all those trees and matrices and whatnot; clearly this is slower. But as we start increasing t, there will be a spot where the t squared complexity is still relatively small, but where we already get savings from our succinct communication, which makes the protocol faster. In our experiments, the best t was something like 128 or 256, and depending on the other parameters, we got a speedup of 3x to 12x compared to previous approaches. That pretty much sums up our work. Let me stop here; that was the work about secret shared shuffle. Thanks for listening.