Hi, my name is Joon Young Seo, and I'm here to present our work titled Efficient Boolean Search over Encrypted Data with Reduced Leakage. This is joint work with Sarvar Patel, Giuseppe Persiano, and Kevin Yeo.

These days, millions of people use cloud services to store their data. However, uploading data in plaintext poses serious security and privacy risks, as multiple data breaches over the years have shown. On the other hand, naively encrypting the data before uploading disables search functionality over it, and this motivates the study of encrypted search.

There have been extensive studies on encrypted search, and many approaches have been proposed with different tradeoffs. For example, oblivious RAM and fully homomorphic encryption support expressive queries and leak no information about the underlying data or the executed algorithms. However, these primitives are very expensive and are not practical for large datasets. Structured encryption, by contrast, considers a more relaxed privacy requirement in the hope of achieving the small overhead necessary for practical applications. Our work is on structured encryption and, in more detail, on encrypted Boolean multi-maps.

Structured encryption, which was introduced by Chase and Kamara, is a general cryptographic primitive that considers a scenario where a data owner, commonly referred to as the client, wishes to store an encrypted data structure on a potentially untrusted server such as a cloud provider. Structured encryption schemes should ensure that clients are able to perform all necessary data structure operations, queries or updates, correctly over the encrypted data stored on the server. The privacy goal of a structured encryption scheme is to reveal as little information as possible about the contents of the outsourced data structure as well as the operations performed on it.
In particular, structured encryption schemes are defined by a leakage function that is an upper bound on the information that may be learned by the adversary, in this case the untrusted server.

A multi-map data structure maintains a set of label to value-tuple pairs, where each label comes from the label universe U and each value comes from the value universe V. The multi-map supports a query operation that receives a label from U and returns the value tuple associated with that label. In the example, MM of L1 returns the tuple V1, V2, V3.

For our purposes, we consider the extended Boolean multi-map, which enables more complex query operations beyond simply retrieving the value tuple associated with a label. A Boolean multi-map is associated with a supported class of Boolean formulae over labels. For our purposes, we only consider conjunctions and conjunctive normal forms. Loosely speaking, logical AND corresponds to set intersection while logical OR corresponds to set union. In the example, MM of L1 and L2 returns V1 because the intersection of V1, V2, V3 with V1, V4 is V1.

Now we define the notion of an encrypted Boolean multi-map, which is a structured encryption scheme for Boolean multi-maps. An encrypted Boolean multi-map consists of four algorithms: setup, token, search, and resolve. The setup algorithm is executed by the client; it takes as input the security parameter and the Boolean multi-map, and outputs an encrypted Boolean multi-map and a master secret key. The token generation algorithm is executed by the client; it takes as input the master secret key and a Boolean formula Φ, and outputs a token that is sent to the server. The search algorithm is executed by the server; it takes as input the encrypted Boolean multi-map and the token, and outputs an encrypted answer that is sent back to the client.
Finally, the resolve algorithm is executed by the client; it takes as input the master secret key and the answer from the server, and outputs MM of Φ, the plaintext elements in the multi-map.

For encrypted Boolean multi-maps, we use the same security notions typically used in structured encryption, based on leakage functions. The adversary's leakage is upper bounded by a pair of leakage functions, L setup and L query. L setup provides an upper bound on the knowledge gained by the adversarial server when given the encrypted Boolean multi-map. L query is an upper bound on the information gained by the server when it receives a token generated by the client using the token algorithm and applies that token to the encrypted multi-map in the search algorithm.

We use a simulation-based approach to formalize the security notions. In the setup phase of the real experiment, the adversary sends a multi-map to the challenger, and the challenger encrypts it and sends it back to the adversary. In the query phase, the adversary sends a polynomial number of Boolean formulae to the challenger, and the challenger sends back the corresponding tokens. We consider the adaptive setting, where the adversary can observe the previous tokens sent by the challenger before sending the next Boolean formula.

In the setup phase of the ideal experiment, the adversary sends the multi-map to the challenger, and the challenger sends the setup leakage of the multi-map to the simulator. The simulator generates the encrypted Boolean multi-map from the setup leakage and sends it back to the adversary. In the query phase, the adversary adaptively sends a Boolean formula to the challenger, and the challenger sends the query leakage to the simulator. The simulator generates the token from the query leakage and sends it back to the adversary.
We say that the encrypted Boolean multi-map is adaptively secure with respect to these setup and query leakages if the real experiment and the ideal experiment are indistinguishable to the adversary.

We now survey some of the existing constructions of encrypted Boolean multi-maps. Cash et al. present the OXT protocol, which is a non-interactive encrypted Boolean multi-map. OXT is able to handle all conjunctive queries and Boolean queries in searchable normal form, but for other queries it may end up with linear search time. Furthermore, the core cryptographic operations in OXT are expensive public-key operations, namely exponentiations in a Diffie-Hellman group, and hence may end up being slow in practice.

Pappas et al. present BlindSeer, which handles arbitrary Boolean queries with worst-case sublinear search time, unlike OXT. Also, the majority of the core cryptographic operations in BlindSeer are symmetric-key operations. However, the search algorithm still ends up being slower than OXT, as its secure computation techniques require multiple rounds of client-server interaction.

Kamara and Moataz present BIEX, which combines several good properties of both OXT and BlindSeer. In particular, BIEX is the first non-interactive encrypted Boolean multi-map that is able to handle arbitrary Boolean queries with worst-case sublinear search time and non-trivial leakage. Furthermore, the search algorithm of BIEX only uses symmetric-key primitives and ends up being faster than those of OXT and BlindSeer.

In our work, we present new encrypted Boolean multi-maps that are adaptively secure, non-interactive, have similar or better efficiency, and have reduced leakage compared to prior works. In particular, we obtain new constructions for handling conjunctions and CNF queries with reduced leakage and optimal communication complexity. Furthermore, our schemes only use symmetric-key primitives and end up being more practical than prior works.
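
To make the query semantics used in the rest of the talk concrete, here is a minimal plaintext sketch of a Boolean multi-map, with no encryption at all. The function names `query_conjunction` and `query_cnf` are hypothetical and chosen only for illustration; the data is the small example from the talk, where MM of L1 is V1, V2, V3 and MM of L2 is V1, V4.

```python
from typing import Dict, List, Set, Tuple

MultiMap = Dict[str, Tuple[str, ...]]


def query_conjunction(mm: MultiMap, labels: List[str]) -> Set[str]:
    """Logical AND over labels: intersect the value sets of all labels."""
    sets = [set(mm.get(label, ())) for label in labels]
    return set.intersection(*sets) if sets else set()


def query_cnf(mm: MultiMap, clauses: List[List[str]]) -> Set[str]:
    """CNF query: each clause is an OR of labels (set union),
    and the clauses are ANDed together (set intersection)."""
    clause_sets = [
        set().union(*(mm.get(label, ()) for label in clause))
        for clause in clauses
    ]
    return set.intersection(*clause_sets) if clause_sets else set()


# Example multi-map from the talk.
mm: MultiMap = {"l1": ("v1", "v2", "v3"), "l2": ("v1", "v4")}

print(query_conjunction(mm, ["l1", "l2"]))   # l1 AND l2
print(query_cnf(mm, [["l1", "l2"]]))         # a single clause: l1 OR l2
```

An encrypted Boolean multi-map has to return these same answers while the server only ever sees the encrypted structure and the tokens.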
To compare leakage, we introduce the notion of a base query set of leakage. Informally, a base query set of leakage is just a set of Boolean formulae whose volumes the adversary can combine to learn the volumes of more complex queries. We define the span of the base query set of leakage as the set of all queries for which an adversary can construct the volume from the volumes of the queries in the base set. In particular, if an encrypted Boolean multi-map has base query set of leakage B for some query Φ, it means that the adversary can learn the volumes of all Boolean formulae in the span of B. This notion is quite useful for comparing leakages: we can say that one scheme has better query leakage than another by comparing the spans of their base sets. For example, if the span for one scheme is a strict subset of that for another scheme, we can say that its leakage is definitely better. We will use the notion of a base query set of leakage throughout the presentation.

Before we dive into our constructions, we start by presenting the approach to handling conjunctive queries in prior works, such as KM17. Consider the conjunctive query L1 and L2 and so on, all the way up to LQ. The main idea of prior works is to decompose the query into Q minus 1 two-conjunctions: L1 and L2, L1 and L3, and so on. Each of the Q minus 1 two-conjunction queries is computed independently, such that the resulting response sets are all PRF evaluations under a key that depends solely on the first label, L1. Then the server returns the intersection of all Q minus 1 sets. It is easy to see that the response size is optimal this way.

However, there are several drawbacks to this approach. First, the scheme leaks the volumes of the Q minus 1 two-conjunctions. Since the response sets are PRF evaluations under the same key, the server can also learn the volumes of more complex queries. In general, the server can compute any Boolean function over the response sets.
That is, the base query set of leakage is L1 and L2, L1 and L3, and so on up to L1 and LQ. Second, in terms of computation cost, the server must perform computation on the order of the sum of the sizes of the response sets of the two-conjunctions. This seems quite wasteful, as the response set MM of L1 and L2 is already a superset of the final response.

To address these drawbacks while keeping the size of the server's response optimal, we present a new filtering algorithm that will be used by our constructions. The main idea is to retrieve only MM of L1 and L2 and then filter this set, instead of retrieving all of MM of L1 and L2, MM of L1 and L3, and so on, as in the prior works. To do this, we maintain an additional data structure X that allows the server to check whether a value V in MM of L1 and L2 belongs to MM of L1 and L3, MM of L1 and L4, and so on, without having to retrieve those sets.

At a high level, the data structure is just a set of double PRF evaluations of the multi-map values. More concretely, for each label pair and each value that appears with that label pair in the encrypted multi-map, the data structure stores a double tag of V. This double tag is computed by first applying a PRF with a key depending solely on the first label and then applying another PRF with a key depending on both labels. In the first row of the example, we see that the data structure stores double tags consisting of a PRF evaluation with a key depending on label L1, followed by another PRF evaluation with a key depending on the label pair L1 and L2.

Now we will show by example how we can use the filtering algorithm to compute the response to a conjunctive query. To compute MM of L1 and L2 and L3 and L4, we first retrieve MM of L1 and L2. At this point, the intermediate response contains three values: V1, V2, and V3.
We now check whether each of V1, V2, and V3 is in MM of L1 and L3 by applying a PRF, with a key depending on the label pair L1 and L3, to the PRF evaluations stored in the multi-map, and checking whether the resulting evaluation is in the data structure X. If it is, the value is in MM of L1 and L3 and we keep it; otherwise we throw it away. We continue with the same filtering step for MM of L1 and L4 and obtain the final response, which ultimately contains a single value, V1.

We now give a brief analysis of our ConjFilter scheme. It is not too hard to see that the base query set of leakage of ConjFilter consists of one two-conjunction, with the rest being three-conjunctions. Note that the scheme has smaller query leakage than prior works because it only leaks one two-conjunction, L1 and L2. Compared to BIEX, ConjFilter maintains an additional data structure X, but this reveals nothing more than the size of X, and hence the setup leakage stays identical to that of BIEX.

In terms of storage, the encrypted multi-map stores values for each pair of labels in the label universe. The size of the data structure X is asymptotically the same as that of the multi-map. Thus, the storage of ConjFilter is asymptotically the same as that of BIEX. The token size turns out to be O of Q, where Q is the total number of labels in the conjunction. For the server computation, the server first computes the response set for the query MM of L1 and L2. Afterwards, the server filters this set for each of the other labels, L3 all the way up to LQ. As a result, the server computation is O of Q times the size of the response set of L1 and L2. Note that in practice, ConjFilter outperforms prior works like BIEX, since the intermediate set becomes smaller after each round of filtering.

We now show how to support CNF queries using the filtering algorithm as a building block.
Before that, we start by reviewing the BIEX construction for CNF queries. Consider a CNF query of the form D1 and D2 and so on, all the way up to DM, where each DI is a disjunction of labels. In the first step, BIEX computes MM of D1 using IEX, a black-box construction for disjunctive queries. Note that IEX ends up leaking volumes for the singleton labels in D1. The problem here is that there is no known scheme supporting disjunctions that does not reveal singleton query volumes.

To avoid this leakage, our construction combines the first two clauses, D1 and D2, whose conjunction may be rewritten as a disjunction of two-conjunctions. Then we can apply an algorithm for computing disjunctions, for example the IEX scheme from KM17, over the two-conjunction results to obtain MM of D1 and D2. Note that while the volumes of all two-conjunction sets are revealed, no volumes of singleton queries are leaked with this formulation.

We can now apply the filtering algorithm again to incorporate the remaining clauses D3, D4, all the way up to DM. For example, if we let S2 be MM of D1 and D2, then the intersection of S2 with MM of D3 can be rewritten as a union of intersections of S2 with the responses to the singleton labels in the D3 clause. We claim that each intersection can be computed using the filtering algorithm, and that computing it this way avoids volume leakage for many queries. Some details are omitted in this presentation for simplicity, so please refer to the full paper. We can repeat this for the other clauses to compute the final response set.

We now give a brief analysis of our CNFFilter scheme. We claim here without proof that the span of the base query set of leakage of CNFFilter is a subset of the span of the base query set of leakage of BIEX. This means that our scheme leaks no more than BIEX. Since the setup construction is identical to that of the conjunction scheme, the setup leakage and storage stay identical to those of ConjFilter presented earlier.
In particular, this means that the storage stays asymptotically the same as that of BIEX. In our experiments, we show that our scheme incurs just 20% more storage than BIEX. We can upper bound the token size by O of Q squared, where Q is the total number of labels across the M disjunction clauses. In terms of server computation, we can upper bound it by O of Q squared times the size of the response MM of D1 and D2. Note that our scheme outperforms BIEX in practice because in many cases the response set of the conjunction of the two clauses is much smaller than the response set of a single clause.

We now present some of the results from our experiments. BIEX in KM17 is implemented in Java, while our schemes were implemented in C++, and in general C++ is much faster than Java. So we implemented BIEX in C++ for a fair comparison. The table shows microbenchmarks for the search time of CNFFilter and BIEX on randomly chosen queries of the form D1 and D2 and D3, where each DI is a four-label disjunction. The experiments use the Enron email dataset. The leftmost column and the topmost row denote the number of values associated with each label in the first and the second clause, respectively. The number of values associated with labels in D3 is fixed to 10,000, and all search times are measured in milliseconds. From the table, we can see that our CNFFilter scheme outperforms BIEX in all scenarios, in some cases by as much as 20x.

The line chart here shows the search token sizes of CNFFilter and BIEX for a three-clause CNF D1 and D2 and D3, where D1 and D2 each contain five labels and the x-axis indicates the number of labels in D3. We see that the token size of CNFFilter is almost half that of BIEX. So our CNFFilter scheme has a better token size in practice.
This table shows the storage and setup time of CNFFilter and BIEX. Since CNFFilter maintains an additional data structure X, we observe that the storage size and setup time are generally larger for CNFFilter. However, we see that the storage size is only 20 to 25% larger than that of BIEX. As for the setup time and the storage size, we believe that the extra overhead is reasonable given the leakage, communication, and server computation improvements that our CNFFilter...