Hello everyone, my name is Élie Bouscatié and I am a PhD candidate working with Guillem Castagnos at the University of Bordeaux and Olivier Sanders at Orange Labs. This is a presentation of our work on public key encryption with flexible pattern matching.

In classical public key encryption, the receiver generates a pair of keys that allows others to send him private messages. Today, most internet traffic is encrypted to ensure the privacy of users. But this encryption is incompatible with functionalities like intrusion detection systems, where a service provider searches the traffic for patterns that indicate malicious activity. In a searchable encryption scheme, the service provider can query the receiver with such a pattern and get a trapdoor that allows him to learn whether this pattern is present, and nothing more.

Searchable encryption is a name that is used for very different schemes. In the case of email routing, the search is performed on encrypted keywords that are attached to the messages. But in the case of deep packet inspection, the pattern may appear anywhere in the stream of traffic. Another difference is that in the first case, keywords are known in advance, whereas in deep packet inspection the list of patterns evolves over time, for instance to search for new viruses. Because of these differences, we call the kind of scheme we are designing a stream encryption scheme supporting pattern matching.

If we take a look at the distribution of pattern lengths in the pattern list of Snort, which is a well-known open-source IDS, we find that there are many patterns of different lengths, but they are all short. So the task of an IDS is to search for many patterns of varying, short lengths in long streams. There exist generic solutions to process encrypted data, such as fully homomorphic encryption and multi-party computation, but their versatility comes with heavy costs in computation or in communication.
So we want to use specific schemes that are more efficient. In 2015, BlindBox showed that it was possible to search for a string in a stream using pseudorandom functions, but with a very strong limitation: encryption is performed at the string level, using a fixed-length sliding window. Here, for instance, it is all the substrings of four consecutive letters that will be encrypted. This only allows the receiver to give trapdoors for keywords of length 4, like the keyword "host", but not the keyword "hostile". As such, the expressivity is very limited. One solution would be to encrypt the stream once for every possible pattern length, but we have seen that there are many different pattern lengths, so this would be inefficient. Finally, if we decide to split longer keywords into smaller strings, the long list of Snort, with more than a thousand patterns, shows us that the service provider can now learn a lot of extra information.

In order to encrypt with a granularity of one symbol instead of strings and still be able to perform pattern matching, the most recent solutions rely on bilinear pairings. Two papers have been published, one at Asiacrypt 2018 and one at Asiacrypt 2020. But their security is only proven under the same very strong interactive assumption, and the size of their public key depends on the size of the alphabet, which implies a factor of 2 to the 8, as the patterns are byte strings. To see where these limitations come from and how we address them in our paper, I will first show you some common ground to all these constructions. Then we will come back to a more precise description of these existing constructions, and finally we will present our contributions.

Bilinear groups are a setting where we have three groups G1, G2 and Gt of prime order p, and a map e that sends a pair from G1 × G2 to Gt. The bilinearity property of e written here shows that this map can be used to perform a multiplication in the exponents.
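Going back for a moment to the BlindBox-style sliding window mentioned above, its mechanism and its length limitation can be illustrated with a short sketch. This is a toy model, not BlindBox's actual protocol: the pseudorandom function is stood in for by HMAC-SHA256, and all helper names (`encrypt_stream`, `trapdoor`, `matches`) are hypothetical.

```python
import hmac
import hashlib

WINDOW = 4  # fixed sliding-window length: the source of the limitation

def prf(key: bytes, data: str) -> bytes:
    # The pseudorandom function is modeled with HMAC-SHA256 in this sketch.
    return hmac.new(key, data.encode(), hashlib.sha256).digest()

def encrypt_stream(key: bytes, stream: str) -> list:
    # Encrypt every substring of WINDOW consecutive symbols of the stream.
    return [prf(key, stream[i:i + WINDOW])
            for i in range(len(stream) - WINDOW + 1)]

def trapdoor(key: bytes, keyword: str) -> bytes:
    # Only keywords of exactly WINDOW symbols can be queried.
    if len(keyword) != WINDOW:
        raise ValueError("keyword must have the sliding-window length")
    return prf(key, keyword)

def matches(tokens, td) -> bool:
    return td in tokens

key = b"shared-secret"
tokens = encrypt_stream(key, "a hostile request")
print(matches(tokens, trapdoor(key, "host")))  # prints True
```

A trapdoor for "hostile" is simply impossible here, since only length-4 keywords can be turned into tokens; this is the expressivity problem the pairing-based schemes below set out to solve.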
In the type 3 setting, we assume that there is no efficiently computable isomorphism between G1 and G2. Like many modern schemes, we use this type 3 setting, which is very efficient.

To see how we can build a trapdoor mechanism from this setting, let's assume that there are public generators g and g tilde in G1 and G2, and that we know g to the a and g to the x without knowing a and x. Then the decisional Diffie-Hellman assumption tells us that it is hard to distinguish g to the ax from a random element of G1. Now, if we also know g tilde to the x, we have a test, because the pairing of g to the a and g tilde to the x gives us an ax exponent. Because of the type 3 setting, it is not possible to compute this G2 element from the G1 element, and it must be given by someone who knows x. Notice that we have room to add some randomness s to this trapdoor by giving g tilde to the s and g tilde to the sx, because then we can pair the G1 and G2 elements to get a test, and this will give trapdoor unforgeability properties.

From these features, we build a first simple encryption scheme with pattern matching. There is an upper bound n on the size of the messages that can be encrypted at once, so it is not yet adapted to streams. We first choose a secret encoding alpha that depends both on the symbol to be encrypted and on the position where it occurs in the message. The public key is computed by raising g to these values. As an example, we take n equal to 5 with the message "steam". First, the sender chooses a random integer a and gives g to the a as the first ciphertext element. Then he uses the public key element for the letter s when it is in first position and masks it with a. He does the same with the other elements. Now we show a trapdoor for the pattern t. The trapdoor contains one position trapdoor for each possible position of the pattern in the message.
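The pairing-based test described a moment ago can be checked in a toy model. This is only an illustration of the algebra, under a strong simplifying assumption: each group element g^u (or g tilde^v) is represented directly by its exponent mod p, so the pairing e(g^u, g tilde^v) = e(g, g tilde)^(uv) becomes a multiplication of exponents. Real instantiations use elliptic-curve groups; the names and the prime below are illustrative.

```python
import secrets

# Toy model of a type-3 pairing: elements are represented by their
# discrete logarithms, so pairing two elements multiplies the exponents.
p = 2**127 - 1  # a prime, standing in for the group order

def pair(u: int, v: int) -> int:
    return (u * v) % p

a = secrets.randbelow(p)   # known to the sender only as g^a
x = secrets.randbelow(p)   # the trapdoor secret

A = a              # represents g^a
C = (a * x) % p    # represents g^(ax): hard to recognize under DDH in G1

# Trapdoor with randomness s: (g~^s, g~^(s*x)) in G2.
s = secrets.randbelow(p)
T1, T2 = s, (s * x) % p

# Test: e(g^(ax), g~^s) == e(g^a, g~^(s*x)), since both exponents are a*s*x.
assert pair(C, T1) == pair(A, T2)

# A random G1 element fails the test (with overwhelming probability).
R = secrets.randbelow(p)
assert R == C or pair(R, T1) != pair(A, T2)
```

The randomness s is exactly what gives the trapdoor its unforgeability room: the tester only ever sees the pair (g~^s, g~^(sx)), never g~^x itself.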
To compute the element t prime, the receiver sums the secret key elements corresponding to the pattern at this position, raises g tilde to this power, and applies some randomness s. We see that t prime is built as a mirror element to the product of the corresponding ciphertext elements. If we mix the randomness elements, we can test the presence of the pattern at this position. As we encrypt with the granularity of one symbol, we can test patterns of any length up to n, even with wildcards inside the pattern. One test at one position uses only two pairing evaluations, whatever the length of the pattern. But the message cannot be arbitrarily long. And if we try to handle streams by taking a very large value of n, then we get a very long public key and many possible offsets for the trapdoors.

Now we look at the existing schemes with these ideas in mind. The Asiacrypt 2018 paper shows us that it is interesting not to choose our secret encoding completely at random. Instead, this paper uses one encoding for the symbols and uses powers of some secret value z for the position randomness. So this encoding consists only of monomials in z. Let's see how this changes the simple scheme. If we take a look at the position trapdoors, we see that the exponents are polynomials that only differ by a factor z. So if we could multiply the exponents by powers of z, we would only need the first position trapdoor. To do this, we will use pairings. So we add powers-of-z elements to the public key. We replace the randomness element by additional ciphertext elements. And now we only keep the first position trapdoor. On the right-hand side of the equation, pairing t prime with the right power of z shifts the trapdoor to the correct expression. But there is a forgeability issue, because the shifted elements share the same randomness and so they can be combined. This forces us to use one randomness element per term in the t prime exponent. And this trapdoor now has more randomness elements t_i.
The worst part of it is that in order to apply this randomness to the corresponding ciphertext elements, the product symbol has to go out of the pairing, making this test use a number of pairings that is linear in the length of the pattern, instead of just two. Thanks to this shiftable trapdoor mechanism, the size of trapdoors doesn't depend on the length n anymore. But we keep the same other limitations, and the cost of testing is higher.

The Asiacrypt 2020 paper takes advantage of the pattern distribution to change the encryption strategy. The stream is long, but the patterns are short. We have seen that if n is large, so is the size of the public key, and anyway this upper bound n is incompatible with streams. Instead, it is possible to split the stream into fragments about the size of a pattern, and to encrypt each of them. To encrypt each fragment, we can picture that we use our simple scheme. A naive way to split our stream is to make consecutive fragments and encrypt them. If a pattern is contained in one fragment, then we can test it. But if the pattern straddles two fragments, we cannot. A good solution is to encrypt the stream twice, with some offset. Now, if the pattern is here, we can test it with the encryption above, and if it is here or here, we can test it with the fragment below. We see that the fragments can be about the size of the patterns, and that we can encrypt arbitrarily long streams with this strategy. Also, a position trapdoor can be used to perform a test at its relative position in any fragment, so we don't need shiftable trapdoors anymore. This way, the Asiacrypt 2020 paper manages to remove at once all the limitations on the encryption of streams. Here, L is the maximum pattern length. But the public key depends on the size of the alphabet, which is 2 to the 8 in the IDS application. And the security still relies on the same strong assumption. Now, let's present our contributions.
The encryption used for fragments in the original paper was actually based on the shiftable trapdoor scheme. But as the shifting mechanism is now useless and costs many pairings, they break it by using even more sophisticated expressions. This sophistication becomes a real problem when it comes to developing additional properties, like better security. In the intuition of our paper, we show that the simple scheme that we have seen can advantageously replace the scheme they use to encrypt fragments. We get the same properties, but with a simpler scheme that uses half the number of ciphertext elements.

We can instantiate the secret encoding alpha to have a public key size that doesn't depend on the size of the alphabet. Here, we assume that the alphabet is some publicly known subset of the integers modulo p. This is always possible, as p is very large. And we only use secret randomness x_i and y_i to encode the position. Alpha now takes the form of one random linear function of the symbol at each position in the fragment. Note that the symbol is used directly, instead of through some secret encoding. We then show that this choice of alpha gives a secure scheme under the same interactive assumption as the existing solutions. Again, we have the same scheme structure, but where the sender computes the ciphertext elements by using the symbols directly. Everything else works as in the simple scheme.

Now, before showing how our second scheme achieves a better security assumption, let's see an intuition of the security definition for all these schemes. It is the usual selective IND-CPA security, but where the adversary has extra power: he can query trapdoors with minimal restriction. For example, if the two committed challenge messages are "killed by a blushing crow" and "killed by a crushing blow", then the pattern "blush" is an invalid query, because it appears in only one of the two messages, and therefore a trapdoor could be used to distinguish which one is encrypted.
The pattern "led by" is a valid query, because it matches both messages at the same position. This implies that the ciphertext cannot be given at random for positions where the messages are identical, and this is the main reason why adaptivity has not yet been achieved, even in our second scheme. Finally, the pattern "blushed" is a valid query, because it doesn't appear in either of the messages. In the security proof, the simulator has to give some public key element for the letter b, and this simulator must be able to produce a trapdoor for the pattern "blushed" that contains the mirror element of the public key one. We see that a classical Diffie-Hellman assumption doesn't give elements in both groups. Therefore, in our second scheme, we use EXDH, which is a natural extension of the Diffie-Hellman assumption. This extension is static and simple, and it is used in existing schemes. To use all the elements of this assumption, we need more elements in our scheme. So we add some extra elements z_i to the first scheme: we add them to the public key, and they are used to make additional ciphertext elements. Now the trapdoor can be randomized with exactly two values, and the expression of t prime will have enough flexibility, yet simplicity, to show the security of this scheme based on EXDH.

This table is a summary of the improvements that we have achieved. Compared to the existing schemes, our first scheme uses half the number of ciphertext elements, and its public key is smaller by a factor of 2 to the 4; it is the best solution if one favors efficiency. The security achieved for the first time by our second scheme comes at a minor price in complexity that keeps it very efficient compared with the existing schemes. Thank you for your attention, and do not hesitate to send me an email if you have any questions.