Hello everyone, my name is Enis Kekriya. In this video I am going to talk about our latest results on pattern matching on encrypted data. This is joint work with Nora Kupans and Frédéric Kupans from Polytechnique Montreal.
End-to-end encryption is becoming increasingly widespread on computer and smartphone devices and applications. It ensures the privacy of the involved parties as well as the confidentiality of their [data]. However, the standard encryption techniques used to implement end-to-end encryption, such as TLS, do not allow any processing of the encrypted data, which impacts several use cases. For example, end-to-end encryption makes it impossible to enforce parental control rules or to perform intrusion detection. The lack of visibility into the content of the exchanged information can also be exploited by adversaries to bypass traditional security tools. So our challenge here is how to perform pattern detection on end-to-end encrypted traffic.
So let's consider a system composed of four entities: a sender, a receiver, a security pattern provider (PP) and a pattern matching service provider (SP). We can think of the receiver as a company employee and the sender as an Internet server; the employee is downloading data from the server using end-to-end encryption. The service provider is a third-party entity mandated by the company to perform intrusion detection on incoming traffic, and the pattern provider represents the security editors that provide attack signatures. In this use case, SP and PP are considered to be honest-but-curious entities, while the sender is considered to be an untrusted entity, as it may intentionally include malicious content inside the data.
So in this work, we are interested in a solution that is, first, security-aware, in the sense that it should allow pattern matching on encrypted data, and second, market-compliant, meaning that, except for the pattern provider, detection patterns should be indistinguishable to all other involved parties. In addition, the patterns provided to the pattern matching service provider should be useless for analyzing other third parties' encrypted data; otherwise the service provider could create value by selling them to other security service providers. The solution should also be privacy-preserving, meaning that, except for the sender and the receiver, encrypted data should be indistinguishable to all other parties. Finally, the solution should be more practical than existing ones.
Several techniques can be used to perform pattern matching on encrypted data. Generic solutions such as fully homomorphic encryption and functional encryption can be used to perform pattern matching. Still, these techniques are impractical because of the high cost of the computation they require. Multi-party computation can also be used to perform interactive computation on encrypted data. However, performing pattern matching using multi-party computation requires a high number of interactions that is, in the worst case, linear in the size of the data, which results in a very high communication cost.
BlindBox and BlindIDS are two solutions that have been proposed for signature-based intrusion detection on encrypted data. The idea consists of splitting the data to be encrypted into fragments of the same size. Then they use a construction called decryptable searchable encryption to check the presence of attack signatures in these fragments. Unfortunately, BlindBox and BlindIDS are useful only if all patterns have the same size. This is, however, not the case in practice. For example, the detection patterns used by the Snort tool have around 900 different sizes.
In addition, these solutions may cause false negatives: even if the patterns are all of the same size, they cannot be detected if they straddle two fragments.
Predicate encryption can also be adapted to allow pattern matching on encrypted data. At a very high level, the idea consists of considering each symbol in a detection pattern as an attribute of a predicate. Then, to allow finding the patterns anywhere in the encrypted data, a predicate will be assigned to each pattern and to each possible position in the data to be analyzed. So we will end up requiring a secret key of size linear in the size of the traffic to be analyzed.
Searchable encryption with shiftable trapdoors (SEST) has turned out to be the most interesting solution for pattern matching on encrypted data. It is based on an asymmetric construction that supports searching for arbitrary patterns at any offset of the data while ensuring traffic indistinguishability with respect to the pattern matching service provider. Still, this solution cannot fully protect the detection patterns. In addition, it requires a public key of size linear in the size of the traffic to be analyzed. So concretely, if we want to exchange one gigabyte of data, we will need a public key of eight terabytes.
Compared to the previous solutions, our scheme is the first that allows correct arbitrary pattern matching and ensures both data and pattern privacy without requiring a very large number of trapdoors, which is the case of predicate encryption, or very large keys, which is the case of SEST.
So let us move forward and see the security model we consider. Data indistinguishability requires that it is not feasible for the service provider or the pattern provider to learn any information about the traffic other than the presence or the absence of specific patterns. We model this property as a game between an adversary and a challenger. So the adversary starts by getting the encryption and trapdoor generation keys.
Then he is allowed to adaptively perform a polynomial number of trapdoor generation queries for patterns of his choice. Once he decides that this query phase is over, the adversary chooses two traffic streams that do not contain the previously queried patterns and sends them to the challenger, who chooses one of them, encrypts it and sends it back to the adversary. The adversary wins the game if he manages to figure out which traffic has been encrypted by the challenger.
In addition, we consider the pattern indistinguishability property, meaning that it is not feasible for the SP or the receiver to learn any information about the detection patterns. We consider the case of public key encryption, in which the adversary can encrypt any traffic of his choice, which allows him to mount a brute-force attack on the detection patterns. Unfortunately, in this case, no pattern matching solution, whether over plaintext or over public key encryption ciphertext, can resist such an attack. Therefore, this brute-force attack should not be considered in our security model. One solution to dismiss this type of attack is to consider patterns having high min-entropy, with an adversary that is usually modeled using two non-colluding entities, AF and AG. As usual, the entity AG of the adversary starts by getting the encryption and the trapdoor generation keys and adaptively performs a polynomial number of trapdoor generation queries for patterns of his choice. Then the same entity chooses two patterns that were not used in the previous trapdoor generation queries and sends them to the challenger, who chooses one of them, issues a trapdoor for the chosen pattern and sends the trapdoor to the entity AF of the adversary. The adversary wins the game if the entity AF manages to figure out the pattern that has been chosen by the challenger.
All right, let's now see our construction in more detail.
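As a compact recap of the two games just described, the winning conditions can be written in the usual indistinguishability style. This is only a sketch in standard notation: the symbols $b$, $b'$, $\lambda$, $\mathsf{Adv}$ and $\mathsf{negl}$ are assumptions introduced here and are not taken from the talk.

```latex
% b  : hidden bit chosen by the challenger (which of the two candidates is used)
% b' : the adversary's guess
% \lambda : security parameter (assumed notation)

% Data (traffic) indistinguishability: the challenger encrypts traffic T_b,
% where T_0 and T_1 contain none of the previously queried patterns.
\[
  \mathsf{Adv}^{\mathrm{data}}_{\mathcal{A}}(\lambda)
    = \left| \Pr[\, b' = b \,] - \tfrac{1}{2} \right|
    \le \mathsf{negl}(\lambda)
\]

% Pattern indistinguishability: A_G picks two high min-entropy patterns
% w_0, w_1 that were never queried; A_F receives the trapdoor for w_b
% and outputs b'. A_F and A_G do not collude.
\[
  \mathsf{Adv}^{\mathrm{pattern}}_{(\mathcal{A}_F, \mathcal{A}_G)}(\lambda)
    = \left| \Pr[\, b' = b \,] - \tfrac{1}{2} \right|
    \le \mathsf{negl}(\lambda)
\]
```

In both cases, "the adversary wins" means guessing $b$ noticeably better than a coin flip.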
The intuition behind our solution relies on the fact that the sizes of the attack patterns are very often much smaller than the size of the exchanged data. For example, the largest detection pattern used by the Snort tool contains only 52 bytes. So the idea is to fragment the data into a set of small segments of the same size and encrypt each of those segments separately. Now, instead of requiring keys and trapdoors of size linear in the size of the data, which is the case for most existing solutions, this fragmentation technique enables us to construct a scheme that requires keys and trapdoors of size only linear in the size of a data fragment. And to allow detecting the patterns that may straddle two fragments, for each two neighboring fragments we build a third one, which is composed of the last elements of the left-side fragment and the first elements of the right-side one, with L representing the largest pattern size that can be searched on the data. Concretely, if you want to be able to search all detection patterns that are used by the Snort tool, L should be greater than or equal to 52 bytes.
Our construction is based on asymmetric bilinear groups. Roughly speaking, we consider three cyclic groups G1, G2 and GT of a large prime order p and an efficiently computable mapping function e that maps two elements of G1 and G2 to an element of GT in such a way that the exponents multiply. The security of our construction holds as long as there is no efficiently computable isomorphism between G1 and G2. These groups can easily be instantiated using elliptic curves.
Let us now dive a little bit into the details of our construction. So let's start with key generation. It is performed by the receiver, who randomly chooses a number of large scalars. Some of them will be used to encode the different symbols used to represent the data.
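The fragmentation idea described above can be illustrated on plaintext, without any cryptography. The following is a minimal sketch, not the construction itself: the function names, the fragment length `frag_len`, and the choice of taking exactly `max_pat - 1` symbols from each side of a boundary are assumptions made for illustration (the talk only says "the last elements" and "the first elements").

```python
from typing import List, Set, Tuple

def fragment(data: bytes, frag_len: int, max_pat: int) -> List[Tuple[int, bytes]]:
    """Return (start_offset, fragment) pairs: the main fragments first,
    then one straddling fragment per boundary between two neighbours."""
    if frag_len < max_pat:
        raise ValueError("frag_len must be at least max_pat")
    out = []
    # Main fragments: consecutive, non-overlapping, same size (last may be shorter).
    for start in range(0, len(data), frag_len):
        out.append((start, data[start:start + frag_len]))
    # Straddling fragments (assumption): last max_pat-1 symbols of the left
    # fragment followed by the first max_pat-1 symbols of the right one.
    for boundary in range(frag_len, len(data), frag_len):
        lo = boundary - (max_pat - 1)
        hi = min(len(data), boundary + (max_pat - 1))
        out.append((lo, data[lo:hi]))
    return out

def find(pattern: bytes, fragments: List[Tuple[int, bytes]]) -> Set[int]:
    """Absolute offsets at which pattern occurs inside at least one fragment."""
    hits = set()
    for start, frag in fragments:
        pos = frag.find(pattern)
        while pos != -1:
            hits.add(start + pos)
            pos = frag.find(pattern, pos + 1)
    return hits

data = b"aaaaaaATTACKaaaa"           # the pattern crosses the boundary at offset 8
frags = fragment(data, frag_len=8, max_pat=6)
print(sorted(find(b"ATTACK", frags)))      # [6]
print(sorted(find(b"ATTACK", frags[:2])))  # [] when only the main fragments are searched
```

With only the two main fragments the pattern is missed, which is the false-negative problem of fixed-size splitting; the extra straddling fragment recovers it, and every search still runs over a piece whose size depends on the fragment length and not on the total data size.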
If you consider the data as a bit string, we will encode each of the bits 0 and 1 using two random scalars, alpha and alpha prime. These random scalars represent the secret key of our construction. Using this secret key, the receiver generates a public key composed of a set of couples of elements of the cyclic group G1. The left-side elements of these couples can be seen as a public basis, where i represents the different offsets in the data fragment, while the right-side elements represent the projection of the different symbols of the alphabet used to represent the data on the basis. Finally, the receiver computes the trapdoor generation key KT and sends it to the pattern provider. The elements of KT are actually randomized projections of the alphabet symbols on the basis. The randomization here is used to prevent the pattern provider from using the elements of the key KT to learn information about the encrypted traffic.
To encrypt the traffic, the sender starts by generating, for each fragment, a large random scalar that will be used to randomize the encryption of that data fragment, which of course prevents frequency attacks on the encrypted data. Then each symbol of the data is encrypted separately. For that, the sender encodes each symbol into two elements of the group G1. These elements are retrieved from the public key, where i represents the position of the symbol inside the fragment and sigma represents the symbol to be encrypted. Then the two retrieved elements are randomized using the scalar a1. And we do the same for encrypting the other bits. Bits that belong to two data fragments are encrypted into four elements of G1, each couple of elements representing the encryption of the bit with respect to its position in the corresponding data fragment.
So the elements colored in blue represent the encryption of the bit with respect to its position in the fragment F1, while the elements in red represent the encryption of the bit with respect to its position in the fragment F1 overlined.
So let's now move on to see how the trapdoors are generated by the pattern provider. Let us suppose that we want to generate a trapdoor for the binary pattern 010, and that the following scalars represent the different offsets in a data fragment. Using the key KT, for each possible offset, the pattern provider will compute two elements of the group G2 as follows. He will get three elements from KT such that sigma is the value of the bit of the pattern, i is the offset of the symbol in the data fragment, and j is the offset of the pattern in the data fragment. The three retrieved elements are then aggregated and randomized to prevent trapdoor forgeries. The computed value V0 and the random value var theta zero are then added to the trapdoor as the elements allowing to search for the pattern 010 at the first position of a data fragment. We repeat the same steps to compute and add to the trapdoor the elements of G2 that will be used to check the presence of the pattern at every possible position of a data fragment. At the end, the trapdoor will contain a number of elements of G2 equal to two times the number of symbols in a data fragment.
All right, now we have all the ingredients to understand how the pattern matching is performed by the service provider. Let us suppose that we have the following binary traffic and the service provider wants to look for the pattern 010. For that, he will check the presence of the pattern at all offsets, starting from offset zero. So the service provider will get the encryption of the first three symbols of the traffic together with the elements of the pattern trapdoor, V0 and var theta zero, that allow him to check the presence of the pattern at offset zero.
Then he computes the pairing of the product of C1, C2 and C3 with g tilde to the power of var theta zero, and checks whether the result is equal to the pairing of C prime zero and V0. If the equation does not hold, which is the case in our example, the service provider deduces that the pattern is not present at offset zero with overwhelming probability. Otherwise, if the equation holds, which for example is the case for offset one, then the service provider will be sure that the pattern exists at the checked offset. I want to emphasize here that, regardless of the result of this equation, the values of the two sides of the equation will be random-looking elements of GT, which will be useless to the service provider for learning any additional information about the plaintext or the searched pattern.
Now let's jump to an offset at which the pattern straddles two blue fragments. The idea here is to check the presence of the pattern in the red fragments. So we use the encrypted elements corresponding to the offset of the pattern inside the red fragments, and we check again the equality of the two pairings.
All right, let's now move on to see the obtained results. From the security point of view, our construction provides correct detection of arbitrary patterns. Correctness here means that our construction returns no false negatives and that the probability that it returns a false positive is negligible. Our construction ensures traffic indistinguishability with respect to both the pattern provider and the service provider. This means that these two entities will learn nothing about the encrypted data except the presence or the absence of the detection patterns. In addition, our construction ensures pattern indistinguishability with respect to both the service provider and the receiver, and this for detection patterns of high min-entropy.
As for evaluation results, our construction requires a secret key composed of two times the size of the considered alphabet plus two elements of G1.
So concretely, if you work with byte strings, the size of KS will be around 17 kilobytes when using the BN elliptic curve. The public key is composed of four times the size of the largest pattern that should be matched. Concretely, if you want to match the detection patterns used by the Snort tool, we will require a public key of only seven kilobytes. In addition, our construction requires a trapdoor generation key composed of a number of elements of the group G2 equal to two times the square of the largest pattern size multiplied by the size of the considered alphabet. Concretely, issuing trapdoors for Snort detection patterns requires a key KT of size around one megabyte. Moreover, our construction requires trapdoors that are each composed of a number of elements of the group G2 equal to two times the size of the largest pattern. Concretely, the total size of the trapdoors allowing to search for all Snort detection patterns is around 19 megabytes. Our construction requires a ciphertext that is 128 times the size of the plaintext. This is due mainly to the fact that we are encrypting each symbol of the plaintext separately.
As for encryption complexity, the encryption of each symbol requires four exponentiations in the group G1. Concretely, our construction can achieve an encryption throughput of 25 kilobytes per second. For pattern detection, we are only able to achieve a throughput of 11 kilobytes per second. So despite this quite low detection throughput, our construction is up to four orders of magnitude faster than the most efficient state-of-the-art solution.
Okay, this brings me to the conclusion of this talk. So we proposed the first solution for arbitrary pattern matching on encrypted data that provides correct detection and privacy protection for both the encrypted data and the analysis patterns without requiring cryptographic keys of impractical sizes. Our solution drastically improves the detection performance compared to similar existing approaches.
However, the achieved performance is still not suitable for real-time usage. As future work, we are working to enhance our construction to achieve better efficiency and to support regular expression search on encrypted data. Thank you for your attention, and I hope you enjoyed this talk.
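Two of the size figures quoted in the evaluation part of the talk can be cross-checked with simple arithmetic. The sketch below rests on assumptions that are not stated in the talk: a G1 element is taken to occupy 32 bytes (a compressed point on a 256-bit BN curve), symbols are single bytes, and every symbol is assumed to lie in two fragments and therefore to be encrypted into four G1 elements. The function names are hypothetical.

```python
G1_BYTES = 32       # assumption: compressed G1 point on a 256-bit BN curve
MAX_PATTERN = 52    # largest Snort pattern size quoted in the talk, in bytes

def public_key_bytes(max_pattern: int, g1_bytes: int = G1_BYTES) -> int:
    # Talk: the public key holds 4 * (largest pattern size) group elements.
    return 4 * max_pattern * g1_bytes

def ciphertext_expansion(g1_bytes: int = G1_BYTES,
                         elems_per_symbol: int = 4,
                         symbol_bytes: int = 1) -> int:
    # Assumption: each one-byte symbol belongs to two fragments, so it is
    # encrypted into 2 + 2 = 4 elements of G1.
    return elems_per_symbol * g1_bytes // symbol_bytes

print(public_key_bytes(MAX_PATTERN))   # 6656 bytes, in line with "seven kilobytes"
print(ciphertext_expansion())          # 128, in line with the quoted 128x expansion
```

Under these assumptions the numbers agree with the quoted public key size and ciphertext expansion; other figures from the talk depend on the G2 element size and the exact key layout, which are not given, so they are left out of this check.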