For the second talk, I'm going to talk about text search protocols with simulation-based security. This is joint work with Rosario Gennaro and Jeff Sorensen.
The classic text search problem, or pattern matching as we call it, is this: we are given a text, which you can think of as a large database, and a small pattern, or search keyword. The goal is to find all the text locations where the pattern appears in the text. In the secure version of this problem, as you can imagine, one party holds the database and the other party tries to search it, and that party is not supposed to learn anything but the matching text locations.
This problem has many applications, in particular when we want to compare sensitive data. For instance, you may want to compare DNA strings, which are highly sensitive because they give medical information that you don't want to reveal. You can even think of searching Google: Google uses your keywords to determine the ads it places on the web page you see, among other things. So even in this case there is some notion of privacy that you would hope to achieve.
There aren't many works on this problem. One is a paper by Troncoso-Pastoriza, Katzenbeisser and Celik, who showed how to solve it in the semi-honest setting using oblivious automata evaluation, which I'm going to define a bit later. In another paper, by myself and Yehuda Lindell, we showed how to solve the problem with a limited notion of security, one-sided simulation, which I'm going to define soon. There we used the same idea from the previous talk on oblivious PRF, but it's definitely not clear how to extend that solution to the malicious setting.
I already talked about our setting and model in the previous talk. This is the same setting.
We are talking about a malicious adversary, whose behavior is arbitrary and who is computationally bounded, and our proof shows how to achieve full simulation. Just a quick overview: I already said what a semi-honest adversary is. It follows the protocol specification, but it examines the messages in order to learn additional information that it should not learn in an honest execution.
One thing regarding one-sided simulation. This means that we don't know how to construct simulators, in the sense of the ideal/real paradigm that I talked about before, for both corruption cases, only for one. For the other case, which in this context is the party that does not receive output, we only know how to guarantee privacy, meaning that the corrupted party does not learn anything about the other party's input; we don't know how to guarantee correctness or other properties. So this is a much more restricted notion of security.
Just to get you into our protocol, I'm going to present another, much simpler protocol that does not work for the malicious case. So what can we do here? Bob can build a matrix of size 2 by m as follows. Recall that m is the size of the search keyword, the pattern. He encrypts 0s and 1s in the following sense: in the row for bit 0, if the pattern bit he looks at equals 0, he sends an encryption of 0, and otherwise he sends an encryption of 1. The row for bit 1 is built the same way. An example makes this much simpler. Think of the first row as corresponding to 0 and the second row as corresponding to 1. In the first row you would send 0, 0, 1, which is exactly the pattern. So you can think of it as sending the pattern and the complement of the pattern.
So how does this give us a secure solution? This is a scary formula; an example is much simpler.
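A minimal plaintext sketch of this table idea may help before the worked example. It assumes a binary pattern given as a list of bits and leaves out the encryption entirely, so it only shows the arithmetic, not the security. The names `build_table`, `alice_scores`, `bob_matches` and the prime `MODULUS` are illustrative and do not come from the talk.

```python
import random

MODULUS = 2**31 - 1  # an arbitrary prime, assumed here only for the masking step


def build_table(pattern):
    """Bob's 2 x m table: entry [b][j] is 0 if pattern[j] == b, else 1.

    In the protocol every entry would be encrypted; here it stays in the clear.
    """
    return [[0 if p == b else 1 for p in pattern] for b in (0, 1)]


def alice_scores(table, text):
    """For each text location, add the entries selected by the text bits
    and multiply the sum by a random nonzero mask. The result is 0 exactly
    when the sum is 0, that is, when every bit agrees with the pattern."""
    m = len(table[0])
    scores = []
    for i in range(len(text) - m + 1):
        total = sum(table[text[i + j]][j] for j in range(m))
        mask = random.randrange(1, MODULUS)
        scores.append((total * mask) % MODULUS)
    return scores


def bob_matches(scores):
    """Bob reads off the locations whose masked score is 0."""
    return [i for i, s in enumerate(scores) if s == 0]


table = build_table([0, 0, 1])
print(table)
print(bob_matches(alice_scores(table, [0, 0, 1, 0, 0, 1, 1])))
```

The mask hides how many positions disagree at a non-matching location; only the zero or nonzero outcome survives.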
This is the text. Alice observes the first three bits, goes to the table that she received from Bob, and takes the values that correspond to the bits in the text. The first two text bits are 0, 0, so she takes the first two entries from the first row. The third bit is 1, so she takes the third entry from the second row. Now she adds them together, multiplies by a random mask, and the result stays equal to 0. When Bob decrypts it, he learns that there is a match at the first location. When we consider the second text location, we do not get a match, because there it adds up to 1.
So this is a very simple protocol, just to illustrate. It works for the semi-honest setting and even for one-sided simulation, but we definitely don't know how to extend it to the malicious setting, because we don't know how to prove that Alice indeed carried out her computations according to some well-defined text. It is even simpler than the protocol from my prior work with Yehuda Lindell, which achieves the same notion of security.
So how do we switch to the malicious setting? We need a different technique. We took the KMP algorithm, which I'm going to explain, and showed how to evaluate it securely. The KMP algorithm takes the pattern and transforms it into an automaton.
So what is an automaton? A quick definition: it's a tuple of five elements. We have Q, the set of states, together with the initial state, where the computation starts, and a set of accepting states. We have the alphabet, which in this talk is binary, although all the results and protocols extend to an arbitrary alphabet. And we have delta, the transition function, which takes the current state and an alphabet character and gives the next state to go to. This is just a figure. You can see that q0 is the initial state. If I see 1, I go to q3, and q3 is an accepting state.
So the automaton accepts 1. If I see 0, 0, I go to q2, which is also accepting. But 0 is not accepted by the automaton, for instance.
Transforming this into a secure version, you can already guess: we have two parties, where one party holds the description of the automaton and the other party has the input. The goal for Bob is to learn whether the automaton accepts or rejects this string; not even the state the automaton ended in, only whether it accepts or rejects. These are, again, the security requirements: Alice should not learn anything about Bob's input to the automaton, and Bob should not learn anything but the evaluation on x.
This was also considered in the past. There is the paper by Ishai and Paskin, who show how to obliviously evaluate any branching program, and an automaton is a special case of this model of computation. But they considered a different model: they wanted the communication to be proportional to the size of the input x, independent of the description of the automaton. And therefore they can only talk about privacy, or security against semi-honest adversaries. Our result in this context shows how to evaluate automata in the presence of malicious adversaries with full simulation.
Just a few words on motivation. Although our initial motivation was to solve the text search problem, this functionality can be useful in other applications. For instance, it is interesting to see whether the solution can be generalized to stronger models of computation, such as branching programs, as I mentioned, or other models. And the automaton itself can be used: we think we can use it to verify correctness of programs.
Now let's see the KMP technique. This is a beautiful algorithm from 1977 that solves pattern matching.
It is, I think, the most famous algorithm in pattern matching. The algorithm takes the pattern and transforms it into an automaton in the following way. For instance, if the pattern is 0011, then we have a single path that leads to an accepting state: this is 0, 0, 1, 1. Now, what happens every time we encounter a mismatch? Let's see how to construct the automaton using an example.
Let's assume that this is the text, and we started to evaluate the automaton on it. We had 0, 0, 1, so far so good. And now we encounter a mismatch: we saw 0 instead of 1. You would think that the automaton brings me back to the initial state, but this is not the case. This is how the automaton saves work and evaluates the entire text in linear time. It takes me not to the initial state; it remembers that it saw 0 and asks, where is the best place I can take you? Think of it as finding the largest prefix of the pattern that matches a suffix of what it has seen so far. It already saw 0, and it remembers that the pattern starts with a 0, so there is no sense in going over that 0 again. It goes to the state where it has already seen 0.
So these are the other edges that give me the automaton. This is a simple keyword, so all the zeros take me to the same place and the one takes me to the beginning. And if you want to know how to compute this automaton, it is very simple, using O(m^2) computation, by essentially comparing the pattern against itself for every shift. For every shift, I look for the largest substring that matches.
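The construction just described can be sketched in a few lines. This is a direct, unoptimized version written for clarity, not the speaker's implementation: for every state and every next bit it looks for the longest prefix of the pattern that is a suffix of what has been read. The names `kmp_automaton` and `find_matches` are illustrative, and patterns and texts are assumed to be strings over the characters "0" and "1".

```python
def kmp_automaton(pattern):
    """Return the transition table as a list of dicts.

    State q means "the last q characters read equal pattern[:q]".
    State len(pattern) is the single accepting state.
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for bit in "01":
            seen = pattern[:q] + bit
            # Longest prefix of the pattern that is a suffix of `seen`.
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[bit] = k
        delta.append(row)
    return delta


def find_matches(pattern, text):
    """Run the automaton over the text once and report match start positions."""
    delta = kmp_automaton(pattern)
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = delta[state][ch]
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)
    return hits


delta = kmp_automaton("0011")
print(delta[3]["0"])  # the mismatch after reading 0, 0, 1
print(find_matches("0011", "00100110011"))
```

For the pattern 0011, the mismatch after 0, 0, 1 leads to state 1 rather than the initial state, which is the behaviour the example in the talk walks through.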
So if I started at the first location and now encountered a mismatch as before, a 0 instead of 1, I compute the largest prefix of the pattern that I have already seen, assuming that I started at the second location. And this was the 0. So computing the automaton is very efficient, assuming that the keyword is very small.
If you want to consider a secure computation of this functionality, the high-level idea would be this. One party has the automaton that describes the pattern: it can construct the KMP automaton, encrypt it, and send it to Alice. But this does not work entirely, because the parties should not learn any intermediate computations along the way. The party cannot evaluate the automaton in one shot, and we are not allowed to reveal any intermediate information.
The solution in the semi-honest protocol that I talked about before is very nice and very simple. Alice, who has the automaton, holds a masking, and she shifts the states in the transition function according to the masking. Then the parties just engage in oblivious transfer executions in which Bob learns the next state. She doesn't even need to send an encryption of the automaton. This doesn't work for the malicious setting, at least not efficiently, and in order to deal with a malicious adversary we need to do something else.
In every iteration Bob learns an encryption of the evaluation on some partial substring of the pattern. So after the i-th iteration he knows the encrypted evaluation on p_1 through p_i. The next step is to select. I'm going to show an example, but at a high level: we have the transition matrix, and he can select from it the column that corresponds to the next bit in the pattern.
And then the parties should obliviously evaluate the next state. So how do we do it? Again, think of the transition table as a table with three columns. The first column is the current state. Remember, I'm in a specific state and I see either 0 or 1, so one column corresponds to 0 and the other column corresponds to 1. Of course, the table is encrypted using homomorphic encryption.
Upon reaching the i-th iteration, we already said that Bob knows the evaluation on p_1 through p_{i-1}. Now the parties want to evaluate the next bit. Bob knows the bit p_i, which is either 0 or 1, so he picks the corresponding column. Let's say that C' is the column for this bit. Then there is the current-state column, and he has the encryption of the current evaluation. He uses the homomorphic property to take, under encryption, the difference between each entry of the current-state column and the encrypted evaluation that he holds. This gives two columns: the current-state column, with the partial evaluation so far subtracted, and the column corresponding to the next bit. Now he has to permute these two columns, using the same permutation for both, and send them to Alice, who masks and decrypts them for Bob. I haven't given all the details, but this essentially gives Bob the encrypted evaluation for the next bit.
And how do we use it for pattern matching, or text search? As you saw, the number of rounds is proportional to the length of the string that we evaluate on the automaton, which is bad news in our case, because the string evaluated on the automaton is the database, and it could be very large. So we use a trick: break the text into substrings of length 2m, and evaluate each substring independently and in parallel.
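The splitting trick can be sketched in the clear as follows. The sketch assumes that blocks of length 2m start every m positions, so that consecutive blocks overlap in m symbols and no match straddling a boundary is lost. The plain substring comparison in `block_matches` is only a stand-in for the oblivious automaton evaluation, and all function names are illustrative.

```python
def overlapping_blocks(text, m):
    """Return (start, block) pairs: blocks of length 2m starting every m positions.

    The last blocks may be shorter than 2m when the text runs out.
    """
    last_start = max(len(text) - m, 0)
    return [(s, text[s:s + 2 * m]) for s in range(0, last_start + 1, m)]


def block_matches(pattern, block):
    """Stand-in for evaluating the automaton on one block."""
    m = len(pattern)
    return [i for i in range(len(block) - m + 1) if block[i:i + m] == pattern]


def search_by_blocks(pattern, text):
    """Search every block independently and merge the reported locations.

    A match inside an overlap can be reported by two blocks, so the
    locations are collected in a set before sorting.
    """
    m = len(pattern)
    hits = set()
    for start, block in overlapping_blocks(text, m):
        for i in block_matches(pattern, block):
            hits.add(start + i)
    return sorted(hits)


print(overlapping_blocks("00100110011", 4))
print(search_by_blocks("0011", "00100110011"))
```

Each block is processed without reference to any other, which is what allows the evaluations to run in parallel with a number of rounds that depends on the block length rather than on the text length.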
To show you an example: you take the text and break it into substrings that overlap in m bits, because there could be a match exactly in the overlap. Now we evaluate the automaton on every substring of length 2m in parallel. This gives total work of n times m, but the round complexity decreases to O(m). I know this round complexity is not constant, as we would hope for. But in the context of text search this is not bad news, because usually the keyword or pattern is very short. Think of the words that you type into search engines.
To summarize: in the area of pattern matching there are many interesting and well-motivated problems in the context of privacy and secure computation that are still waiting to be solved. One conjecture I have: if you are familiar with suffix trees or suffix arrays and the algorithms used there, my conjecture is that these cannot be used in the context of secure computation, because of the intermediate computations they involve. So this is one direction.
Another direction is to extend the solution for oblivious automata evaluation, pushing it further either to other models of computation or in another sense. For now, on the edges we only compare whether we have 0 or 1. But you can think of the edges as some function, maybe linear, maybe nonlinear. And perhaps we can use that to verify correctness of programs, or for other applications. I don't know. I'm sure there are many interesting questions in this area.
And that's it. Questions? Thank you.