It's actually nice to encrypt your data, as you probably know. And it's even nicer if you can store it online. The only drawback is that the cloud storage provider can't do search operations on the encrypted data. The next talk presents some solutions to this problem. Tobias Müller and Christian Forler present different approaches to how others can search through encrypted data without knowing either the key or the plaintext. So please warmly welcome Christian Forler and Tobias Müller.

Perfect. Thank you very much. And thanks for being here on day four, after these many parties and having partied hard. I'm surprised to see so many people here. And by that I mean I'm surprised that I can actually see so many people after last night. And I'm glad that I've made it up here this very morning.

That's correct. We are concerned with the problem of encrypting data in such a way that you can still search over the ciphertext. Over the course of this talk, we will present some schemes, or solutions to that problem, which will hopefully inspire you to, firstly, demand these services from your, say, cloud provider. And secondly, if you're inclined to do the programming, to hack on these schemes, implement them, and make them practical.

So I'm Toby, this is Christian. Over the course of the next 45 minutes we'll talk about this, and hopefully we'll leave some room for Q&A. Before I forget: we also intend to hang around at the bar just outside this lecture hall after the talk, in case you want to chat about these techniques, or encryption, or just buy us a beer. Well, maybe not at 12, but later.

So this is our agenda. We want to talk a little bit about how we see the world around the cloud being organized, and where your position as the customer is in that architecture.
Then we present a couple of schemes which allow you to search over the ciphertext that you have uploaded to the cloud. And then we'll wrap up.

So let's talk a little bit about the cloud. Let me ask you: how many of you use the cloud, any external third-party provider, to upload data, whatever it is? That's many. That's probably the majority. And that's how we imagine this world to be right now: many people use external third-party services to upload and hold their data, so that the customers can avail of these data at any point in time, with any device they have.

So I, for myself, have gone with Stan, a trustworthy guy. I can upload my data. The first couple of gigabytes are free. And he promises me that I can download and access my data at any point in time. And there are many of those, right? It doesn't have to be him. There are many companies offering cloud services. They not only allow you to store your files, you can also upload your contacts, your calendars, your messages. And all of them probably promise you that they will not be malicious, or go through your data and run analyses, or even sell your user profile.

Except, well, a couple of providers actually do so. And they tell you straight to your face that they will mine your data to present you, for example, better ads. So they go through your email, they look at your keywords, and they determine which ads might be best for you, what you are most inclined to click on.

Anyway, you can say: well, I'm paying money, right? I'm using this premium provider for my email, my calendars, my data, so their incentive to cheat on me is not that big. You might be right, except that if they could make an extra profit, they might possibly do so. And also, we being cryptographers, we don't necessarily want the guarantee that they don't look through your data to be on a piece of paper.
We want to have a rather mathematical guarantee, or proof. We want the provider to be technically unable to go through your data, and not merely have the provider's privacy statement guarantee that they don't look through it. So we are looking for cryptographic solutions to the problem of uploading your data while still enabling you to execute operations on that encrypted data.

The problem, as I've said, is that these providers go through your data and extract information about you, about your usage, about your behavior. And it turns out that mining plaintext data is actually not that hard. The data mining being performed is relatively easy because it's plaintext, and you can just run your analysis over the plaintext.

We think that we are on the wrong track. And we think that we, as a community, as the cryptographic or the hacker community, should actively work towards the goal of providers not being able to look through your data, not being able to create profiles about you, and not being able to predict what you're doing next. So we will hopefully... well, thank you.

So we ask the question: why do you not do encryption? Why do you not put some encryption on your data before you upload it? Because the encryption will save you, or will save us, from the dark side which performs all these analyses.

In the scenario that we have in mind when presenting these slides, the user has some data locally. The user uploads their data to a third party and then forgets about the data locally. You have uploaded your massive movie library, then you're going online with your mobile, and you don't necessarily know about the full data that you have online. Yet at a later point in time you want to perform a search, right?
You want to know which files you have, which contacts you have, without necessarily having seen that data previously on the device that you want to access your data with. And what we will do is present certain schemes that allow you to perform these operations.

But these schemes, or these technologies, are not necessarily plug and play. It's not that you could drop in, say, the scheme and then you're done. You need to, say, manage keys and look at the rough edges of cryptographic schemes. We're in the science track, by the way, right? So we are taking academic work and trying to make it practical. And academic work is sometimes complicated to apply to real life. So just keep that in mind when listening to these slides. As I've said, we are trying to point you in a direction. We're trying to give you some inspiration as to what to look for, either when implementing encrypted search or when demanding encrypted search from your cloud provider.

So let me start with a very simple scheme, a very simple encryption scheme that allows you to perform encrypted search. As an engineer, I try to think about the most minimal, the easiest solution first, and then build it up to make it better and better. So my engineering approach to that very simple encryption scheme is to simply encrypt all the things. Sounds simple, right? It actually is.

In this scheme we have our plaintext data, we simply encrypt each and every entry of our database with a secure encryption scheme, and we upload that data to the third party. Simple, right? We're done. Any questions?

When you want to perform your operation, what you need to do is download all the things. So you need to ask the database on the internet, in the cloud, to give you all the entries.
Then you need to decrypt locally, then you have the plaintext, and then you can perform whatever operation you want. This scheme is very simple, right? But it's probably not what we want. Why? Because if you're on your mobile, you don't want to download your three terabytes of movies just to find a file. So this is not necessarily a good scheme for us, and we are trying to build up solutions which perform better in that regard.

However, I just want to point out that this, as far as I'm aware, is being implemented in commercial products that you can buy. There are proxy products, sold as an appliance that you put between you and, say, Gmail, and as far as I'm aware they do exactly that.

So the question now, as the engineer that I am, is: can we do better? And that question I ask Christian. Christian, can we do better?

Yes, we can do better. Yes, we can. Sure, we can. OK, let's do it. I want the cloud to perform the search for us. Because this is a cloud service, and I'm not willing to perform the search on my computer, especially when I have big data stuff in the cloud.

Searching on encrypted data sounds a little bit like magic. But in fact, it's super simple. I'll show you the next scheme, a super simple scheme for performing a search on encrypted data in the cloud just by using a deterministic encryption scheme. Deterministic encryption means same plaintext, same ciphertext. If we have two identical plaintexts, then we will have two identical ciphertexts. Super easy. We can use such a scheme, AES-based, to encrypt all our keywords one by one. And then we have a ciphertext collection. OK, that's fine.

Now, it's not working. It's broken. Oh, maybe. Sorry about that. Oh, that was very bad. Well, hang on. So shall I go through this again? Did anybody not understand anything? So sorry about that. We are here, right? Oh, that's the wrong button. It's not this one.
OK, sorry about that. I'm better with crypto than with computers. OK, I will... OK, that's a two-button interface, right? No, that's OK. I know. Well, we should have practiced that beforehand. Yeah, yeah. I don't think this computer thing is ever going to succeed if it's this complicated. Yeah, that's the right button. Oh, jeez. We are hacked. OK, oh, yeah, I've received a crash report. All right, that's handy. There we are, lovely. Please stop performing the hacks right now, eh? OK, that's terrible. But we're going to do it anyway. Skip the slide, yeah. No, don't press the button, right? OK, OK. Maybe I press the button. Oh, are you serious? Next? Yeah, obviously, next. Right, it's easy to say, but... Seriously, it crashes on that very slide. That's terrible. OK, can you tell a joke in the meantime? Let me just have this finish the pre-rendering. No, it's not that we never went through the slides with that very program. It's, of course, a random problem right now. OK. Ah, there we go. OK, now it's... no, it's locking, yeah, OK. Right, I'm not skipping through the slides, I'm opening it directly. No, I'm not. OK, I have a backup plan. Don't worry, that's all right, because as an en... Oh, OK, right. There we are. Right, switching to the LaTeX source would probably be better anyway, right? There we are. Whoo! Finally.

OK, deterministic encryption. Once again: same plaintext means same ciphertext, so it's quite easy to perform a search. For a search on the server side, you just encrypt the keyword that you're looking for. And then for each ciphertext in your collection, the server just checks whether it's a match or not. When it's a match, fine, then the keyword is part of your ciphertext collection. That's quite easy. Indeed, a little bit too easy. OK, let's try it one more time. So it's working, OK. There must be a catch, yeah. And, yeah, indeed.
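
The deterministic keyword search just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the speakers' implementation: the talk uses an AES-based deterministic encryption, while the standard library has no AES, so HMAC-SHA256 stands in as a deterministic keyed function (it cannot be decrypted). The names `det_token`, `server_search`, and the key are hypothetical.

```python
import hashlib
import hmac

WORD_LEN = 32  # pad every keyword to a fixed length before processing


def det_token(key: bytes, word: str) -> bytes:
    """Deterministic stand-in for AES-based encryption (assumption):
    the same word under the same key always gives the same output."""
    padded = word.encode("utf-8").ljust(WORD_LEN, b"\x00")
    return hmac.new(key, padded, hashlib.sha256).digest()


def server_search(collection: list[bytes], query: bytes) -> list[int]:
    """Server side: compare the query against every stored ciphertext.
    This is a linear scan over the whole collection."""
    return [i for i, c in enumerate(collection) if hmac.compare_digest(c, query)]


key = b"k" * 32  # hypothetical client-side secret key
words = ["wikipedia", "foundation", "license", "wikipedia"]

# The client uploads one ciphertext per keyword.
collection = [det_token(key, w) for w in words]

# To search, the client sends only the ciphertext of the keyword.
hits = server_search(collection, det_token(key, "wikipedia"))
misses = server_search(collection, det_token(key, "penguin"))
print(hits, misses)
```

Note that `collection[0]` and `collection[3]` are identical byte strings, so the server sees which entries repeat even though it never sees a plaintext.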
There is a serious problem, because deterministic encryption is not really secure. You cannot consider this secure, because you all know the example of the penguin, yeah? When you use deterministic encryption, it only partially hides the plaintext. So now the question is: can we do better? Because this is not what we want. In the end, we want to hide our data and do it well, very well.

And now there is an idea from Song, Wagner and Perrig. They presented this idea in 2001, and it works like a charm. You just use this deterministic encryption scheme, and then you know, OK, the result is not really secure, so I have to fix something. And there is a fixing step, and what we do is just XOR a mask. Mask means a random bit string. If I just XOR a random bit string, I have something like a one-time pad, and we all know the one-time pad works like a charm, it's secure. You can generate the mask by using a stream cipher or whatever. It's not a big problem.

But then again we have the problem: when we have secure encryption, how can we perform a search on the encrypted data?

OK, and this is about the point where we lose the audience, right? We're losing the audience? We're losing the audience, because now it's getting complicated. Yeah, OK.

So this means now we need some magic. We cannot use just any mask. We need a magic mask to perform this kind of encryption. OK, let's see how we can craft such a mask.

We divide the mask into a left side and a right side. For the left side, we use random bits. We can generate these random bits once again by using a stream cipher, a boring stream cipher, but a boring and secure stream cipher, no big deal. Then we perform the deterministic encryption on our keyword, and we divide this encrypted keyword into a left part and a right part. And from the left part, we derive a search key, k_i here.
And we can do this by using a keyed hash function. As a keyed hash function you can use HMAC, yeah? You all know HMAC, you can use it here. We just hash the left part of our search query using HMAC, and then we have our search key.

And then with the left side of our mask and the search key, we craft the right side of our mask. We derive the right side, and the right side is T_i. The left side of our mask is S_i, and the right side is T_i. And now the trick is: from our deterministic encryption, our interim ciphertext, and the left side of the mask, we can compute the right side of the mask. And this is handy, okay? And now we exploit this to perform search on encrypted data. So the basic idea, once again: we can compute the right side of our mask from the left side.

And now let's work with our magic mask. Let's do some magic. First of all, we have to upload our stuff. For each keyword, or each plaintext, we first perform the deterministic encryption. Then we XOR the result with our magic mask, and then we upload it to the cloud, to a server. So far so good.

And now the magic starts. At some point in time we want to search, we want to perform a search query. A search query consists of the deterministic encryption of our keyword and the search key. So each time we perform a search query, we upload the interim ciphertext and the search key. And then the cloud can test, for each ciphertext from our ciphertext collection, whether the XOR with our deterministic ciphertext is a magic mask. And if this turns out to be a magic mask, we have a hit, we have a match. And so we can figure out whether the keyword is part of our ciphertext collection or not. So this enables us to search over encrypted data. That works beautifully and it's fine.

So the talk is done? Not really, there is a problem. Every blessing comes with a curse.
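
The magic-mask construction above can be sketched as follows. This is a simplified illustration under stated assumptions, not the speakers' code or the exact published scheme: HMAC-SHA256 stands in for the AES-based deterministic encryption (so decryption is omitted), and `secrets.token_bytes` stands in for the stream cipher that generates the left side S_i. All function names and keys are hypothetical.

```python
import hashlib
import hmac
import secrets

HALF = 16  # interim ciphertext X = L || R and mask = S || T, 16 bytes per half


def prf(key: bytes, msg: bytes) -> bytes:
    """Keyed hash (HMAC-SHA256) truncated to one half-block."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def det_encrypt(key: bytes, word: str) -> bytes:
    """Stand-in for deterministic encryption (assumption): 32-byte output."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).digest()


def search_key(master: bytes, x: bytes) -> bytes:
    """Derive the search key k_i from the LEFT part of the interim ciphertext."""
    return prf(master, x[:HALF])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def encrypt_word(det_key: bytes, master: bytes, word: str) -> bytes:
    x = det_encrypt(det_key, word)
    s = secrets.token_bytes(HALF)        # left side of the mask: random bits
    t = prf(search_key(master, x), s)    # right side, derived from the left side
    return xor(x, s + t)                 # final ciphertext = X xor (S || T)


def server_match(c: bytes, x: bytes, k: bytes) -> bool:
    """Server test: is (c xor X) a magic mask, i.e. does T equal prf(k, S)?"""
    mask = xor(c, x)
    s, t = mask[:HALF], mask[HALF:]
    return hmac.compare_digest(t, prf(k, s))


det_key, master = b"d" * 32, b"m" * 32   # hypothetical client keys
collection = [encrypt_word(det_key, master, w) for w in ["foo", "bar", "foo"]]

# A query consists of the interim ciphertext X and the search key k.
x = det_encrypt(det_key, "foo")
hits = [i for i, c in enumerate(collection) if server_match(c, x, search_key(master, x))]
print(hits)
```

Unlike the purely deterministic variant, the two stored ciphertexts for "foo" differ from each other, because each uses a fresh left side S_i; the server only learns that they match once a query for that keyword arrives.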
And we have here two curses. The first curse: no, the scheme is not super secure. It's vulnerable to statistical analysis. What does that mean? OK, maybe I know that my target, my victim, is someone from the Wikipedia community, and the stuff my target is uploading is maybe connected with Wikipedia. Then I can make some estimation about the search pattern. I can make some guesses, wild guesses. This means here I guess the frequency of a search. On the slide, large font means high frequency and small font means low frequency. So there I can make an estimation.

And next, I monitor the user's behavior. This means I monitor the search queries. And then after a while, I can compare my guess with what I've monitored. And then I see: oh, this guy looked a lot for 0x94, E7, blah, blah, blah. So there's a good chance that this might be the ciphertext for Wikipedia, or Foundation, and so on. And my target only looked twice for the small ciphertext, 0xD8 and so on. So this must be one of the smaller words here on the right side, like license or whatever. And this is how I can partially reconstruct the plaintext. So yeah, this is a problem. If you perform a lot of search queries, an observer can decrypt parts of the ciphertext. And this can become a problem.

OK. And then there is the next problem: speed. We've used symmetric encryption, so it's not as bad as using fully homomorphic encryption. First of all, I implemented the stuff, and these are some performance benchmarks. And first of all you see: oh, the ciphertext is about six times larger than the plaintext. What happened? I padded each word to 32 bytes before I encrypted it. This padding is crucial, because when you encrypt natural-language stuff, words, you can see the word length from the size, and then you reveal a lot of information.
Say I encrypt "yes" and "no" without padding, and then I look at the ciphertexts: the ciphertext of "yes" should be longer than the ciphertext of "no". And so from the ciphertext I can learn a lot about the language, I can learn a lot about the plaintext. This is why I padded each word to 32 bytes and then performed the encryption. OK. That explains the factor. It depends on what you encrypt.

The other thing is the time to encrypt. This is quite fast, because I used AES with the native AES instructions from Intel, so it's quite fast. And I measured all of this on a single core of a normal, regular notebook. It's OK.

And the search, yeah, OK, the search might be a problem, because you need linear time. For each and every search query, I have to go to each entry of my ciphertext collection and check whether it's a match or not. And when I have big data, or large data, huge data, this means I have to wait a couple of minutes. That's quite good if you like to drink a lot of coffee and you can take coffee breaks, but it's not what you really want, to wait a couple of minutes before you have the search result.

OK. What can we do better? Can we optimize the search time? We can look at what the database guys or the operating system guys do. They have a lot of data, and they have to search quickly. And they are using an index. So OK, let's use an index to speed things up.

Once again, the simplest thing is to have a plaintext index on your client device. Yeah, great. And the plaintext you can encrypt using a secure encryption scheme, your favorite secure encryption scheme: just encrypt it and upload it to the cloud. Everything is fine. And your index is on a local device.
And nowadays you have your smartphone, your tablet, your notebook, your PC, your server, your other server, and 50,000 bots. It becomes quite a mess when you try to synchronize the index on the client side. Some of you know what I'm talking about. It will not work.

OK, next approach. Let's encrypt the index and upload it to the server. You can do that. Just encrypt it as a whole with a secure encryption scheme. And then on demand you can download the index, the up-to-date index, and perform your search on the index. Well, if you're doing big data again, your index can become a couple of hundred megabytes, maybe. And this is no fun to download at all, especially when you are in the countryside in Germany, where you have DSL light or something. It's no fun at all. I want to have my index now, and not in 10 minutes or 20 minutes or whatever. This is a bad user experience. We want to have a nice user experience.

To achieve this, we need some slightly more advanced crypto stuff. First of all, we have to generate a special index that fits our purpose. And how can we do this? For that we need a search key and an index key. In this example, we want to generate an index for the last name. We have user data, and we want to search for last names. For each last name, we generate a search key and an index key. With the search key, we just hash zero, and this is our search token. So we derive our search token from zero and the search key with the keyed hash function. And with the index key, we just securely encrypt the row IDs in this example. It's quite fine. And this works.

So if we want to perform a search, we perform a search query by just uploading the search key and the index key for our last name. And then the cloud can perform a lookup in the index, and when it gets a hit, it decrypts the index entry and sends us the result. Oh, this is quite impressive.
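
The index construction just described can be sketched like this. It is a minimal sketch under stated assumptions: the talk does not say how the per-name keys are derived, so deriving them from a master key with HMAC is an assumption, and a toy HMAC-based XOR keystream stands in for "your favorite secure encryption scheme". All names (`derive_keys`, `build_index`, `server_lookup`) are hypothetical.

```python
import hashlib
import hmac
import json


def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def derive_keys(master: bytes, last_name: str) -> tuple[bytes, bytes]:
    """Per-keyword search key and index key (derivation is an assumption)."""
    name = last_name.encode("utf-8")
    return prf(master, b"search|" + name), prf(master, b"index|" + name)


def search_token(search_key: bytes) -> bytes:
    """The search token is the keyed hash of zero under the search key."""
    return prf(search_key, b"0")[:16]


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher: XOR with an HMAC keystream. Each index key
    encrypts exactly one value here, so no keystream is reused."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = prf(key, offset.to_bytes(8, "big"))
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def build_index(master: bytes, rows: dict[int, str]) -> dict[bytes, bytes]:
    """Client side: map search token -> encrypted list of matching row IDs."""
    by_name: dict[str, list[int]] = {}
    for row_id, last_name in rows.items():
        by_name.setdefault(last_name, []).append(row_id)
    index = {}
    for last_name, ids in by_name.items():
        s_key, i_key = derive_keys(master, last_name)
        index[search_token(s_key)] = keystream_xor(i_key, json.dumps(ids).encode())
    return index


def server_lookup(index: dict[bytes, bytes], s_key: bytes, i_key: bytes) -> list[int]:
    """Server side: one dictionary lookup, then decrypt the hit."""
    value = index.get(search_token(s_key))
    return [] if value is None else json.loads(keystream_xor(i_key, value))


master = b"m" * 32  # hypothetical client master key
index = build_index(master, {1: "Foo", 2: "Foo", 3: "Bar"})
print(server_lookup(index, *derive_keys(master, "Foo")))
```

The server never learns the last name itself; it only receives the two keys for one keyword and can open exactly that index entry.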
Okay, there is a problem. Because if you have a very common last name, like Smith or Müller, you will have a lot of hits, a lot of values that fit. A lot of row IDs. You have a huge set of row IDs that match people with the last name Smith, and therefore the size of the value for Smith is larger than the size of the value for, for example, Forler, because Forler is a widely uncommon name. So if you get access to this index, you can try to estimate the plaintext of the last name just by looking at the value size. And this is a problem. We have to hide the size, the number of occurrences, the frequency of the last name. If not, we have a bad time, because we want to be able to lose our index, to give our adversary the index, without getting into trouble. This is the whole idea. If our assumption were that the adversary never, ever has access to our index, we wouldn't need to encrypt our index. So we assume that our adversary has access to our index, and even if he has access, the index should not reveal any information about our plaintext. Therefore, we have to hide the size.

Okay. And there's a cool idea from Cash et al., who published a paper last year on how to hide the size. And this is to flatten out the index. It's quite a cool idea. For that, we have to remember the occurrences of each last name. We start here. In the first row we have Alice Foo. This means last name Foo. Okay, let's make an entry for Foo. The number of occurrences so far is zero. This means we hash zero here and then encrypt the row ID, here one. Then next, we have Bob Foo. Same last name. The last name occurred once before, so now we hash one instead of zero. And sure, the value is the row ID. And then we have Eve Bar. And for Bar, the occurrences so far are none. So again, we hash just zero, under the new search key for Bar. Okay.
And then we encrypt the value, the row ID. And now we see that for each and every entry of our index we encrypt only one row ID. So we align the size. The size is always the same. And this means our adversary doesn't see much anymore when he gets access. Just a bunch of random values.

And once again, how this works in reality: you just encrypt your plaintext using your favorite super secure encryption scheme and upload it. And then you generate your index and upload the index. And now the cloud has the capability to search over encrypted data. Once again, for each search query, we compute the search key and the index key and upload them to the cloud. And now the cloud can make lookups in the index. First, we start by making a lookup for zero. We have a hit. Okay, then let's try one. Hit again. Let's try two. No hit. Oh, it's over. Then for our hits, we just decrypt the values and send the results to the client. And this works like a charm.

Once again, I use the same plaintext. It's just the King James Bible. I use the King James Bible because a lot of other researchers are doing it, and it contains a lot of words, about 800,000 words. So it's quite okay. And here I didn't care about the padding. I used it as a blob: I encrypted the entire King James Bible as one blob. This is why the ciphertext has the same length as the plaintext. Depending on your scenario, you have to do some padding, so it's most likely that the ciphertext is larger than your plaintext. That's okay.

But the index size you see here, it's okay-ish. Depending on your plaintext, you need about 32 bytes for each entry: 16 bytes for the search token and 16 bytes to encrypt the value, the row ID. And 32 bytes for each entry might be okay, depending on your scenario. But the cool thing now is that we can perform search in constant time, more or less.
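
The flattened index and the counting lookup (try zero, try one, stop at the first miss) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published construction or the speakers' implementation: key derivation from a master key and the HMAC-based one-block encryption of the row ID are stand-ins, and all names are hypothetical. Entries are 16 bytes of token plus 16 bytes of value, matching the 32 bytes per entry mentioned above.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def derive_keys(master: bytes, last_name: str) -> tuple[bytes, bytes]:
    """Per-keyword search key and index key (derivation is an assumption)."""
    name = last_name.encode("utf-8")
    return prf(master, b"search|" + name), prf(master, b"index|" + name)


def token(search_key: bytes, counter: int) -> bytes:
    """16-byte search token: keyed hash of the occurrence counter."""
    return prf(search_key, str(counter).encode())[:16]


def crypt_row_id(index_key: bytes, counter: int, block: bytes) -> bytes:
    """Toy one-block cipher (assumption): XOR with a per-counter pad.
    Applying it twice with the same key and counter decrypts."""
    pad = prf(index_key, b"value|" + str(counter).encode())[:16]
    return bytes(b ^ p for b, p in zip(block, pad))


def build_flat_index(master: bytes, rows: list[tuple[int, str]]) -> dict[bytes, bytes]:
    """Client side: one fixed-size entry per (last name, occurrence number)."""
    index: dict[bytes, bytes] = {}
    seen: dict[str, int] = {}  # occurrences of each last name so far
    for row_id, last_name in rows:
        s_key, i_key = derive_keys(master, last_name)
        n = seen.get(last_name, 0)
        index[token(s_key, n)] = crypt_row_id(i_key, n, row_id.to_bytes(16, "big"))
        seen[last_name] = n + 1
    return index


def server_search(index: dict[bytes, bytes], s_key: bytes, i_key: bytes) -> list[int]:
    """Server side: look up counter 0, 1, 2, ... until the first miss."""
    hits, counter = [], 0
    while (value := index.get(token(s_key, counter))) is not None:
        hits.append(int.from_bytes(crypt_row_id(i_key, counter, value), "big"))
        counter += 1
    return hits


master = b"m" * 32  # hypothetical client master key
rows = [(1, "Foo"), (2, "Foo"), (3, "Bar")]  # Alice Foo, Bob Foo, Eve Bar
index = build_flat_index(master, rows)
print(server_search(index, *derive_keys(master, "Foo")))
```

Every entry now has the same size regardless of how common the name is, so an adversary holding only the index sees three equally sized, random-looking entries.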
I made a lot of tests, and in all the results I got, the answer to the search query took less than one millisecond. Less than one millisecond is quite good. So it's working. And the speed is fine.

So is this everything? Is it all sunshine and rainbows? Not really. Again, we have the problem with statistical analysis. Because the search key and the index key are the same when I have multiple queries for the same last name. So when I've searched 20 times for the same last name, I have 20 times the same search query. There are techniques to get rid of this, but then the performance breaks down: you usually have to wait a couple of hours or minutes, or your ciphertext size explodes. If you want to make it practical, then for now you have to live with this statistical analysis. Maybe we'll get rid of it in the future. It depends. Okay. And now Toby will conclude the talk and give an outlook on what's going on in the future.

Thank you very much, Christian. So that was fascinating, wasn't it? You enable a third party to execute a search operation on encrypted data, although, you know, you've encrypted the data. How could anyone possibly execute any operation on that encrypted data? But it's possible. And these were just a couple of schemes that we've presented, which were those that we've implemented. And there are many more.

From what we've shown you, these schemes have the problem of the deterministic search token. Whenever you query twice for the same keyword, you will generate the same token, and the database or the service provider might very well infer what you are searching for based only on your queries, based only on your token. There are other techniques to deal with the problem. But making those practical is a major challenge currently.
Fully homomorphic encryption has been on everyone's mind for the last couple of years, and there are massive research efforts going on right now. But for now you could not use it, because it's simply too demanding in terms of computation and memory. So this is not an option, but if you happen to have a few spare cycles, you may very well enter this area of research and try to find solutions to these problems.

We have seen that we've implemented those schemes, and there are many more. And again, if you have a few spare cycles but would rather hack than research, then go off, read these papers, and build libraries for encrypted search. Build libraries so that service providers can use them and offer encrypted services. Ideally, we'd have a collaborative effort to demand encrypted services and to write these programs, these libraries, for third parties to offer these services. We will not kick off the new Let's Encrypt initiative. But we will hopefully inspire some of you to go in that direction and to bring more encryption to the internet, to the cloud.

We have seen a couple of schemes. We have seen the very first one, the deterministic keyword encryption scheme. It's very easy to set up, and you can do that with low computational effort on your end machine. The search, however, does not perform very well in terms of security. We have seen a probably better scheme in that regard. There the search time grows with the size of the database, because the server has to go through each and every entry in the database. That may or may not be what you want. If you don't want to have such a scheme, you may want to look into the Cash et al. scheme. There you can search in basically no time, because you have the index. If you store the index cleverly, then you can search in O(1). However, whenever you have an index, you need to think about what happens when you add new entries, when you delete entries, when you change entries.
If you are going the Cash et al. route, then keep that in mind.

There are so many schemes, as I've said already. All of them have slightly different features. Depending on what you actually want, you can build a very efficient scheme. If you cut down on the functionality that you expect from your scheme, then by doing some clever engineering you can cut down on runtime and memory demands significantly. So if you are about to build a scheme, think about what your actual requirements are.

And as I've said, many more exist. If you do some research on the internet, "searchable encryption", that term will find you quite a few academic papers on the topic. You will also find a couple of libraries. There are already software implementations available. I haven't evaluated them all, but some need work, let's put it that way. If you have some spare cycles, go and look at these libraries and make them build in the first place. That would be good, so that we can have nice things in the future.

As we've hopefully shown, searching over encrypted data is practical. You can build your encrypted database. You can have clients that search over encrypted data without the server side learning what you're looking for, directly at least. So whenever you are in a discussion about whether such a thing is possible, you now hopefully know that it is possible and that we should use it.

And with that, we'd like to conclude. We'd like to thank you very much for your attention. Again, before I forget it, we will be at the bar at quarter to one or whatever it is.

So let's go over to Q&A. We still have plenty of time left. Please, if you leave the room now, do so quietly, because we want to actually hear what is being asked into the microphones. So please do it quietly and take all your trash with you. So now let's start. Microphone over there, please.

Hi. In Phillip Rogaway's very good paper, "The Moral Character of Cryptographic Work", he writes in critique of FHE:
"Providing strong funding for FHE and iO provides risk-free political cover. It supports a storyline that cloud storage and computing is safe. It helps entrench favored values within the cryptographic community: speculative, theory-centric directions. And it helps keep harmless academics who could, if they got feisty, start to innovate in more sensitive directions." So I read this as a critique of FHE and iO.

It's a very interesting paper, by the way. If you have a couple of hours, you should go off and read it. It's a couple of weeks old, right? Like three weeks, maybe. The paper in and of itself is, as you said, criticizing the crypto people for being way off the real world, essentially, right? That's the bottom line that I took away. And rightfully so: the fully homomorphic string of research, well, it's complicated in the sense that it's, say, very demanding, at least. I think the paper is perfectly right, and I think we as the community should focus on making real-world things happen and on enabling the techniques that we have for real-world usage, for usage in a context where you could actually upload your contacts and still perform your queries over the encrypted data. So I sympathize with that paper to the fullest extent possible. Again, I encourage everyone to read that paper. And I hope that we don't fall into that category of interesting crypto, as DJB called it, I think. I hope this is becoming boring crypto, and that this becomes a commodity that you can use just like that.

Thank you. And everybody, read the paper.

Yeah. And, oh, as we are on papers, before we forget, for completeness' sake, these are the references in case you want to go off and read these papers yourself. But you will find them anyway.

Is there a question from the Internet?

Yes, I have two questions.
The first one is, can you compare this scheme with Ian Goldberg's private information retrieval? Okay. I'm not a super expert on the private information retrieval topic. But I think that it's just almost practical. And I think the stuff we present is indeed practical. So with the index stuff, usually when you use these other techniques, you have problems with the ciphertext, with the length of the search query or the index size, or you have problems with the complexity. So this private information retrieval might not work as well in practice with big data, I think. There are scaling problems. So for small stuff it might be fine, but on a large scale I think it's very challenging. And also, private information retrieval serves a different purpose. With private information retrieval you will download something from the server or the server network without the server or the server network learning what you're downloading. This is different from performing search operations over encrypted data. Okay. Here, with the schemes we present, the server is learning that. If you hide this, it will come at a cost. It will be costly. I don't think that you can implement it now on each database and it will run smoothly. I think there is a lot of work to do. That's my opinion. Next question on the microphone, please. Hello. Hi. I have a light on me. So, one problem that I potentially see in this... You mentioned that you pad each word to have the same length, so that you can't analyze the word length. That's good. There's one other issue in that there's... I don't know if you've heard of the distributional hypothesis, but basically the collocation of words within large sets of text defines the semantics of the words themselves. So, just by...
if I get access to your encrypted data and look at how encrypted words occur with each other, and knowing that, okay, this might be English, or even without knowing the language potentially, I could kind of reverse engineer what specific words might be just by the fact that they occur together or how often those words occur compared to other ones. So, you can kind of cluster them, and there's a lot of research going on in deep learning on this word vector modeling, basically. And the thing there works even if you don't look at the surface form of the word, like the actual letters. So, I don't know if you've gone into this a lot in your research at all, but it's a potential vulnerability: if you encrypt your data word by word, just the fact that each word is still the same token makes it vulnerable to being, yeah, reverse engineered, sort of thing. Yeah, yeah. Totally right, yeah. This is a problem. Okay, I want to emphasize once again that it's much better than uploading the plaintext. So, yeah. Right now we upload plaintext, and recovering plaintext from plaintext is not challenging at all. So, even if you perform such an attack, it will be costly, and you have to be evil, yeah? So, really, right now you can just look at the data, and with our schemes you have to mount an active attack. And I think it's much better than uploading plaintext. Yeah, there are problems, okay? I told you it's maybe not the final or the best solution, but I think we should now shift to encrypting the data we upload, even if you can partially decrypt it if you perform a large amount of computation. It still makes it harder to analyze the data. I think it's super hard to decrypt all the data partially, okay? But it's better than uploading plaintext, yeah. Okay, thanks. Please come to the front to pick up your sweets for having asked. Next question now in the middle aisle, please. Hello.
Thanks for the talk. I was quite pleasantly surprised to see this talk sponsored by a company that was started by a person from the Chinese National Army or People's Army. So kudos to them for sponsoring this kind of research. The question I have is about the new anti-terror laws in China. How do they affect this kind of research? Basically you need to either build in a backdoor or have the keys sent to China. So can you use this in your working life at Huawei, or is it not possible because you need to have the backdoor built in? Yes, I don't know. I'm not concerned with any of that. Okay, maybe you should look into it. Good. Yes, I have one more question. What about context? Normal search gives you context in the result or can consider the context when doing search. What's your approach to this? Difficult. You mean context in the sense of, in the morning, the time, or maybe the place where you are querying from? No, just the words that are around the thing you're looking for. Some hits might be more relevant to the search performed than others despite having the same keyword. Yeah, once again, this might be a problem. Yeah, right. We have no practical solution yet. But once again, it's better than uploading your plaintext. That's why I recommend to encrypt it, use this technique, and the adversary, the agencies, will have a much harder time than now, when they are just looking at the plaintext. Yeah. You can say, yeah, there is this problem, this problem, so don't use it. But don't use it means you upload stuff in plaintext. And then the agencies, the military are super happy if you upload plaintext. So you make the job more difficult if you perform encryption. Next question at the middle aisle, please. Are you aware of any schemes to also offload the index calculation to the provider and work on encrypted data? Whether we are aware of index schemes that offload? The index calculation as such. I understood your scheme such that the client is doing the index calculation.
Now, is it possible for the index to be calculated by the server? Is that possible too? It's much easier to create an index when you know the plaintext. That's something you thought about searching too in the beginning. I'm not aware of a practical scheme. There may be a scheme, but I'm not aware of any. Thank you. Next question now on this side in the middle aisle, please. Are there any approaches to do a little bit more complicated queries, like get all the people in your database that are older than 30, for example? Okay, yeah. Yes, but no, but yes. Well, I mean, there are schemes, right? There are schemes to, well, for example, order your entries. You know, you can encrypt in a clever way such that you could still order them by ciphertext, and if you decrypted them, you would still have the very same order in plaintext. But it's difficult because some people might not necessarily consider that to be as secure as you would like it to be. If you make order preserving really secure, you have exponential size in ciphertext. It's not practical. You can do better if you sacrifice some security. So yeah, there are some schemes, and we have made some analyses and tests, but it's not as easy as the schemes I showed you. You can do it, but it's much more tricky and hairy. Yeah. You have to consult a lot of crypto people to achieve this. Next question, please, on that side. You have presented two ways of implementing index-based searchable encryption. One with just a line-wise encrypted index and one where you additionally hide the length of the index.
Do you think it's a worthwhile step to put the additional effort into the second solution? Each time you're searching you're basically disclosing the relationship of those rows, since you will be sending a bunch of search requests, and if you just search them one by one, you will also get a lot of latency and a lot of additional network overhead. So is it worth it? It depends on your needs, I guess. In the first scenario, where you don't hide the length, you are, say, vulnerable to an attacker stealing your hard disk, your database, because then they can run analyses on the size of the index values. You don't have that in the second scheme, where you hide the length. So if that's your concern, then, well, you'd better go for the hiding one instead of the cheaper one, say. Again, if you know your requirements, you can engineer your scheme such that it performs very well. Do you know how much information this leaks? It just totally depends on your plaintext, on the structure of your plaintext. And also it's a difficult subject matter to define the leakage. That's the state of current discussions among cryptographers: what does leakage actually mean? Thank you. Pick up your sweets. There is still someone standing there, please. Yeah, so for the scenario you briefly mentioned, like I have 3 TB of offloaded data and I want to search it, I see that it makes a lot of sense to cooperate with the cloud provider and make them search. But I'm sort of wondering what the break-even is, because if I have way less data, let's say, I don't know, just a few gigabytes like many of us probably have, it might also make sense to just compute the index on my side, upload the index somewhere I know, and then download just the index, which is way smaller than my actual data, and then use that.
So have you looked into at which volume of data it starts to make sense to employ a solution like that? Because I think that most people just have a few gigabytes, and for a few gigabytes I think it's faster to just download the index, search, and download the specific part of the encrypted data I want. So when does it start to pay off? No, we didn't look into that, because we want to build a cloud service that's searching for the stuff. So we're interested in implementing a cloud service that can perform search over encrypted data. Therefore we have not looked at that solution; it's a little bit out of scope of our research. We still have time for one last question, please go ahead. I have thought about tokenizing all your text before you encrypt it, because a tokenized list doesn't change so much, so you can keep it on your devices. It would also defeat frequency analysis, and the encrypted data would be a lot smaller as well. Yes, you're right, you can do it this way. So as it seems, there are no questions left. What about the internet? No, the internet is quiet. Thank you very much for your talk. Thank you.
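As a rough illustration of the client-side index idea raised in the last two questions, here is a minimal sketch in Python. It is not the scheme presented in the talk. It assumes the client derives a keyed token for each keyword with HMAC-SHA256 and stores only those tokens next to opaque document identifiers; the names keyword_token, build_index and search, the sample documents, and the key are all hypothetical. The document bodies themselves would be encrypted separately, which is not shown here.

```python
import hashlib
import hmac
import re
from collections import defaultdict


def keyword_token(key: bytes, word: str) -> str:
    """Derive a deterministic keyed token for a keyword (HMAC-SHA256)."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict[str, str]) -> dict[str, list[str]]:
    """Map keyword tokens to the identifiers of the documents containing them.

    Runs on the client, which still sees the plaintext. The result contains
    no plaintext keywords and could be stored with the provider.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index[keyword_token(key, word)].add(doc_id)
    return {token: sorted(doc_ids) for token, doc_ids in index.items()}


def search(index: dict[str, list[str]], token: str) -> list[str]:
    """Look up a token. Whoever holds the index can do this without the key."""
    return index.get(token, [])


# Hypothetical sample data: identifiers stand for encrypted blobs.
key = b"client-side secret, never uploaded"
documents = {
    "blob-01": "Meeting with Alice about the budget",
    "blob-02": "Budget review notes",
    "blob-03": "Holiday photos",
}
index = build_index(key, documents)
hits = search(index, keyword_token(key, "budget"))  # ['blob-01', 'blob-02']
```

Because the tokens are deterministic, whoever stores this index can still count how often a token occurs and which tokens occur together, which is exactly the leakage discussed in the co-occurrence question. Padding the posting lists to a common length, as in the length-hiding variant mentioned above, would be a separate step.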