Thank you. First, let me apologize: I lost my voice yesterday, so that's why I sound like this. Hopefully you can hear me and it's not too horrible. We'll see how far I can make it through the talk; I'll probably skip some parts to keep it a little shorter than planned. This is joint work with Tarik Moataz and Martin Zhu, and the project is called Pixek. I think at this point it's pretty obvious that we have a problem with data breaches. I don't have to go through all the big examples, like Equifax and Yahoo; pretty much every industry has been affected: healthcare, government, political organizations. I found some statistics, and at least one source said that since 2013, 9 billion records have been leaked as a result of data breaches. Interestingly, it also said that only 4% of those records had been encrypted. As a cryptographer, the first thing you wonder is: why were so few records encrypted? We can speculate about the reasons. One possibility is that people are incompetent. Another is that they're lazy. Or it could be that it's just very expensive to encrypt these records. I don't really know the answer; it's probably a mix of many different things, a complicated question. But there was one quote in The New York Times that I found interesting, from Jeff Bonforte, an executive at Yahoo. At the time of the Yahoo leak, The New York Times asked him why Yahoo didn't encrypt its records, and the answer he gave was that it would have hurt Yahoo's ability to index and search messages. So, is this the only reason records aren't encrypted? No.
As I said, there are probably many different reasons, but at least this quote suggests it's one possible reason in some cases. This naturally motivates the question: can we search on encrypted data? Because maybe if we could, then some number of these breaches, I don't know what percentage, could have been prevented. It turns out that we can search on encrypted data. Sorry, not everything is coming up in the slides. We have many different techniques and approaches for this. One approach is called property-preserving encryption, which includes things like deterministic encryption and order-preserving encryption. We could also use functional encryption. There's something else called structured encryption. We could use fully homomorphic encryption, and we could also use oblivious RAM. So we have many different techniques we could use to solve this problem. But what's really important to understand is that all of these techniques provide different trade-offs. There's no magic solution to this problem. One trade-off is between efficiency and functionality. Another is between efficiency and security, in the form of leakage, because all of these techniques leak something. The question is how much they leak, and how much you're willing to leak in order to achieve performance. These are the kinds of things you have to think about when evaluating a solution to the problem of searching on encrypted data. That's very important. This is a field that's been evolving since 2001, so it's about 17 years old at this point, and we've done a lot; it's starting to be a pretty mature field.
Here I only have some of the milestones, for some of the more efficient approaches. For example, with property-preserving encryption, we had deterministic encryption and order-preserving encryption in '06 and '09, and the first proofs for order-preserving encryption in '11. We actually had systems built on these primitives, like the CryptDB system in '12, and Microsoft is using deterministic encryption in SQL Server. In 2015, attacks were published as well. And there was a recent paper by Kevin Lewi and David Wu on how to design property-preserving encryption schemes that can provide security against a snapshot adversary. That's a particularly interesting development, because they argue, and I think it's a reasonable argument, that property-preserving encryption could be useful in that particular adversarial model, at least with their schemes. So what's interesting here is that we have constructions, we have attacks, a form of cryptanalysis, and we also have systems. And the same thing exists for all of the different approaches. We have this for oblivious RAM as well: the seminal work of Goldreich and Ostrovsky on ORAM in '96; the great constructions by Stefanov and Shi, tree-based ORAM and Path ORAM; systems built on these, like ObliviStore and TaoStore; and some recent cryptanalysis in 2016. And the third world is structured encryption, which I put in a red box because the work I'm going to talk about today is based on that approach. We had early work on SSE in 2001, and then structured encryption. Cryptanalysis started in 2012. The first system was CS2, one of the systems we built at Microsoft Research, and there are other systems like OSP, which is a name I'm giving to a system from IBM Research.
Blind Seer from Columbia, et cetera. So I think we've made a lot of progress and have some interesting results, and the exciting bit is that we actually have systems, research prototypes, built. As I mentioned, the work I'll talk about today is based on structured encryption, so I'll give a very high-level view of how it works. The idea is that we have a client with a set of documents and a cloud server that's untrusted. The client takes its documents and encrypts them using a standard encryption scheme, something like AES. Then it produces an encrypted data structure; that's the little blue box in the slide. This is a data structure that supports search. You can think of it as a hash table or a search tree or whatever you want, but it's encrypted, and you can query it using encrypted queries. The client sends the encrypted documents and the encrypted data structure to the cloud. At a later point in time, when the client wants to query its data, it generates a token, that TK in the slide, which encapsulates the query the client wants to make. The client sends this token to the cloud; the cloud combines the token with the encrypted data structure, and the result is a set of pointers into the encrypted documents it needs to return. Then it just returns those encrypted documents. The point here is that the cloud only sees the encrypted documents, the encrypted data structure, and the tokens; it doesn't see anything else. And at a later point in time, if the client wants to update the data structure, it can send an update token, the UTK there. That's, at a high level, what we do.
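To make that high-level flow a bit more concrete, here is a minimal sketch in Python of a single-keyword encrypted index. To be clear, this is an illustration, not Pixek's actual scheme: HMAC-SHA256 stands in for the PRF, and a toy XOR keystream stands in for real authenticated encryption. The server stores only random-looking labels and ciphertexts, and a search token lets it resolve exactly the matching document identifiers.

```python
import hashlib
import hmac
import secrets

def prf(key: bytes, data: bytes) -> bytes:
    """Pseudorandom function, instantiated here with HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()

def xor_stream(key: bytes, nonce: int, msg: bytes) -> bytes:
    """Toy stream cipher (XOR with a PRF keystream); a real system would use AES."""
    stream = b""
    block = 0
    while len(stream) < len(msg):
        stream += prf(key, nonce.to_bytes(8, "big") + block.to_bytes(4, "big"))
        block += 1
    return bytes(a ^ b for a, b in zip(msg, stream))

class Client:
    """Builds the encrypted index and generates search tokens."""

    def __init__(self) -> None:
        self.key = secrets.token_bytes(32)  # never leaves the client

    def setup(self, docs: dict) -> dict:
        """docs maps a document id to its keywords; returns the server's index."""
        index: dict = {}
        for doc_id, keywords in docs.items():
            for w in keywords:
                label = prf(self.key, b"label|" + w.encode())  # random-looking to the server
                vkey = prf(self.key, b"value|" + w.encode())   # per-keyword value key
                entries = index.setdefault(label, [])
                entries.append(xor_stream(vkey, len(entries), doc_id.encode()))
        return index

    def token(self, w: str) -> tuple:
        """Search token: reveals only this keyword's label and value key."""
        return (prf(self.key, b"label|" + w.encode()),
                prf(self.key, b"value|" + w.encode()))

def server_search(index: dict, token: tuple) -> list:
    """The server resolves a token without ever learning the underlying keyword."""
    label, vkey = token
    return [xor_stream(vkey, i, ct).decode()
            for i, ct in enumerate(index.get(label, []))]
```

In use, the client runs setup once and ships the index; later it sends token("bear"), and the server returns the matching identifiers, which point into the separately AES-encrypted documents.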
And of course, depending on the solutions you choose, there's going to be more or less leakage at these different stages. Before I keep going: I motivated this by saying we have a problem with data breaches, and we do. But you can also ask a reasonable question, which is, would encryption really prevent breaches anyway? Data breaches occur in many different ways. Sometimes attackers use phishing to get people's passwords and credentials; sometimes people pick weak passwords and fall to a dictionary attack; or somebody could just steal the data off your server. Encryption is not going to help with everything, but it'll help with some of these breaches. That's as much as I want to claim. So we have this technology, encrypted search, with all these different approaches; we have systems, these research prototypes; and we've started doing some cryptanalysis on them. I think the natural question is: can this cryptography be deployed? And that's not an obvious question. When you talk to people about this, you get a lot of reactions. Some people are really optimistic: of course, it's going to be awesome, it's going to change everything. And some people are very negative: it'll never happen, it's all broken, it's all bad. So you get a little bit of everything. What we decided to do, with Tarik and with Martin, was to see for ourselves, to see if we could actually deploy something. We looked at the space of end-to-end encryption and saw that we're doing really well on messaging: we have a lot of really great end-to-end encrypted apps, like Signal, WhatsApp, and Facebook Messenger. And we're doing sort of OK on video: we recently learned that Skype is going to get end-to-end encryption, and FaceTime already has it.
So this is really exciting: we actually have real-world end-to-end encrypted apps, at least for messaging and video. But one thing we didn't find is an encrypted app for photos. Here's an estimate I found for 2017: an estimated 1.2 trillion digital photos were taken, 85% of them on smartphones, 4% on tablets, and 10% on digital cameras. Something interesting about photos, and especially people's photo collections, is that they're very large. If you're anything like me, you have tons and tons of pictures. They also have high sentimental value: these may be pictures of your family, your kids, your parents. You really care about these pictures, and the thought of losing any one of them is a big deal. I think this suggests that people want to store their photos in the cloud. At least I do; it's something I do. The cloud, of course, has a lot of storage, more than your phone, and it will back up your data, so you have a smaller chance of losing your pictures. But photos are also private. These are personal moments between you and your family, you and your friends, and you don't necessarily want everybody to see them. Sometimes you have really goofy pictures with your friends; maybe you don't want your students to see you with a beer or something, I don't know. So encryption actually makes sense in this setting as well: you also want to protect the privacy of your pictures. There are other reasons too. For example, in 2014, Edward Majerczyk hacked 30 Gmail and iCloud accounts and was able to get 500 private pictures and leak them, including very compromising pictures of many different celebrities. You probably heard about this; it was all over the news.
But even if you're not worried about data breaches, maybe your job requires you to have very sensitive photos. Maybe you're a journalist or a photojournalist, your pictures are sensitive, and you want to avoid censorship. Or maybe you're an activist, and you have pictures that show certain acts or government abuse; those pictures are very important for the world to see, and you want to protect them. You could be a citizen journalist; in some parts of the world, the only journalists actually on the ground are regular citizens with smartphones. The point I'm trying to make is that there are many reasons to want to protect your photos; I think this is a well-motivated problem. So what we did is say, OK, let's see if we can build an app to protect photos, an end-to-end encrypted app. That's basically what Pixek is: an end-to-end encrypted camera app. This is what it looks like, at least right now. The first screenshot is the login screen, with your email and password. The middle one is taking a picture, like a regular photo app. And the third one is your photo collection. Here, everything is encrypted and backed up in the cloud; it's end-to-end encrypted, and the key stays on your device. But we have a few additional features that we think are interesting. As building blocks, we use Clusion, an encrypted search library that we built at Brown. It has all the state-of-the-art searchable encryption schemes, plus some new ones we've designed; those are under submission, not published yet, but we already have implementations for some of them, so we're going to include those. We also use TensorFlow, an open-source machine learning library from Google, and we use Geomobile, which is a geolocation database.
I'm going to explain exactly how these components fit together. This is a statue called Lamp Bear, on the Brown campus, not far from the computer science building. It's massive, about 23 feet tall. Why we have this on campus, I'm not exactly sure, but we do. So suppose one day you're walking on the Brown campus and you take a picture of Lamp Bear. The way the app works is this. You take a picture, and the first thing the app does is downsample it, creating a thumbnail, essentially, something smaller. Then it encrypts that thumbnail. Then it takes the full picture and encrypts it. So far, this is pretty straightforward. The third thing it does is use TensorFlow, Google's machine learning library, to figure out what's in the picture. In this case, it would figure out that there's a bear and a lamp; maybe it'll figure out that there's something blue as well. At this point, the app also adds a tag with the picture's geolocation; it knows where you are on campus or in Providence, so it adds Providence, Rhode Island. And finally, you have the option of adding your own tags: you can modify previous tags or add your own. In this case, maybe you add the tag Brown University. Once all the tags are generated for the picture, each tag is encrypted. There's actually a specific reason we do this; maybe I'll get to it later. The interesting bit is that these tags are then used to produce update tokens, the tokens I was referring to before. I should have mentioned that on the right-hand side we have the cloud: we have servers running in the cloud, and we don't make any assumptions about the cloud. We're using Amazon's cloud, with EC2 servers and S3 for storage.
And we have that encrypted data structure for every user stored there. So once the tags are created, we generate update tokens for the encrypted structure, and then everything gets sent to the cloud: the encrypted thumbnail, the encrypted picture, and the encrypted keywords all go to S3, and the tokens go to EC2, where we update the encrypted data structure with them. The point is that everything is encrypted here, so our servers don't actually see anything. I'll talk a little bit about leakage later. So far so good? Does this make sense? OK. So now all your pictures are stored in the cloud. We cache some of them on your device, but let's say the vast majority are in the cloud, and they're encrypted. But now maybe you want to search your photo collection. Maybe you want to retrieve all the pictures from summer 2013, or all pictures from Providence, Rhode Island. So you say, OK, I want to retrieve all the pictures with the bear in them. What we do is generate a search token for "bear" and send it to our servers. The cloud then takes this token, combines it with the encrypted data structure, finds the encrypted pictures it needs to return, and sends back the thumbnails, which are encrypted. Those get decrypted on your device, so you see the thumbnails very quickly. In the background, we also send the full pictures: encrypted in transit, decrypted on the client. Sorry, there are some items missing from the slide here, but hopefully you can see the difference between the cloud and the client even though it's not showing on screen. So that's roughly what's going on. There are a lot of things I'm not talking about: caching, crash recovery, password recovery, how to use multiple devices. We also have a local mode in the app.
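The capture-and-upload pipeline just described (downsample, encrypt, tag, generate update tokens) can be sketched roughly as follows. Everything here is a hypothetical stand-in, not Pixek's actual code: a fixed tag list instead of the TensorFlow model, a constant location instead of the geolocation lookup, a toy cipher instead of AES, and plain deterministic PRF tokens instead of the real forward-private update tokens.

```python
import hashlib
import hmac

def classify(photo: bytes) -> list:
    """Stand-in for the on-device TensorFlow model."""
    return ["bear", "lamp"]

def geolocate() -> str:
    """Stand-in for the geolocation database lookup."""
    return "providence"

def encrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder for real symmetric encryption (e.g. AES)."""
    keystream = hashlib.sha256(key + len(data).to_bytes(4, "big")).digest()
    while len(keystream) < len(data):
        keystream += hashlib.sha256(keystream).digest()
    return bytes(a ^ b for a, b in zip(data, keystream))

def process_photo(master_key: bytes, photo: bytes, user_tags=()):
    """One photo's trip from the camera to the cloud, in miniature."""
    thumbnail = photo[:64]  # stand-in for actual downsampling
    tags = classify(photo) + [geolocate()] + list(user_tags)
    return {
        # Encrypted blobs destined for S3.
        "enc_photo": encrypt(master_key, photo),
        "enc_thumb": encrypt(master_key, thumbnail),
        # Update tokens destined for the EC2 servers: one per tag, derived
        # with a PRF so the servers never see the tags themselves.
        "update_tokens": [hmac.new(master_key, t.encode(), hashlib.sha256).digest()
                          for t in tags],
    }
```

For the Lamp Bear example, a call with the user tag "brown university" would produce four tokens: two from the classifier, one from geolocation, and one user-supplied.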
So if you don't trust the cloud for some reason, even though we're using encryption, if for whatever reason you don't want to use the cloud, you can just store everything on your device. That's perfectly fine; we have this local mode. Now, one interesting thing to notice about the current version is that this is a streaming process: you take pictures one at a time, and we send them to the cloud. So the encrypted data structure we're using has to have some stronger properties than normal, and in particular a property called forward privacy. Those of you familiar with this field will recognize it. I'm not going to get into the details, but I'll mention that this property was introduced by Stefanov, Papamanthou, and Shi in 2014. For a few years, which is a long time in this world, it was a very expensive property to achieve, or at least most of us thought it was. Then in 2016, Raphael Bost showed at CCS that you could actually do it efficiently, which was a really nice result. So we use a forward-private scheme, but not the published state-of-the-art. We modified a construction by Cash, Jaeger, Jarecki, Jutla, Krawczyk, Rosu, and Steiner to be forward-private. This allows us to avoid public-key operations and constrained PRFs, though the asymptotics of our version are worse than the published state-of-the-art. We actually have a trick, not yet implemented, that makes it asymptotically optimal as well. So that's what we're using. I was going to go into the details of the Cash et al. construction, but I don't have that much time, because my voice is getting worse.
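To give a flavor of what forward privacy buys, here is a small self-contained sketch in the spirit of chain-based forward-private schemes. This is not Pixek's actual modified Cash et al. construction, and the hash functions stand in for proper PRFs. The idea: each new entry is filed under a fresh random state and chained backwards to the previous one, so a search token issued at some point in time cannot locate entries added afterwards.

```python
import hashlib
import secrets

ZERO = b"\x00" * 16  # sentinel marking the end of a keyword's chain

def h_loc(state: bytes) -> bytes:
    """Derives the random-looking storage label for a state."""
    return hashlib.sha256(b"loc|" + state).digest()

def h_key(state: bytes) -> bytes:
    """Derives the one-time pad for the entry stored under a state."""
    return hashlib.sha256(b"key|" + state).digest()

def xor32(pad: bytes, msg: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(msg, pad))

class ForwardPrivateIndex:
    def __init__(self) -> None:
        self.state = {}  # keyword -> latest state; lives only on the client
        self.store = {}  # label -> ciphertext; this is what the server holds

    def add(self, keyword: str, doc_id: str) -> None:
        new_state = secrets.token_bytes(16)  # fresh, independent of the past
        prev = self.state.get(keyword, ZERO)
        entry = doc_id.encode().ljust(16, b"\x00") + prev
        self.store[h_loc(new_state)] = xor32(h_key(new_state), entry)
        self.state[keyword] = new_state

    def search_token(self, keyword: str) -> bytes:
        return self.state.get(keyword, ZERO)

def server_lookup(store: dict, token: bytes) -> list:
    """The server walks the chain backwards from the revealed state."""
    results, s = [], token
    while s != ZERO:
        entry = xor32(h_key(s), store[h_loc(s)])
        results.append(entry[:16].rstrip(b"\x00").decode())
        s = entry[16:]
    return results
```

Because each update lands under a brand-new random state, an old token can only walk the chain that existed when it was issued; nothing links it to later additions, which is exactly the forward-privacy guarantee.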
If you're interested in the details, you can read the Cash et al. paper, and we'll be happy to describe how we modified it; it's really a small change. I'll also mention that we're in the process of writing up the details of our architecture, so at some point we'll have a write-up if you're interested in all the gory details of how the app works. OK, this I don't want to skip: the leakage. Our encrypted search solution leaks two things: the search pattern and the access pattern. Intuitively, the search pattern means that our servers are going to see when your query is repeated. If you search for "bear" three times, we're going to know that you searched for something three times; we won't know what, but we'll know you searched for the same thing three times. The access pattern means, essentially, that when you do a search, say for "bear", we know that the encrypted pictures we send back have something in common: they're all related to your search query. We don't know what that query is, but there is a bit of information there. Of course the question is, what is the consequence of this leakage? It's not an easy question to answer, but one way to give some intuition is to say that in order for us to actually see your pictures, we would have to break AES. We like to think we're pretty good cryptographers, but we're not that good.
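The search-pattern leakage is easy to demonstrate in a few lines. In this illustrative sketch (HMAC-SHA256 as the PRF, under a key the server never sees), tokens are deterministic: the server cannot invert them, but it can see which ones repeat.

```python
import hashlib
import hmac
import secrets

client_key = secrets.token_bytes(32)  # known only to the client

def search_token(keyword: str) -> str:
    """Deterministic token: same keyword, same token; that repetition is the leakage."""
    return hmac.new(client_key, keyword.encode(), hashlib.sha256).hexdigest()

# What the server observes for the query sequence bear, lamp, bear, bear:
observed = [search_token(q) for q in ["bear", "lamp", "bear", "bear"]]

# It cannot recover the keywords, but it learns that queries 1, 3, and 4 match:
assert observed[0] == observed[2] == observed[3]
assert observed[0] != observed[1]
```

The access pattern is analogous: the server sees which (encrypted) results each token maps to, so it learns that those results share an unknown common keyword.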
The other thing is that if we wanted to learn information about your queries, at least with the state of the art, which is the counting attack of Cash et al., we would have to know about 90% of your tags. So either we'd have to guess them, or we would have to somehow find out 90% of the tags your Pixek client has generated, and we would also have to know the occurrence of each of those tags, that is, how many times each tag has been associated with a picture. How likely is it that we can do this? From our point of view, it's pretty unlikely, but we want to make sure everybody is aware of it. If you think we can guess most of your tags and their occurrences, and you're worried about that, then you shouldn't use the app. So, the last point I wanted to make is that the app is really just in a testing phase, so we need your help. It's available on Google Play, Android only. Right now not everybody can get it; you have to be registered to be able to download it. If you're interested, send us a message on Twitter, @PixekApp, and we'll add your name so you can download the app and play around with it. We'd be really grateful for any feedback: feedback on usability, feedback on design, anything you can think of will be greatly appreciated. This is the website, pixek.io, and again the Twitter handle if you want to get in touch. Thank you. Yeah, questions. Thanks, Seny. One of the examples you gave is searching for photos taken in summer 2013. How flexible is your query language?
Can you literally say "summer 2013" and have it search a date range, or do you have to have tagged every picture with 2013, summer 2013, August 2013, and August 29, 2013? Yeah, that's a great question. Right now, it's not flexible at all: it's single-keyword search. You just say "bear" or "lamp", and that's it. Now, there are schemes in the literature, some that we've designed, some that others have designed, that are more flexible, where you have boolean queries: ands, ors, et cetera. And we're now at a point where those are really efficient, so we could actually implement them; we haven't done so yet. The caveat is that those schemes, even though they're more flexible and very efficient, leak more. So there's always a trade-off. We haven't made a decision yet as to where we're going to go, and that's also feedback you can give us. We think the leakage is reasonable, but the leakage profile of the boolean schemes hasn't been analyzed as much as the leakage profiles of the single-keyword search schemes. And dates in particular are more like range queries? Yes, exactly. That's another great question, and another great area of research. Range queries are an area that's still evolving. Clearly it's a very useful functionality. There have been some constructions published for range queries, and there have also been very recent attacks; Kenny has a paper at Oakland that breaks many of them. So the jury is still out on range queries. We actually have some work in progress that is not susceptible to Kenny's attacks, but this is an evolving problem. Great. Hey, Seny, thanks a lot for this work.
I use Google Photos, and God help me if my library ever gets hacked, so I really appreciate it. I had one question about the password mechanism. It seems that you derive a secret key from a password and then encrypt everything with that secret key. Is that, at a high level, how things work? No, not quite. Basically, we generate a random key, and that's the key we use to encrypt and the key we use for the encrypted data structures; we then derive multiple keys from it. And then we encrypt that key using a password-derived key on the device. Our assumption is that your device is going to be secure. But the keys we use with the server, anything related to the server, those are real keys; they're not password-derived. So somebody who's trying to phish you, for example: there's no password they can derive or get from you; they would have to actually compromise your device. Sure. So my question was, if I lose my password or break my device, do I basically lose access to my photos? Yeah, that's a great question. We have a recovery mechanism. Essentially, that key, the random key, not the password-derived one, is encrypted using the answers to your security questions, and that is backed up in the cloud. So if you have good security questions, with reasonable entropy, you'll be OK. If you don't, an adversary would have to recover that encrypted key and guess your recovery answers, or break into your device. Sure. Great, thanks. Hey, Seny. So, one quick comment and clarification about your New York Times quote: that kind of sounded like it was a deliberate decision not to encrypt stuff. And in fact, the credentials were encrypted.
The reason was unprotected servers: even if stuff was encrypted, the keys were compromised. So you could still search on mail or whatever and push ads while having encrypted credentials. But that's the context. All right. I'm going to write a letter to The New York Times and tell them, thanks. Hi. I think you didn't mention where the TensorFlow model runs, the model that actually predicts the tags from the picture. It seems to me that if it runs on the client, that would be quite inefficient, and if it runs on the cloud, then you would essentially leak the picture, right? Yeah, that's a great question, thanks. The machine learning algorithm is running on your phone. It's not being sent to Google or to IBM Watson or something; it's completely local to your phone. We use TensorFlow Mobile, the mobile version of TensorFlow that's optimized for phones. And that's efficient, from your experiments? Yeah. Of course, I'm not a machine learning expert, but my guess is there are some trade-offs between efficiency and accuracy. If I have an accident and go into a coma, and I wake up in 20 years and Pixek doesn't exist anymore, did I lose my photos? Yeah, if you're in a coma, or if all of us get hit by a bus, you may be in trouble, yeah. Sorry, I didn't hear that at first. There's some probability that we'll all die tomorrow because of something Donald Trump did, so, I don't know. But, yeah. OK, let's thank the speaker and all the speakers for the session.