Hello, everyone. I'm going to talk about the other side with respect to the previous talk, namely the constructive side: how can we build better and better systems? How can we provide more and more security guarantees while keeping the system practical? That's very challenging, but we've been able to make one step at a time, and I'm going to tell you about two steps in this direction, Mylar and Verena. This is joint work with my wonderful collaborators, some of whom are here as well.

The problem we're looking at is that a web server can be compromised and confidential data can leak. For example, the web server is hosted in a cloud, a hacker breaks in and steals a snapshot of the data. Mylar and Verena aim to improve confidentiality and integrity in the face of compromised web servers.

To understand these systems, I like to think about the following security spectrum. On the left side you have today's systems, which process data in decrypted form, so an attacker breaking into the server sees all the data: everything leaks. A first big step towards more security is to protect sensitive fields with end-to-end encryption. For example, the developer can say that the medical history in a medical app is sensitive, and encrypting it with end-to-end encryption makes a big step forward.
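As a sketch of what field-level end-to-end encryption looks like, here is a minimal, hypothetical Python example. The field names and the key handling are illustrative, and the toy XOR stream cipher is only a stand-in for real authenticated encryption (a deployment like Mylar would encrypt in the browser with a proper cipher):

```python
import hashlib

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher from a SHA-256 keystream. Stand-in for real
    authenticated encryption; do NOT use this in practice."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

SENSITIVE_FIELDS = {"medical_history"}   # annotated by the developer

def encrypt_document(doc: dict, key: bytes, nonce: bytes) -> dict:
    """Encrypt only the fields the developer marked sensitive;
    everything else is stored in the clear on the server."""
    return {
        f: keystream_xor(key, nonce + f.encode(), v.encode())
        if f in SENSITIVE_FIELDS else v
        for f, v in doc.items()
    }

doc = {"patient": "Bob", "medical_history": "asthma since 2001"}
enc = encrypt_document(doc, key=b"browser-side key", nonce=b"n1")
assert enc["patient"] == "Bob"                   # non-sensitive field untouched
assert enc["medical_history"] != b"asthma since 2001"
# XOR is its own inverse, so decryption is the same operation
assert keystream_xor(b"browser-side key", b"n1" + b"medical_history",
                     enc["medical_history"]) == b"asthma since 2001"
```

Only the annotated field is ciphertext at the server; the rest of the document stays usable for normal server-side logic.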
This is particularly relevant against passive attackers that just get a snapshot of the data and see only that the sensitive fields are encrypted. But then we have active attackers that can compromise the server in arbitrary ways. An important step towards security against these attackers is to make sure the attacker cannot compromise the client-side code, because the client-side code has access to the decrypted data; in the web setting this is not trivial, because the client-side code comes from the server, which is compromised. Another important aspect is to make sure an active attacker does not tamper with key distribution, because keys can decrypt. A further property on the way to more security is to not allow any tampering by the attacker: no tampering with data, no tampering with computation or the results of computation. Then yet another step is to hide metadata, so not just the sensitive fields but information about them, such as who has access to what. Then we have access patterns: how do we hide access patterns during computation, and during search as well? This is very difficult to do practically today, because ORAM-based techniques are quite heavy and quite slow. And finally, there is a miscellany of other things that also leak and that we would like to hide: what operations you are running on the data, the runtime, when you are running a query, the data size, and the structure. These are very hard to hide as well, because hiding them often involves worst-case padding, which can bring a lot of performance overhead. But if we can hide all of these, then we can give very strong security. I'm not saying perfect, because there are always usability concerns and assumptions we make, but very, very strong security.

Unfortunately, today we don't know how to build practical systems with this target security. We have a bunch of relevant tools in theory, such as ORAM, garbled RAM, and FHE, but they are too slow for practical systems. Building a practical system is much more challenging, and we've been able to make one step at a time; each step contributes a conceptual property on top of what existed, even if it doesn't give you perfect security.

I've done a lot of work along this spectrum, and today let me tell you about two systems, Mylar and Verena. Mylar provides end-to-end encryption for annotated fields, and against an attacker that compromises everything it provides two guarantees: first, that the attacker cannot tamper with the client-side code, and second, that the attacker cannot lie during key lookup. Then comes Verena, which prevents the attacker from tampering with the data and the query results: the attacker cannot feed you garbage; even if the web server is compromised, you get back the correct results. And actually, fresh off the press, we have a new system, Opaque, whose camera-ready version we are preparing, that hides all this metadata and hides access patterns, including during computation. Opaque is based on oblivious algorithms combined with hardware enclaves, and if you're interested in that, I can come next year and tell you about it. But Opaque focuses on data analytics, not the web setting, and there are a bunch of other challenges to address in the web setting.
So there is a bunch of to-do left here, although we expect that some of the techniques from Opaque will apply to the web setting, because Opaque supports rich functionality. And for the last category it's a big question mark; it's really not clear how to hide that information in an efficient way.

OK, so let me tell you now about Mylar and Verena. Mylar is a web framework that protects sensitive data fields with end-to-end encryption. The developer specifies which fields are sensitive, for example the private message in a chat, and the browser encrypts these fields and gives encrypted data to the server. So if Alice enters data into a field that the developer marked sensitive, the browser encrypts it and sends it encrypted to the web server. The web server only ever sees this field in encrypted form; it never gets the encryption key from the browsers. When Alice accesses the data, the browser decrypts it for her, so from Alice's perspective everything is seamless.

Prior work had already proposed encryption in the web setting, but it was far from sufficient for real web apps, for a few reasons. First, we show that common web frameworks such as Django and Ruby on Rails are not compatible with encryption, and we propose a different architecture. Second, in these systems data sharing was burdensome: users often had to do manual key distribution. Third, there are active attacks on client-side code and on key distribution that can lead to significant data leakage. And finally, there was no support for efficient server-side search. Mylar addresses these challenges, and in this talk I can only tell you, at a high level, about two of them: how do you prevent active attacks on key distribution, and how do you add support for server-side search?
All right, so the API in Mylar is that the developer says which fields are sensitive, and Mylar encrypts them; Mylar does not protect anything else. The developer defines the access control, that is, who has access to those encrypted fields, using a principal graph. The principal graph looks like this: say that in a chat application the developer specifies that Alice, Bob, Eve, the chat "work", and the chat "party" are principals, and the edges denote access. So Alice and Bob have access to the chat "work", and no one else does. Each of these principals has a key (actually multiple keys, but think of one key), and these keys are stored at the server encrypted. They are encrypted with key chains, which I won't describe here, but the property they give you is that only the users with access in the principal graph can decrypt the key. So only Alice and Bob can get the green key, and the information inside each chat is encrypted with the key for that chat.

However, there can be an active attack on key distribution when a user looks up a key. Consider that Bob wants to write some text in the chat "work". He needs to fetch the green key from the server, so he tells the server: give me the green key, the key for the chat "work". But the server is malicious and might give him the purple key, and as a result Bob might encrypt the data in a way that the attacker can see, which is not what he intended.

To protect against this attack, Mylar certifies keys. For example, the green key is certified by Alice: she produces a digital signature saying "this green key corresponds to the chat 'work'", signed by Alice, and Mylar supports arbitrarily long certificate paths, similar to X.509. The problem, though, is this: when Bob fetches the key, he can check the digital signatures on the key, but how does Bob's Mylar client know that he is supposed to check the key against Alice's signature, as opposed to someone else's? For example, Eve can also create a chat named "work"; only Bob knows that he wants the "work" chat with Alice. So Bob has to tell this to Mylar, and for this reason principals in Mylar have human-meaningful names. The developer is supposed to display the certification path for every relevant principal that the user might give data to. For example, in this case there are three available chat rooms: "work" created by Alice (that is the certification path for it), "work" created by Eve (anyone can create a chat named "work", especially if they are malicious), and "party" by Eve. The user chooses the "work" by Alice: that's where he wants to input his data; he wants to talk to Alice. Then Mylar can check that the key he gets from the server is certified by Alice, and he won't be tricked into accepting the purple key.

So, to prevent this active attack, we made an explicit design decision: Mylar does not hide the names of the principals, and does not hide which principals have access to what. We want any user who is about to give data to a principal to know who that principal is, who gets to see the data, and who certifies it.
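To make the key-certification check concrete, here is a small sketch. For simplicity it stands in for public-key signatures with HMAC (so "signing" and "verifying" share a secret, unlike Mylar's real certificates), and all names and keys are hypothetical:

```python
import hashlib
import hmac

def sign(signing_key: bytes, message: bytes) -> bytes:
    """HMAC as a stand-in for a real digital signature; Mylar uses
    public-key signatures so anyone can verify with a public key."""
    return hmac.new(signing_key, message, hashlib.sha256).digest()

def verify_principal_key(creator_key: bytes, name: bytes,
                         key: bytes, certificate: bytes) -> bool:
    """Bob's client checks that `key` for principal `name` was certified
    by the creator the user selected ("work created by Alice")."""
    expected = sign(creator_key, name + b"|" + key)
    return hmac.compare_digest(expected, certificate)

# Alice creates the chat "work" and certifies its (green) key.
alice_key = b"alice signing key"
green_key = b"green chat key"
cert = sign(alice_key, b"work" + b"|" + green_key)

# Honest server: Bob accepts the green key.
assert verify_principal_key(alice_key, b"work", green_key, cert)

# Malicious server substitutes the purple key: the certificate check
# fails, so Bob never encrypts under the attacker's key.
purple_key = b"purple attacker key"
assert not verify_principal_key(alice_key, b"work", purple_key, cert)
```

The essential point is that the check is anchored in the certification path the user chose, not in anything the server claims.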
The second part of Mylar I want to tell you about is search. Mylar provides searchable encryption as a separate package from the main Mylar system. It works as follows: if the developer marks a field as searchable, Mylar enables searchable encryption for that field, and we have an access-control function, allowSearch, which is very important here: it controls who can see search queries.

Let me tell you how Mylar does search. First, consider using standard searchable encryption: you tokenize every chat into keywords, and when Bob wants to search for, say, "SSN", he has to give a token for each one of those chats, under each one of those keys. Then, using regular searchable encryption, the server sees for each chat whether there is a match or not. The problem, as Paul already mentioned, is that the client has to generate tokens for each one of these chats, which can be burdensome. It would be much nicer to generate only one token, and that is basically what Mylar contributes here. Mylar gives the server a delta, which allows the server to convert an encryption of "SSN" under one key into an encryption under a different key; from there on it is just regular searchable encryption. And because Mylar builds on searchable encryption, it has the same properties and the same leakage as searchable encryption: the server learns which documents matched and which did not, so it sees the matching pattern.

To mitigate the amount of information seen in a multi-user setting, we designed the allowSearch function specifically for the following attack, which we described in the Mylar paper. The attacker compromises the server. The attacker creates a principal called "attacker" with the red key, so he has the key for that principal, and gives Bob access to it. If Bob's client generated a delta automatically, then when Bob searches for "SSN" the attacker could convert that token into an encryption under the red key (that is what the delta enables) and then mount a dictionary attack: because the attacker has the red key, he can figure out that "SSN" is what's encrypted, and then he sees, just from the searchable-encryption functionality, that it matches in other documents as well.

Because we were aware of this attack, we designed allowSearch specifically for it. The point is that a user should not automatically give the delta to everyone who gives them access; the delta should only be given to trustworthy principals. So the developer must call allowSearch only on trustworthy principals. Sometimes the developer has to ask the user: do you trust this principal, this person, to know your search queries? Are you fine if they know your search queries? Other times the developer can determine it from the application logic; for example, the boss can have allowSearch. But the point is that the attacker should not get the delta: allowSearch makes sure that only trustworthy principals get access to the queries. And the attacker cannot fake being the boss or the doctor, because Mylar certifies principals, as we just described. So, because of allowSearch, the attacker does not get a delta and does not learn what was searched.

Going back to our security spectrum, here is where Mylar lies. It is natural at this point to discuss the previous paper, the one Paul presented. The very first point, which was not so clear in the presentation, is that Mylar does not claim to provide full confidentiality or perfect security.
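The delta mechanism can be illustrated with a toy version of this adjustable search. Mylar's actual construction uses bilinear pairings on elliptic curves; the single-group exponentiation below, with an insecure toy modulus and made-up keys, only shows the algebra of re-keying a token without revealing the keyword:

```python
import hashlib

P = 2**127 - 1   # toy prime modulus; NOT a secure parameter choice

def h(word: str) -> int:
    """Hash a keyword into the group."""
    return int.from_bytes(hashlib.sha256(word.lower().encode()).digest(),
                          "big") % P

def token(key: int, word: str) -> int:
    """Search token H(word)^key, deterministic per (key, word)."""
    return pow(h(word), key, P)

def adjust(tok: int, delta: int) -> int:
    """Server-side re-keying: (H(w)^k1)^delta = H(w)^k2, without the
    server ever learning the keyword w."""
    return pow(tok, delta, P)

# In the real scheme Bob's client computes delta = k_green * k_bob^(-1)
# modulo the group order; to keep the toy arithmetic simple we pick the
# delta first and derive the chat key from it.
bob_key = 1234567891
delta = 98765
green_key = (bob_key * delta) % (P - 1)

# The chat's index stores tokens under the chat's (green) key.
index = {token(green_key, "ssn"), token(green_key, "meeting")}

# Bob submits ONE token under his own key; the delta, which his client
# releases only to principals granted allowSearch, lets the server match it.
query = token(bob_key, "ssn")
assert adjust(query, delta) in index
assert adjust(token(bob_key, "budget"), delta) not in index
```

Note that the leakage discussed in the talk is visible here too: the server learns which stored tokens matched, just not the keyword itself, and an attacker principal without allowSearch never receives a delta at all.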
The paper says clearly, in a number of places, which attacks are out of scope. Now, looking at the three leakage scenarios that Grubbs et al. evaluate, we find that the first two are out of scope, and the third, we believe, does not work if you use Mylar correctly. I do want to say that even attacks on things that are out of scope are still useful: in my security spectrum we want to get all the way to the right, we want all these properties, so by all means, even out-of-scope attacks are useful. But we have to be aware that they are out of scope and that they don't actually affect what the system gives you.

Concretely, in a medical application, say Mylar encrypts the medical history of each patient, so it protects the contents of the medical history. Mylar does not hide the fact that doctor Alice can see Bob's medical history; that is the explicit design decision we made: it is known who has access to what, for certifying principals, as we discussed. Second, Mylar does not hide the fact that doctor Alice has accessed Bob's medical history; protecting that is really hard to do efficiently (we only have ORAM-based techniques, which are very slow), so most practical systems don't protect against such leakage. What Mylar gives you is that the content of the medical history is encrypted. Regarding the attack on search: this is the same attack we described in our Mylar paper, the one I just showed you; it has the same seven technical steps, and we designed allowSearch specifically to stop it. So it is the same attack we were aware of in 2014, and we designed allowSearch to prevent it. If you want more details, step by step, please take a look at the Mylar website.

So, going back to our security spectrum, this is where the attacks of Grubbs et al. are. They don't affect the two steps of security that Mylar gives you. They are still very useful, because we want to get all the way to the right, so we need to understand what we have left to do.

Speaking of what's left to do, let me make another step forward in this spectrum and tell you about Verena. Verena ensures that an active server attacker cannot tamper with data, computation, code, or keys, and it provides two benefits. First, it gives you confidentiality against a wider class of active attacks than Mylar did; Verena was designed with Mylar in mind and has the same architecture. For example, if the attacker at the server removes one of the users from a ban list, it might happen that later some user gives access to that banned user, not knowing he was banned, and data leaks. So it is important to prevent tampering for confidentiality as well. And then, of course, there is end-to-end integrity: you want your users to get back correct results.

The threat model in Verena is that the web servers can be fully compromised, really fully compromised. Here is what Verena gives you. Say you have a pacemaker application, where the user's pacemaker sends heart-rate samples to the web server, and a doctor analyzes these to give a diagnosis; for example, the doctor wants to know the average heart rate for Alice in a timestamp interval. The doctor receives back a result from the web server along with a proof: in Verena, the web server proves that the result is correct. What do I mean by correct? If the doctor checks the proof and it verifies, then the doctor is guaranteed that the data was changed only by Alice, so the heart rates come from her (write access control); that the data was complete,
no sample was removed; that it was fresh, the latest data; and that the average was computed correctly.

There has been a bunch of other work that tries to provide these properties, but it is not applicable to the web setting: first because it didn't deal with multiple users, where you have access control and a lack of coordination, and second because it isn't a fit for the web setting, which is stateless.

Let me give you a very high-level flavor of how Verena achieves these properties in the web setting. First, if we consider that the web server can be fully compromised and that users do not coordinate with each other, there is an impossibility result saying that you can't actually guarantee freshness: the attacker can fork the users, so that after a while the doctor no longer sees Alice's updates. To bypass this impossibility result, one makes some trust assumption. For this we introduce a hash server, and the trust assumption is that it does not collude with the main server. Either server can be malicious, including the hash server, but at most one of the two. Our hash server is very simple: it is mostly a key-value store for hashes, and it has a small TCB. At a high level, we use it to store the latest hashes of the data, so that when users get data back they can check it against those hashes and make sure it is fresh.

In Verena, the developer has to specify the integrity policy, and does so using our integrity API, which, interestingly, is attached to queries, not data. In fact, in Verena we don't really care what happens to the data at the server; it can be changed, it can be deleted, as long as the server answers all read queries correctly. Every query in Verena runs in a trust context, which denotes the members, the users, that are the only ones allowed to affect the result of the query. For example, if you select the average heart rate for patient Alice, then the trust context is the patient Alice: only she, via her device, can change the result of this query. If you select the list of patients, then only the admins can affect the result.

So if this is our integrity API, at a very high level, how do we enforce it? A very important building block for us is authenticated data structures. These are basically search trees, sorted by the range field, in this case the timestamp. The leaves hold the values we want to aggregate, like the heart rate, and each internal node has a partial aggregate: the sum of the values in its subtree (or some other function, but let's go with the sum for now). For example, in this subtree, 150 is the sum of 80 and 70. Then there is a Merkle tree built on top of this. If the client has the root of the Merkle tree, the client can efficiently verify a proof from the server in which the server proves what the aggregate is over the interval of interest, using the properties of the Merkle hash tree. This is all prior work, and we are going to use it. Verena compiles the integrity policy into a forest of ADS trees that are linked with each other, and the root of each ADS tree is stored at the hash server to ensure freshness.

Now let's consider a complicated query and see how Verena handles it. We want the average heart rate.
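Before continuing with this query, the partial-aggregate Merkle tree just described can be sketched in a few lines. This shows only the hash-plus-aggregate invariant over (timestamp, heart rate) pairs; range proofs and the hash server are omitted, and all names are illustrative:

```python
import hashlib

def H(*parts) -> bytes:
    """Hash helper for Merkle nodes."""
    return hashlib.sha256(b"|".join(str(p).encode() for p in parts)).digest()

class Node:
    """ADS node: a Merkle hash plus a partial SUM aggregate."""
    def __init__(self, left=None, right=None, ts=None, value=None):
        self.left, self.right = left, right
        if ts is not None:                       # leaf: (timestamp, heart rate)
            self.agg = value
            self.hash = H("leaf", ts, value)
        else:                                    # internal: sum over the subtree
            self.agg = left.agg + right.agg
            self.hash = H("node", self.agg, left.hash, right.hash)

def build(samples):
    """Build the tree bottom-up over (timestamp, value) pairs."""
    nodes = [Node(ts=t, value=v) for t, v in sorted(samples)]
    while len(nodes) > 1:
        nxt = [Node(nodes[i], nodes[i + 1])
               for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:                       # odd node carries up unchanged
            nxt.append(nodes[-1])
        nodes = nxt
    return nodes[0]

samples = [(1, 80), (2, 70), (3, 90), (4, 60)]
root = build(samples)
assert root.left.agg == 150                      # the 80 + 70 subtree from the talk
assert root.agg == 300                           # aggregate over all samples

# The verifier (in Verena, anchored at the hash server) keeps only
# root.hash; any tampered, removed, or stale sample changes every hash
# on its root path, so the change is caught.
tampered = build([(1, 80), (2, 70), (3, 90), (4, 55)])
assert tampered.hash != root.hash
```

In the real system the client checks a logarithmic-size proof path against the stored root hash rather than rebuilding the whole tree.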
We join two tables, the patient list and the patients: from the patient list we want the patients with age greater than 70, and then for each of them we want their average heart rate in a time interval. What Verena does is have the server go through all the trust contexts to assemble a proof of correctness. First it uses the trust context "admins" to assemble a proof that the list of patients it is returning is complete and contains all the patients with age over 70. Then it goes into the trust context of each patient to compute the average heart rate over the time interval and prove that it is correct. So, using this forest of trees, at a high level, Verena can give you the properties we talked about. There is much more I haven't covered, so please take a look at the papers if you're interested. One point I want to make is that both Mylar and Verena have modest performance overheads; they are practical systems.

In conclusion, Mylar and Verena make significant steps towards providing security in the face of compromised servers. We open-sourced Mylar and we plan to open-source Opaque as well; hopefully it will help the community. Building real-world crypto-based systems is very challenging, but we can make one step at a time, and each step removes a whole class of attacks, even if it does not give you perfect security. I hope I showed you that there is a bunch of really cool work left to do, and hopefully this session gets you interested in working on it. Thank you.

[Session chair] We have time for questions.

[Audience] Hi there. On one of your slides, in the middle of the Mylar portion, you suggested that users could select which principals they wish to trust. After decades in this field, I have been unable to find a deployed system in which this works. Users are not capable of making security decisions, so far as I can tell, and if you leave security decisions to users, systems fail. When I was young and naive, I actually believed in constructing systems on the principle that you could let users make decisions like this, and I have long since been disabused by bitter experience.

[Speaker] Yeah, so I agree with you that a lot of users don't make the right decisions.

[Audience] Almost none. You can basically guarantee that perhaps a fraction of a percent of users will make the correct decision.

[Speaker] And I also agree that it's not necessarily easy. But it seems somewhat inherent, because we need to know what the user wants: who do they want to give access to? Moreover, the thinking is that they would use this kind of system for critical applications, where hopefully they are more educated and more incentivized. I agree with you that it's hard to make these decisions correctly, but I don't know of any better technique, because the user has to convey to us which principal they mean.
At the same time, I think it's really important to do more work on this, including usability studies. Maybe users don't select the path; maybe you make the sequence of pages so natural that they go to the right place. That would be usability work. Great point.

[Session chair] Let me just intervene for a moment: we're already eight minutes late for our excellent sandwiches, so if you could keep the questions succinct, and also try to keep the answers short, that would be much appreciated. Thank you.

[Audience] I have a short philosophical question. You're saying this is better than nothing, because you're protecting against some attacks even if not all. But it's also more dangerous than nothing, because if people have false trust that the system is good, they're going to put in all their vulnerable information and use it with extra trust. I just wanted to hear what you think about that.

[Speaker] I think the users should not have this false feeling; they should be informed: we give you end-to-end encryption for fields you mark sensitive, and we don't hide anything else. They should be informed.

[Audience] Building a comprehensive secure system is a very hard task. And what you call users, call them highly trained veteran security experts. Because, sorry, my voice... it's not like a cryptographic primitive. If you have some primitive in cryptography, say a zero-knowledge proof, I can add it to an input and verify it in any context; it's very robust, you can just put it in, you don't need to be an expert. If you use a system that is like Swiss cheese, it covers certain things but there are holes, and you need real experts to know where the holes are and where they are not, and how the system behaves, and you need to monitor it all the time for where you cross the cheese and fall into a hole. It's very complicated, and in a certain sense it's not robust: security is not going to be maintained. So I say: don't use users, use experts.

[Speaker] Yeah, so I agree that users have to be careful here. The one thing I want to say is that this applies to many other systems we have today: whenever you have end-to-end encryption, users need to understand that it doesn't protect access patterns or metadata. So this is not specific to Mylar. Pretty much every practical system has some things it offers and some it does not, and it's really a mistake to think that any of them gives you all the security guarantees, because there will always be things outside the threat model. The second point I want to make is that we absolutely should work on the usability of these systems, thinking about how to make them usable and how to keep users from making mistakes, and that work is complementary to Mylar. Right now we're trying to even figure out how you can technically do this; before Mylar there wasn't end-to-end encryption embedded in web apps at all.

[Audience] Sure. I have a question about the searchability. You don't produce a reverse index on the server; instead you're encrypting each term with some randomness, and that's why these attacks that reveal the order of keywords work. What's the difficulty with reverse indexing?

[Speaker] No, you can totally randomize the order for each document; you don't have to keep the terms in document order. And we did actually have a reverse-index implementation as well, and that reverse index is built on the encryptions, so in some sense the entries are randomized already and it will not reveal the order.

[Session chair] Okay, well, actually we're out of time, so let's thank