Thank you. All right. Welcome. The basic structure of this talk is twofold. The first thing is to provide an overview of the different mechanisms that exist in this space of secure communication, and to try to tease apart a bunch of the individual choices and tradeoffs that have to be made and the implications of them. A lot of times we talk about security or privacy as very broad terms that cover a bunch of individual things, and breaking that down gives us a better way to understand what it is we're giving up, or why these decisions actually get made for the systems that we end up using. The arc that I'll cover is: first, trying to provide a taxonomy or classification of a bunch of the different systems that we see around us; from there, identifying the threats that we often are trying to protect against and the mechanisms that we have to mitigate those threats; then going into some of these mechanisms and looking at what's happening right now in different systems; and by the end we'll be closer to the research frontier, where we have new ideas but there's still quite a high tradeoff in usability, or for other reasons these haven't gained mass adoption. So I'll introduce our actors, Alice and Bob. The basic structure for pretty much all of this is one-to-one messaging. So this is primarily systems that are enabling us to have a conversation that looks a lot like what we would have in person. That's the thing that we're modeling: I want to have a somewhat synchronous, real-time communication over a span of weeks, months, or years, and to be able to resume it. And in the same way that in real life I know someone and recognize them when I come and talk to them again, I expect the system to give me similar sorts of properties.
So the way we're going to think about systems is: initially we have systems that look very much the same as how we would have a real-life communication, where I can, on a local network, use AirDrop or a bunch of things that just work directly between my device and a friend's device to communicate. On a computer this might look like using netcat or a command-line tool to just push data directly to the other person. And this actually results in a form of communication that looks very similar: it's ephemeral, it goes away afterwards unless the other person saves it. But there's already a set of adversaries or threats that we can think about: how do we secure this sort of communication? One of those would be the network. Can someone else see this communication, and how do we hide from that? And we have mechanisms against that, namely encryption. I can disguise my communication and encrypt it so that someone who is not my intended recipient cannot see what's happening. And then the other would be the end devices themselves. There are a couple of things we need to think about when we think about what it is we're trying to protect against on an end device. One is that there might be other bad software that, either at the same time or afterwards, gets installed and tries to steal or learn about what was said. And so we have mechanisms there. One of them would be message expiry: we can make the messages go away, make sure we delete them from disk at some point. And the other would be making sure that we've isolated our chat so that it doesn't overlap and other applications can't see what's happening there. So we have these direct communication patterns, but that's a small minority of most of what we think of when we chat. Instead, most of the systems that we're using online use a centralized server.
There's some logically centralized thing in the cloud, and I send my messages there, and it then forwards them to my intended recipient. And so whether it's Facebook or WhatsApp or Slack or IRC or Signal or Wire or Threema or whatever cloud chat app we're using today, this same model applies. So we can identify additional threats here, and then we can think about why we do this. So one threat is the network, and I'll tear that apart a little bit. You've got the local network that we had before: someone who's on the network near the person who's sending or receiving messages, someone else in the coffee shop, your local organization, your school, your work. You've got the internet as a whole that messages are passing over, so the ISPs or the countries that you're in may want to look at your messages or prevent you from sending them. You've also got an adversary in the network local to, or near, the server, who can see most of the messages going in and out of the server, because these services have to exist somewhere, be that in a data center that they physically have computers in, or in AWS or Google or one of these other clouds. And now you've got a set of actors that you need to think about that are near the server and can see most of the traffic going in and out of it. We also have to think about the server itself as a potential adversary. There are a few different threats to consider. The server could get hacked or otherwise compromised, so parts of the communication, or bugs in the software, can potentially be a problem. There's typically a legal entity that is running this server, and so the jurisdiction that it's in can send requests to get data about users or compel it to provide information. So there's this whole threat of what the server is required to turn over. And then there's the question of how the server, or the company behind it, is actually making money and sustaining itself.
Is it going to get acquired by someone that you don't trust, even if you trust it now? So there's this future view of: how do we ensure that the messages I have now don't get misused in the future? And we have a set of techniques that mitigate these problems as well. One of them would be that we can use traffic obfuscation or circumvention techniques to make our traffic look less obvious to the network, and that prevents a large amount of these. And then there's what I'm calling server hardening, which is really a broad set of techniques around how we trust the server less, and how we make those potential compromises of the server, either of its code base or of the information it has to reveal, less damaging. It's worth saying that there are a bunch of reasons why we have primarily used centralized messaging. You've got availability: it's very easy to go to a single place. It also makes a bunch of problems easier, like handling multiple devices, and mobile push in particular, because both Google and Apple expect or allocate a single authorized provider who can send notifications to an app's users' mobile devices. And so that requires you to have a centralized place that knows when to send those messages if you want to provide real-time alerts to your application's users. The cons are, first, cost, right? There's some entity now that is responsible for all of this cost and has to have a business model. And also that there's a single entity that people can come to, and that now faces the legal and regulatory issues. So this is not the only type of system we have, right? The next most common is probably federated. Email is a great example of this. And email is nice in that, as a user, I can choose an email provider that I trust out of many, or if I don't trust any of the ones that I see, I can even spin up my own with a small group. So we can decentralize cost. We can make this more approachable.
And so while I can gain more confidence in my individual provider, I don't have as much trust in the recipient's side, in Bob in this case: I don't know how secure his connection is to his provider, because we've separated and decentralized that. There are also a bunch of problems both in figuring out identity and discovery securely, and in mobile push. But we have a number of successful examples of this. So beyond email, the Fediverse and Mastodon, Riot (Matrix), and even SMS are examples of federated systems where there's a bunch of providers and not a single central place. As you continue this metaphor of splitting apart and decentralizing and reducing the trust in a single party, you end up with a set of decentralized messaging systems as well, and it's worth mentioning those as we get onto this fringe. There are two types. One uses gossip protocols, so things like Secure Scuttlebutt. In those, you connect to either the people around you or people that you know, and when you get messages, you gossip: you send them on to all of the people around you, and so messages spread through the network. That is still an area where we are learning the tradeoffs of how much metadata gets leaked, but it is nice in its level of decentralization. The others basically try to make all of the users have some relatively low-trust participation in the serving infrastructure. And so you can think of this as evolving out of things like the distributed hash tables that are used in BitTorrent. You see something very similar in things like Ricochet or Tox, which will use either Tor-like relays for sending messages or have an explicit DHT for routing, where all of the members provide some amount of lookup to help with discovery and finding other participants. OK, so let's now turn to some of these mechanisms that we've uncovered. And we can start with encryption. So when you're sending messages to a server, by default, there's no encryption.
This is things like IRC; email used to be primarily unencrypted. And you can think of that like a postcard. So you've got a postcard, in this case, that you're sending: it has where that message is coming from, where it's going to, and the contents. In contrast, when you use transport encryption, and this is now a standard for most of the centralized things, what that means is you're taking that postcard and you're putting it in an envelope that the network can't open. And that's what TLS and other forms of transport encryption are going to give you: the network link just sees the source and destination. It sees there's a message going between Alice and Facebook, or whatever cloud provider, but can't look into that and see that that's really a message for Bob, or what's being said. It just sees individuals communicating with that cloud provider. And so with SMTPS and the secured versions of IRC, email and most other protocols are using transport security at this point. The thing that we have now is called end-to-end encryption, or E2EE. The difference here is that the message that Alice is sending is addressed to Bob, and it's encrypted so that the provider, Facebook, can't open it either and can't look at the contents. Okay. So the network just sees a message going between Alice and Facebook still, but Facebook can't open it and actually see the contents of the message. And so end-to-end encryption has gained pretty widespread adoption. We have this in Signal, and for the most part in iMessage. We have tools like PGP and GPG that implement forms of this. For messaging, there are a few protocols worth covering in this space. The Signal protocol, which was initially called Axolotl, is adopted in WhatsApp and in Facebook private messaging, and has, I guess, generalized into something called the Noise framework and is gaining a lot of adoption. OMEMO looks a lot like that, specifically for XMPP.
And so it is a specific implementation. The other one is called Off the Record, or OTR. And Off the Record, developed somewhat independently from this, thinks a lot about deniability. I'm not going to go too deep into the specific nitty-gritty of what these protocols are doing, but I guess the intuition is: the hard part here is not encrypting a message. Rather, the hard part is how do you send that first message and establish a session, especially if the other person is offline? So I want to start a communication. I type in the first message I'm sending to someone. I need to somehow get a key, and then send a message that only that person can read, and also establish this sort of shared secret. And doing all of that in one message, with the other device not online, ends up being tricky. Additionally, figuring out the mapping between a user and their devices, especially as that changes, and making sure you've appropriately revoked devices and added new devices without keys getting out of sync or showing too many warnings to the user, ends up being a lot of the trick in these systems. There are two problems that come into play when we start using end-to-end encryption. One is that we need to think about connection establishment. This is the problem of saying: who is Bob? So I find a contact and I know them in some way, by an e-mail address or by a phone number. Signal uses phone numbers; a lot of systems may be using e-mail addresses. There are things like Threema that use a unique identifier that they generate for you. But somehow I have to go from that identifier to some actual key, or some knowledge of a cryptographic secret, that identifies the other person. And I have to figure out who I trust to do that mapping, to gain this thing that I'm now using for encryption. And then also there's this: well, how do we match?
So a lot of systems do this by uploading your address book, or trying to match with existing contacts, to solve the user interface problem of discovery: if they already know the identifiers and have this mapping, then when someone new comes in they can suggest and have pre-found these keys. And you just trust the server to hold this address book and to do this mapping between what they're using as their identifier and the keys themselves that you're getting out. Signal is nice here. It says it's not uploading your contacts, which is true: they're uploading hashes of your phone numbers rather than the actual phone numbers. But it's a similar thing. They've got a directory of known phone numbers, and then as people search, you'll search for a hash of the phone number and get back the key that you hope Signal has correctly given you. So there are a couple of ways that you reduce your trust here. Signal has been going down a path of using SGX, oblivious RAM, and a bunch of systems mechanisms to raise the cost of attacks against their discovery mechanism. The other way that you do this is you allow people to use pseudonyms or anonymous identifiers. So with Wire, you can just register with an anonymous email address, and now the cost to you is potentially less if that gets compromised. And it's worth noting Moxie will be talking tomorrow at 4 p.m. about the evolution of the space around Signal, so there's probably a bunch more depth there that you can expect. So what if we don't want to trust the server to do matchmaking? One of the early things that has been around is the web of trust around GPG. And this is the notion that if I have, in real life or otherwise, associated an identifier with a key, I can publicly provide a signed statement saying that I trust that mapping. And then people who don't know someone, but have a link socially, can maybe find these proofs and use that to trust this mapping.
So I know an identifier, and I know that I trust someone who has said: well, this is the key associated with that identifier. And I can use that network to eventually find an identifier that I'm willing to trust, or a key that I'm willing to encrypt to. There's some user interface trade-off here; this is a manual process in general. And this year we've had a set of denial-of-service attacks on the web of trust infrastructure. The specific attack is that anyone can upload these attestations of trust, and so if a bunch of random users or sybils start uploading attestations, when you go to try to download them, you end up overwhelmed by the amount of information. And so the system does not scale, because it's very hard to filter to the people you care about without telling the system who you care about and revealing your network, which is what you're trying to avoid. Keybase takes another approach. They made the observation that when I go to try to talk to someone, what I actually care about is the person that I believe owns a specific GitHub or Twitter or other social profile. And so I can provide an attestation where I say: well, this is a key that's associated with the account that controls this Twitter account or this Reddit account or this Facebook account. And so by having that chain of proofs, I can connect an individual and a cryptographic identity with the person who has the passwords to a set of other systems. Keybase also this year began to provide a monetary incentive for users, and then struggled with the number of signups. And so there's a lot of work in figuring out: okay, do these identities actually correspond to real people? And how do you prevent a denial-of-service-style attack here, similar to the one the web of trust faced? On our devices, we in general end up resorting to a concept called TOFU, or trust on first use. And what that means is that when I first see a key that identifies someone, I'll save that.
And if I ever need to communicate with that person again, I've already got a key, and I can keep using that same key and expect it to stay the same. And so that continuation, and the ability to pin keys once you've seen them, means that if, when you first establish a connection with someone, it's the real person, then someone who compromises them later can't take over or change that. Finally, one of the exciting things that came out, this is circa 2015 and is largely defunct now, was a system by Adam Langley called Pond that looked at hardening a modern version of email. And one of the things that Pond did was it had something called a password-authenticated key exchange. This is an evolving cryptographic area where you're saying: if two people can start with some weak shared secret, so I can perhaps publicly, or in plain text, challenge the other person with "where were we on a specific day?", then we both know something that has at least a few bits of entropy, if we can write the same textual answer. We can take that, run a key derivation function to end up with a larger amount of shared keying material, and use that as a bootstrapping method to do a key exchange and end up finding a strong cryptographic identity for the other person. So Pond has a system that they call PANDA for linking individuals based on a challenge-response, and this is also something that you'll find in future off-the-record versions. The other thing that we need to be careful about in end-to-end encrypted systems is deniability. When I'm chatting one-on-one with someone in person, that conversation is eventually fairly deniable. Either person can have their own recollection of what happened, and there's no proof that the other person said something, unless you've recorded it or otherwise brought some other technology into play.
But with an encrypted thing where I've authenticated the other person, I potentially end up with a transcript that I can turn over later and say: look, this person said this. And we've seen recently that things like emails that come out are authenticated in this way. The DKIM system that authenticates email senders showed up in the WikiLeaks releases of the Clinton campaign emails and was able to say: look, the text in these hasn't been changed, and it was signed by the real server that we would expect. So the thing that we get from Off the Record and the Signal protocol is something called deniability, or repudiability. And this plays into the concept of forward secrecy, which is that we're going to throw away stuff afterwards, in a way that our chat goes back to being more ephemeral. So we can think about this in two ways; we have two properties that interlink here. We have keys that we're using to form the shared session that we're expecting to use for our secret messages. And each time I send a message, I'm going to also provide some new keying material and begin changing that secret key that we're using. So I provide a next key, and when Bob replies, he's going to use my next key as part of that and give me his next key. And what I do is, when I send a message, I can also reveal the secret part of my previous key. So I can say: my last private key, that I used to send you that previous message, was this. And now, at the end of our conversation, we both know all of the private keys, such that either of us could have created that whole conversation on our own computer. At any given time, it's only the most recent message that could only have been sent by the other person, and the rest of the transcript that you have is something you could have generated yourself. There is a talk on day three about OTRv4, the fourth version of Off the Record, that will go a bit deeper into this. That's at 9 p.m. in the About:Freedom assembly.
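To make that per-message re-keying concrete, here is a toy sketch in Python of a symmetric hash ratchet. This is purely illustrative and not the actual OTR or Signal construction (which also mixes in fresh Diffie-Hellman keys from the other party): each message gets a fresh key derived from a rolling chain key, and erasing old chain keys is what buys forward secrecy.

```python
import hashlib

def ratchet(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive a one-time message key and the next chain key from the
    current chain key. Erasing old chain keys after use means a later
    compromise cannot recover earlier message keys (forward secrecy)."""
    message_key = hashlib.sha256(chain_key + b"msg").digest()
    next_chain_key = hashlib.sha256(chain_key + b"chain").digest()
    return message_key, next_chain_key

# Both sides start from the shared secret established in the handshake.
chain = hashlib.sha256(b"shared secret from initial key exchange").digest()
message_keys = []
for _ in range(3):
    mk, chain = ratchet(chain)
    message_keys.append(mk)  # a real client would erase the old chain key here

# Every message was protected under a distinct key.
assert len(set(message_keys)) == 3
```

Because the hash is one-way, knowing today's chain key tells you nothing about yesterday's message keys once they have been deleted.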
So I encourage you to check that out if you're interested in this. Okay. The next one to talk about is expiry. This is sort of a follow-on to the concept of forward secrecy. There are two attacks here to consider. One is something that we should maybe give credit to Snapchat for popularizing, which is this concept of the message going away after some amount of time. And really this is protecting against not fully trusting the other person: them sharing it later, or sharing it in a way you didn't intend. And this is also about a screenshot adversary, so a bunch of apps will alert you if you take a screenshot. This is why some apps will blank the screen when they go to the task switcher: if you're swapping between apps, you'll see that some of your applications will just show a blank screen or will not show contents. And that's because the mobile operating systems' APIs don't tell them if you take a screenshot while you're in that mode, and they want to be able to notify you if the other person does. And it's worth noting that this is all just raising a social incentive not to do it. I can still use another camera to take a picture of my phone and get evidence of something that has been said, but it's discouraging that and setting social norms. The other reason for expiry is after-the-fact compromise of a device: whether someone gets hold of the device and tries to do forensic analysis to pull off previous messages or the chat database, or whether someone tries to install an application that then scans through your phone. Fengcai, for instance, is an application that's been installed as a surveillance app in China. And this also boils down to a user interface and user experience question, which is: how long are you going to save logs? How much history are you going to save? And what norms are you going to have? And there's a trade-off here.
It's useful sometimes to scroll back, and especially for companies that believe they have value-added services around being able to do data analytics on your chat history, they're wary of getting rid of that. The next thing that we have is isolation and OS sandboxing. A lot of this is up one layer, which is: what is the operating system doing to secure your application, your chat system, from the other things, the malware or the compromises, on the broader device that it's running on? We have a bunch of projects around us at Congress that are innovating on this. There are chat systems that also attempt to do this sort of thing on their own. One extreme example is called Tinfoil Chat, which makes use of three devices and a physical data diode, and is designed to have one device that is sending messages and another device that is receiving messages. The idea is that if you receive a message that somehow compromises the device, the malware or the malicious file can never get any communication back out, and so it becomes much less valuable to have compromised it. And they implement this with a physical little hardware diode. The other side of this is recovery and backups, where you've got a user experience trade-off between a lot of people losing their devices and wanting to get back their contact list or their chat history, and the fact that now you're keeping this extra copy and have this additional place for things to get compromised. Apple has done a lot of work here that we don't look at so much. They gave a Black Hat talk a few years ago where they discuss how they use custom hardware security modules in their data centers, much like the T2 chip in the end devices, that will hold the backup keys that get used for their iCloud backups and do similar amounts of rate limiting. And they consider a pretty wide set of adversaries, more than we might expect.
This includes things like: what happens when the government comes and asks us to write new software to compromise this? And so they set up their HSMs such that they cannot provide software updates to them, which is, you know, a cloud security side that we don't think about as much. So there's a set of slides that you can find from this (and these slides will be online too, as a pointer) to look at their solution, which considers a large number of adversaries that you might not have thought about. So, traffic obfuscation is primarily about a network-side adversary. The technique that is getting used, and sort of what people are using if they feel they need to do this, is something called domain fronting. Domain fronting had its heyday maybe in 2014-ish and has become somewhat less effective, but is still effective enough for most of the chat things. The basic idea behind domain fronting is that there's a separation of layers between the envelope and the message inside of it that we get with HTTPS on the web. So when I create a secure connection to a CDN, to a content provider like Amazon or Google or Microsoft, I can make that connection and perform the security layer against a fairly generic service name. I just want to establish a secure connection to Cloudflare, and then, once I've done that, the message that I send inside can be a chat message to a specific customer of that CDN or that cloud provider, which is a very effective way to prevent the network from knowing what specific service you're accessing. It got used for a bunch of circumvention things. It then got used for a bunch of malware things, and this caused a bunch of the cloud providers to stop allowing you to do this. But it's still getting used.
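As a sketch of that layer separation (both hostnames here are hypothetical placeholders, not real services): the network and the DNS resolver only ever see the generic front domain, while the real service name travels in the HTTP Host header inside the encrypted channel.

```python
# Domain fronting sketch: the outer, visible layer names an innocuous
# CDN domain, while the real service only appears inside the TLS tunnel.
FRONT = "cdn.example.com"    # visible: DNS lookup and TLS SNI
TARGET = "chat.example.org"  # hidden: HTTP Host header inside TLS

def fronted_request(path: str) -> dict:
    """Return which identifier is visible at each layer of a fronted request."""
    return {
        "dns_query": FRONT,      # the resolver sees only the CDN
        "tls_sni": FRONT,        # an on-path observer sees only the CDN
        "http_host": TARGET,     # the CDN routes on this, after decryption
        "request_line": f"GET {path} HTTP/1.1",
    }

req = fronted_request("/messages")
assert req["tls_sni"] != req["http_host"]  # this mismatch is the whole trick
```

The countermeasure the providers took was exactly to start rejecting requests where the SNI and Host values disagree.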
This is still sort of what's happening when you turn on a specific circumvention technique, and it's getting another revival with DNS over HTTPS and the encrypted SNI extensions to TLS, which allow for a standardized approach to establishing a connection to a service without providing any specific identifiers to the network for which service you want to connect to. It's worth mentioning that probably the most active chat service here is Telegram, which has a bunch of users in countries that are not fans of having lots of Telegram users. And so they have systems where they can bounce between IPs very quickly and change where their servers appear to be, and they've also used techniques like sending messages over DNS tunnels to mitigate some of these censorship things and keep access to their user population. They're not really thinking about your local network, or caring about that, as much as they are thinking: oh, there are millions of users that should probably still have access to us. So we can maybe hide the characteristics of traffic in terms of what specific service we're connecting to. There are some other things about traffic, though, that are also revealing to the network, and this is the additional metadata that we need to think about. One of these is padding: the size of messages can be revealing. One immediate thing is that the size of a chat or text message is going to be very different from the size of an image or voice or video, and you see this on airplanes or in other bandwidth-limited settings: they might allow text messages to go through, but images won't. There's been research that shows, for instance on voice, that even if I encrypt my voice, we've actually gotten really good at compressing audio of human speech, so much so that different phonemes, the different sounds that we make, take up different sizes. And so I can say something, compress it, encrypt it, and then what was said can be recovered based on the relative sizes of different sounds.
So there was a paper in 2011 at Oakland S&P that demonstrated this potential for attacks. What this is telling us, perhaps, is that there's a trade-off between how efficiently I want to send things and how much metadata, or revealing information for distinguishing them, I'm giving up. So I can use a less efficient compression that's constant bit rate, or that otherwise is not revealing this information, but it has higher overhead and won't work as well in constrained network environments. The other place this shows up is just when people are active: if I can look at when someone is tweeting or when messages are sent, I can probably figure out pretty quickly what time zone they're in. And so this leads to a whole set of metadata-based attacks, in particular confirmation attacks and intersection attacks. An intersection attack looks at the relative activity of multiple people and tries to figure out: okay, when Alice sent a message, who else was online or active at the same time? And over time, can I narrow down or filter to the specific people that Alice was likely talking to?
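A common mitigation for the size channel described a moment ago is to pad every message up to one of a few fixed bucket sizes before encrypting it, so ciphertext length reveals only the bucket, not the exact content size. A minimal sketch (the bucket sizes are illustrative, not taken from any particular system):

```python
# Pad plaintexts to fixed-size buckets before encryption so that the
# ciphertext length reveals only the bucket, not the exact message size.
BUCKETS = [256, 1024, 4096, 16384]  # bytes; illustrative sizes

def pad_to_bucket(msg: bytes) -> bytes:
    """Prefix the real length (2 bytes) and zero-pad to the smallest bucket."""
    for size in BUCKETS:
        if len(msg) + 2 <= size:
            framed = len(msg).to_bytes(2, "big") + msg
            return framed + b"\x00" * (size - len(framed))
    raise ValueError("message too large for any bucket")

def unpad(padded: bytes) -> bytes:
    """Recover the original message from the length prefix."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]

assert len(pad_to_bucket(b"hi")) == 256          # tiny text -> smallest bucket
assert unpad(pad_to_bucket(b"hi")) == b"hi"      # round-trips losslessly
```

This is exactly the efficiency trade-off mentioned above: a two-byte message now costs 256 bytes on the wire, in exchange for leaking less about its true size.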
Pond is also a system to look at in this regard. Their approach was that a client would hopefully be always online and would, in a regular pattern, check in with the server with the same amount of data regardless of whether there was a real message to send or not, so that from the network's perspective every user looked the same. The downside is that you've now got this message being sent by every client every minute or so, and that creates a huge amount of overhead of just padded data that doesn't have any meaning. So finally, I'll take a look at server hardening and the things that we're doing to reduce trust in the server. There are a few examples of why we would want to do this. One is that you've had messaging servers plenty of times that have not been as secure as they claim, one example being that there was a period where the Skype subsidiary in China was using a blacklist of keywords on the server to either prevent or intercept some subset of their users' messages, without telling anyone that they were doing that. And then there's also this uncertain future of: okay, I trust the service with my data now, but what can we do so that I don't have to worry about what the corporate future of this service entails for my data? One of the elephants in the room is that software development is probably pretty centralized. So even if I don't trust the server, there's some pretty small number of developers who are writing the code, and how do I trust that the updates they are making, either to the server or to the client that they push to my device, aren't reducing my security? Open source is a great start to mitigating that, but it's certainly not solving all of it. So one way we can think about reducing trust in the server is by looking at what the server still knows after end-to-end encryption: it knows things about the size, it knows where the message is coming from, and it knows where the message is going to. For size, we've talked about some of the things that
we can use to mitigate. So how do we reduce the amount of information about sources and destinations, this network graph that the server knows? This is a concept called linkability: being able to link the source and destination of a message. We are starting to see some mitigations, or approaches to reducing linkability, entering mainstream systems. Signal has a system called sealed sender that you can enable, where the source of the message goes within the encrypted envelope so that Signal doesn't see it. The downside is that Signal still sees your IP address, but the thought is that they will throw those logs out relatively quickly, and so they will have fewer logs about source-to-destination pairs. More theoretically, though, there is a bunch of work in this space. The first thing I'll point to is a set of systems that we classify as mix nets. A mix net works by having a set of providers, rather than a single entity that's running the servers. A bunch of users will send messages to the first provider, which will shuffle all of them and send them to the next provider, which will shuffle them again and send them to a final provider, which will shuffle them and then be able to send them to their destinations. And this de-links things: none of the individual providers knows both the source and destination of these messages. So this looks maybe a bit like Tor's onion routing, but differs in a couple of technicalities. One is that typically you will wait for some number of messages, rather than just forwarding things through with full bandwidth and low latency. And by doing that, you can get a theoretical guarantee that a batch had at least N messages that got shuffled, or you can prevent there being some time when only one user was using the system, and so you get a stronger theoretical guarantee. There's an active project making a messaging system using mix nets called Katzenpost. They gave a talk at camp this summer, and I'd encourage you to look at their website or go back to that talk to learn more about
The project that I was, I guess, tangentially helping with is in a space called private information retrieval, another technique for this delinking. Private information retrieval frames the question a little differently. It asks: if a server has a database of messages, can a client retrieve one of those messages without the server learning which message the client got or asked for? This may sound hard, but there's a strawman to convince yourself it's doable: ask the server for its entire database, then locally take the message you want. The server has learned nothing about which message I cared about, but I've probably spent a lot of network bandwidth doing that. There are a couple of constructions for this; I'm going to focus on information-theoretic private information retrieval. We'll use a setup similar to our threat model for a mix net: several servers, where I assume they're not all talking to each other or colluding, so I just need at least one of them to be honest. One of the tools we'll use is the exclusive-or operation. To refresh your memory, exclusive or (XOR) is a binary bitwise operation, and the nice property we get is that if I XOR a piece of data with itself, it cancels out to zero. So with several servers each holding the database, I can ask each one to give me a superposition of some random subset of its records: for example, ask the first server to give me items 4, 11, 14 and 20 XORed together. (I'm assuming all items are the same size, so these XORs make sense.) I can structure the requests so that each server independently sees what looks like a random subset, but when I XOR the replies together, everything cancels out except the item I care about.
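The XOR construction just described can be written down concretely. Below is a toy two-server version under the stated non-collusion assumption (the same trick generalizes to more servers); the function names are mine, and this is a sketch of the classic information-theoretic scheme, not Talek's actual implementation. The client sends server 1 a uniformly random subset and server 2 that same subset with the wanted index flipped, so each request alone is a uniformly random subset, yet XORing the two answers cancels everything except the wanted item.

```python
import os
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(database, subset):
    """A server XORs together the requested subset of its records.
    It learns only a random-looking subset, not which item matters."""
    acc = bytes(len(database[0]))  # all-zero accumulator
    for i in subset:
        acc = xor(acc, database[i])
    return acc

def pir_query(database, want, num_items):
    # Client: a uniformly random subset for server 1 ...
    subset1 = {i for i in range(num_items) if secrets.randbits(1)}
    # ... and the same subset with `want` toggled for server 2, so each
    # subset in isolation is still uniformly random.
    subset2 = subset1 ^ {want}
    a1 = server_answer(database, subset1)
    a2 = server_answer(database, subset2)
    # Items in both subsets cancel pairwise; only item `want` survives.
    return xor(a1, a2)

db = [os.urandom(8) for _ in range(16)]  # 16 same-size records
assert pir_query(db, 11, 16) == db[11]
```

Note the bandwidth claim from the talk: each server returns exactly one record's worth of data, while the privacy cost is that each server must scan its whole database per query.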
Unless you saw all of the requests I made, you wouldn't be able to tell which item I cared about. By doing this I've reduced the network bandwidth: I get only one item's worth of data back from each server. Now you might have a concern that I'm asking the server to do a whole lot of work here: it has to look through its entire database and compute this superposition, and that seems potentially expensive. The thing I find exciting about this space is that this sort of operation, scanning a large database and coming back with a small amount of data, looks a lot like what graphics hardware is designed for. It runs really quite well on a GPU, where thousands of cores each compute a small part of the XOR and a relatively small result is pulled back. With GPUs you can have databases of gigabytes, even tens of gigabytes, and compute these XORs across all of it on the order of a millisecond or less. A couple of things in this space: Talek is the system I helped with that demonstrates this working. The converse problem is called private information storage: how do I write an item into a database without the database learning which item I wrote? The mathematical construction there isn't quite as simple to explain, but there's pretty cool new work from the last month or two, out of Dan Boneh's group at Stanford, called Express, with Saba Eskandarian as first author, showing how to perform that operation fairly practically. Cool. I'll finish with just a couple of minutes on multi-party chat, or group chat. For small groups, you've got a choice in how chat systems implement group chat. One option is to not tell the server about the group: as a member of the group, I just send the same message to everyone in the group, and maybe tag it for them so they know it's part of the group. The other is to do something more efficient, where you tell the server about group
membership, and I send the message once to the server, which sends it on to everyone in the group. Even if you don't tell the server about the group, though, you have a bunch of correlation leakage to worry about: if at one moment someone sends the same-size message to five other people, and later someone else sends the same-size message to five other people, and those sets basically overlap, someone watching the network essentially knows the group membership. So it's actually quite difficult to conceal group membership. The other thing that breaks down, once again, is our concept of deniability: if multiple people have this log, then even if each of them individually could have written it, the fact that they share the same cryptographic keys for these messages weakens that story. There continues to be work here. Signal is working on an SGX-based, centralized construction for group management in order to scale better; given the pretty realistic fact that the server in these cases can probably figure out group membership anyway, you might as well make it scale. On the other side, one of the systems being prototyped is called Cwtch, out of Open Privacy. It's an extension to Ricochet that allows offline messages and small group chats, on the order of 5 to 20 people. It works by having a server that obliviously forwards messages to everyone connected to it: when I send a message to a group, the server sends it to everyone it knows about, not just the group members, so the server doesn't learn which subgroups exist; it only knows who's connected to it. That's a neat approach. It doesn't necessarily scale to large groups, but it allows some concealment of group membership. They have an Android prototype as well, which is a nice extension toward making this usable. Wonderful. I guess the final thought here is that there are a lot of systems, and I'm sure I haven't mentioned all of them, but this community is
really closely tied to the innovations happening in the space of private chat. This is the infrastructure that supports communities, and it's some of the most meaningful stuff you can possibly work on. I encourage you to find new systems, look at a bunch of them, think about the tradeoffs, and encourage friends to play with new systems; that's how they gain adoption and how people figure out which mechanisms do and don't work. With that, I'll take questions.

It wasn't necessary to encourage you to applaud. There are numbered microphones in the room, so if you start lining up behind them, we can take your questions. We already have a question from the internet: popularity and independence are a contradiction; how can I be sure that a messenger like Signal stays independent?

I guess I'd question whether independence is a goal in and of itself. It's true that the value is increasing, so one thing to think about is using systems that have open protocols, or that are federated or otherwise not centralized. Again, this reduces the need to have confidence in the future business model of a single legal entity. But I don't know that independence of the company is the thing you're trying to trade off against popularity.

Cool, and we have questions at the microphones. We'll start at microphone number one.

Thanks for the talk, first of all. You talked a lot about content and encryption, but what about the initial problem? History shows that if I'm an individual already under observation in a sensitive area, there might be no need to decrypt the messages I'm sending: it's already identified that I'm sending at a specific location at a specific time. Is there any chance to hide that, or to do something against it?

Making things hidden again after the fact seems very hard. I mean, there are a couple of thoughts there. Maybe there's a sort of real-world intersection attack: if there's a real-world observable action,
like who actually shows up at the protest, that's potentially a pretty good way to figure out who was chatting about the protest beforehand. What we've seen in real-world organizing is things like really decentralizing that, so it happens across a lot of platforms and very often, leaving not enough time to respond in advance; or hiding your presence; or otherwise staggering your actual actions so that they're harder to correlate with a specific group. It's not something the chat systems themselves are addressing, I don't think.

We have time for more questions, so please line up at the microphones, and if you're leaving, please leave quietly. We have a question from one of the microphones: if network address translation is the original sin against the end-to-end principle, and because of it we now have to run servers that someone has to pay for, do you know any solution to that economic problem?

I mean, we had to pay for things even without network address translation, but we could move more of that cost to end users. We have another opportunity with IPv6 to potentially keep more of the cost with end users, or to develop protocols that are more decentralized, where the cost stays more fairly distributed. Our phones have a huge amount of computational power, and figuring out how to design our protocols so that the work happens there is, I think, an ongoing balance. I think some of the reason network address translation, or centralization generally, is so common is that distributed systems are pretty hard to build and pretty hard to gain confidence in. So better tools for testing and for feeling that we understand that a distributed system actually will work 99.99% of the time would make people less wary of working with them; better tooling for distributed systems is maybe the best answer.

We also have another question from the internet: what do you think of technical novices' acceptance of and dealings with OTR keys, for example in Matrix/
Riot? Most people I know just click "I verify this key" even when they didn't.

Absolutely. This goes back to many of these problems being a user-experience tradeoff. We saw initial versions of Signal where you would actually regularly verify keys with each other, and that has since been pushed back to a harder-to-access part of the user interface because not many people wanted to deal with it. In early Matrix/Riot you would get a lot of warnings: there's a new device, do you want to verify this new device, do you only want to send to the previous devices you trusted? Now you're getting the ability to more automatically accept these changes, weakening some amount of the encryption security in exchange for a smoother user interface, because most users are just going to click yes; they want to send the message. So there's this tradeoff: when you've built the protocol such that it stands in the way of what the person is trying to do, that's not really where you want to put the friction. Figuring out other ways to surface this on the side, supporting the communication rather than hindering it, is what you should be thinking about; that can be successful.

We have a couple more questions. We'll start at microphone number three.

Yeah, thank you for your talk. You talked about deniability by sending the private key with the last message. How do I get the private key for the last message in the whole conversation?

In the OTR/XMPP/Jabber systems, there's an action to end the conversation that makes it repudiable; that sends the final message to close it. In something like Signal, it actually happens with every message, as part of the message confirmation.

Okay, thank you. We still have time for more questions, so please line up if you have any; don't hold back. We have a question from microphone
number seven.

First of all, a brief comment: the Riot thing still doesn't even do TOFU; they haven't figured that out. But I think a much more subtle conversation needs to happen around deniability, because most of the time, when people have a power imbalance, the non-repudiable conversation actually benefits the weaker person. So we may not actually want deniability in most of our chat applications, though it's more subtle than that, because between people with equal power, maybe we do. It's kind of weird.

Absolutely. And I guess the other part of that is whether this is something that should be shown to users as a concept: is there a way to express that notion so that users can understand it and make good choices, or is it just something that we…

We have one more question at microphone number seven; please line up if you have any more, we still have a couple of minutes. Microphone number seven, please.

Thanks for the talk. You talked about private information retrieval and how that stops the server from knowing who retrieves a message. But for me the question is: how do I find out in the first place which message is for me? Because if mine were item 14, then over a conversation it would again be possible to de-anonymize the users: okay, they're always accessing this one item across all those queries.

Absolutely; I didn't explain that part. The trick is that the two people share a conversation secret, and we use that secret to seed a pseudorandom number generator. Both sides can then generate the same stream of random numbers, and each next message goes at the place determined by the next item from that generator. So the writer just writes at what look like random places as far as the server can tell, taking the next location from the random number generator for that conversation. There's a paper that describes a bunch more of that system, but that's the basic sketch.
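The oblivious-addressing idea in that answer can be sketched as follows. This is an illustrative stand-in using a hash as the pseudorandom generator; the names and parameters are mine, and a real system (Talek's actual construction is described in its paper) would use a proper PRF or KDF. Both participants derive the same sequence of mailbox locations from the shared conversation secret, while to the server each write lands at an unpredictable-looking index.

```python
import hashlib

NUM_MAILBOXES = 2**20  # size of the server's message database (illustrative)

def mailbox_sequence(conversation_secret: bytes, count: int):
    """Derive a deterministic pseudorandom sequence of mailbox indices from
    a secret both participants share. Message n of the conversation is
    written to (and read from) the n-th index in this sequence."""
    out = []
    for n in range(count):
        digest = hashlib.sha256(conversation_secret + n.to_bytes(8, "big")).digest()
        out.append(int.from_bytes(digest[:8], "big") % NUM_MAILBOXES)
    return out

secret = b"shared conversation secret"
# Sender and receiver independently compute identical locations:
assert mailbox_sequence(secret, 5) == mailbox_sequence(secret, 5)
# A different conversation maps to an unrelated-looking sequence:
assert mailbox_sequence(secret, 5) != mailbox_sequence(b"other secret", 5)
```

Because each message uses a fresh location, the server never sees the repeated access to "item 14" that the questioner was worried about; combined with PIR on the read path, neither reads nor the read pattern identify the conversation.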
Thank you. We have a question from the internet: it seems like identity is the weak point of the new breed of messaging apps. How do we solve this part of Zooko's triangle, the need for identifiers and for finding people?

Identity is hard; I think identity has always been hard and will continue to be hard. Having a variety of ways to be identified remains important, and it's why there isn't a single winner-takes-all system that we use for chat. Instead, you have a lot of different chat protocols that you use in the different social circles you find yourself in, and part of that is our desire not to be confined to a single identity, but to have different facets to our personalities. There are systems where you identify yourself with a unique identifier to each person you talk to, rather than having a single identity within the system. That's something else Pond did: the identifier you gave out to each separate friend was different, so you appeared as a totally separate user to each of them. It turns out that's at the same time very difficult, because you can't post such an identifier publicly; you have to give these out privately, in a one-on-one setting, which limits your discoverability. So that concept of how we deal with identities is, I think, inherently messy, and there isn't going to be something fully satisfying that solves it.

And that was the final question, concluding this talk. Please give a big round of applause for Will Scott. Thank you.