My name is Tom. Today I'll be talking to you about servers, why they can't be trusted, and how we mitigate that in EteSync, which is a project I work on. You can find the slides on the FOSDEM website, or scan the QR code to get them. Let's jump in.
So, first of all, let's look at simple server communication, the most basic thing. You have Alice, the user. She probably uses TLS or some variation of it to connect to the server, so everything is encrypted. For the purpose of this talk, let's assume that everything between Alice and the server is safe and there's no man in the middle. That is actually a risk, especially if you use a company laptop and they install a root certificate on it, but let's assume not. Let's only talk about the servers today.
So what are we leaking in that communication? First of all, we have the data. If we use Gmail, for example, we leak the emails, the calendars, any personal notes we may have, our deepest secrets, secret business information, you know, that we're going to do a merger, buy this company or whatever. And then there's what is usually seen as less important, the metadata. There's the IP address, which a government can link back to a certain person, and from which you can at the very least probably get location information. There's the social graph: who do I know, who knows me, groups of friends. I'll cover that more in a second. There's time of access, so when in the day I log in. If I always log in from 9 to 5 London time, you can probably assume these are my business hours and that I work and live in London, so a lot of information is leaked there too. And which data is used, and how often?
So, for example, if you always call a certain person, we know that person is a person of interest. If you always open a certain file, we know that file matters: say you just hacked my computer and saw the last 10 files I used, those are probably more important than my book report from 6th grade. And again, when specific data is accessed: if during work hours I open this, and after hours I open that, you know this is work information and that is personal information. There's actually a lot being leaked there.
So let's look at some parts specifically. Let's look at exploiting social graphs. We have a group of people, and we know there are four, I don't know, rebels or whatever that we're trying to catch. We already caught three of them: A, C and E. Now let's assume they all use Signal. Signal is encrypted, it follows best practices, everything, but just by getting access to the Signal server, because the social graph leaks, who knows who, we can tell from all the connections that B is actually part of the group. And we can also see that D, who we never even knew existed, is part of that group too. So even with best practices and end-to-end encryption, this is a leak.
Another example I really like giving is exploiting access patterns. I don't remember which government agency wrote a report about this, but these are two mobile phones, and the graphs are access times to the server, so essentially when the phones are on. One of them is a burner phone used by a spy, and the other one is a normal phone. It's very easy to see which is which. I mean, it's a good idea to turn your phone off, because then you can't be triangulated, but you see, you still leak a lot of information.
Now all of a sudden you're a target, whereas if you're A, you're not a target. Your information is known, but you don't stick out.
But this is all a bit theoretical, so let's take it to something more practical, which I care about: CardDAV. CardDAV is essentially how most address books are synced nowadays. It's used by iCloud, Nextcloud, ownCloud, everything with cloud in the name, essentially. And it's a standard, so it's great: it's interoperable and supported in every client. But let's look at roughly how it works. You have Alice, Bob, Cher, all of them my contacts. On the server you just have files in clear text, each with all of one contact's information.
So let's look at what's leaked there. First of all, all of the address book information. If I hack the server, I have all the phone numbers, emails, position in the company, relationship if you filled it in, and I'm very pedantic, so I do. I have addresses, I have everything in my address book. The IP address, obviously, because you're connecting to a server. The social graph, because you can see which contacts I have, which contacts have me, and how I group them. So if I have a group of people that work on Enlightenment, which is another project I work on, this already gives you a very good picture of the groups. Time of access, obviously, when I access the server. And again, as we covered, which data I use and when.
So let's solve some of these. It's not a big deal. You can use Tor to hide the origin. Tor is not always perfect, because first of all it's a bit slower, but it also doesn't work well with all services. Cloudflare and other services flag it because a lot of spammers use it, so you'll have issues. You'll have to fill in a lot of CAPTCHAs while using anything.
And another thing, and I don't actually know if this is verified, but I would assume that every government agency flags you as a person of interest the moment you use Tor. It's easy to detect and I'm sure they do it. But it's not verified, so don't hold me to it.
We can also try to control the access patterns. I don't remember if it's true, but I think Satoshi Nakamoto, the guy, or guys, or girls, or whatever combination, who created Bitcoin, used to write posts and emails at different times of the day, so you couldn't tell which continent or country they were from. Another example of access patterns, another Bitcoin-related example that was in the news a lot, is when the guy behind Silk Road, Ross Ulbricht, was caught. They were monitoring the server and waiting for him to log in, because they wanted him to unlock everything that was encrypted. So they waited until he logged in, and that was enough of a cue for them to raid him and go after it. Again, a very easy leak; it's obvious, but we leak it.
Another solution, obviously, is trusting the server: using a provider we do trust, I don't know, Google or whatever, or hosting your own. Obviously, the title of this talk kind of means that I don't believe much in that. So should we even trust the server? It could get hacked. We see a lot of hacks; everything gets hacked nowadays. Every server is running a lot of complex software, so there are so many places to have issues. Even in EteSync, I don't even know how many dependencies I have. For the Android client I need the Java runtime library, and probably some, not Google stuff, but I don't know. I have Gson, for parsing JSON.
There's such a big attack surface nowadays that you should just assume you're going to get hacked. So it's better to protect yourself and be proactive.
It could also get stolen. Literally, if you self-host at home, someone could break into your house, take the hard drive, and steal all your data. Maybe that's not relevant for you, but if you're a high-profile lawyer with high-profile cases, you are a target who will get attacked, and this is a real threat. And with hosted services, you can have rogue employees: employees selling your data because they're evil, or, I don't know why else they would do that, because they're compelled by the government, or by the mafia, or whatever. All of those risks exist.
Also, when it comes to self-hosting, and don't get me wrong, I think everyone should self-host, it's a lot of work. You have to be a techie, at least to an extent, to do it. You have to keep your finger on the pulse and make sure you're always up to date on all the latest security issues. So it's a lot of work, and it's not accessible to everyone. That's why I think hosted solutions are essential for a private internet, or a private world. And this is why I try to create an environment where the server doesn't have to be trusted and we can use hosted solutions safely.
So let's talk about reducing server trust. First of all, end-to-end encryption. I'm biased, but don't use a project or a service that does not do end-to-end encryption. It comes with flaws: for example, if you lose your key, all your data is gone. This is why it's not mainstream and everywhere, but there are solutions to that as well, like escrow keys. End-to-end is the bare minimum. And you can do mostly offline operation if possible. So let's go back to the CardDAV example. Let's assume I want to write a web client.
Normally, how I would write it: you have a website, you click on a link to open a certain page, and that page is requested from the web server. So I want to see Alice's contact, I click on Alice, the page is requested from the server, and so the server knows I viewed Alice. An alternative would be to write it all in JavaScript, and the first time you visit the website, download your whole contact list, which is probably, if you're crazy, one megabyte. That is most likely less than my logo, or the JavaScript framework; it's really small compared to everything else that you serve. Then you always access the data from local storage, from the local cache, and you don't leak that information. So there are a lot of solutions to make that possible.
Another thing, and I put a question mark on it, is that you can fake access patterns. For example, every time you edit a contact, you can fake-edit five others at random, so the server doesn't know which one you really edited. I'm not a mathematician, but I'm sure that with enough statistics it will be uncovered, so I don't know if this is actually a sustainable solution.
So let's talk about hardened CardDAV. We know the problems, we know how to solve them, let's start solving them. First of all, as you can see, we now use IDs instead of names, so everything is secret. We end-to-end encrypt all the data, so nothing is visible to the server. The server can't see what was changed inside a file or anything like that, but there are still issues.
For example, let's assume I just finished this talk. Actually, just before, I met Caleb, we exchanged contact information, and I put it in. Now all of you, including whoever sees this online, know that the file that was created at that moment is Caleb. You don't need to see the data; the date of creation, and the fact that it was on camera, is enough. And now every time I edit it, say I call him and it updates the last-called time, you can see: oh, Tom called Caleb. All of this information is not obvious. People say, oh, it's encrypted, it's fine. No, it's not fine, because it's still identifiable. So it is a risk, and it's a risk that's not solved by end-to-end encryption alone.
Okay, so people say this is it, this is all we can do. But it's actually not, because we can do much better. So let's get to it. As I said in the beginning, our servers cannot be trusted, so our data can also be manipulated by the server. Let's look at how. I don't know if a lot of you do encryption, but a basic example of bad crypto is encrypting but not signing, or encrypting without any sort of message authentication. So let's look at a crazy example, which doesn't exist in real life, although similar examples do. I want to have some access-level storage in my app. Let's say, I don't know why, this is the level of people whose requests to join a group my app will automatically accept. I don't even know what it does. It's encrypted, so I think the server can't touch it. The default level is 00, so no level. But just by changing one bit, or whatever you want, you can scramble it and hand back another encrypted chunk, and all of a sudden the level changes to 19. As the attacker, I can't control it.
I don't know how to change it to be exactly one, but 19 is already good enough, especially if you do Boolean comparisons, so true or false. So just by being able to touch the data, even though it's end-to-end encrypted and I can't see it, I can manipulate you as a user. Again, this one is not a hard problem: you just use an HMAC, or message authentication codes or signatures in general, to make sure no one does it. But it is a real problem.
Let's look at another thing. We have the hardened CardDAV, and I just added a contact, or a to-do item, or a calendar event. Let's say I have a court hearing next week and I just added it. My adversaries, the people on the other side, saw that I added it, hacked my server, and just omitted this file. I have no way of knowing. Everything is end-to-end encrypted, everything is safe, but the file is omitted, and there's no way for me to know that, because this is normal operation. Files get deleted and added; I add a contact, I remove a contact. So this is another problem, and again, the server can't be trusted. This is easy to solve: you verify the whole state with any sort of signature, authentication code or whatever, and then every time it's changed, I will get notified.
But there's another problem with that, which is data rollback. Let's assume I had a state that was valid, and just now I added a contact. If people want to remove this contact, all they need to do is serve me the old state from before I added it. That's it. So again: the server has no access to my data, I'm signing the whole state, I'm making sure the state cannot be manipulated, and I can still get attacked. Which is, again: don't trust your server, ever.
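The access-level example above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not EteSync's code: the XOR stream cipher built from SHA-256 is a stand-in chosen so the example runs with the standard library only, and the names `seal`, `open_sealed` and `flip_level` are hypothetical. With this particular toy cipher, a server that guesses the plaintext can even choose the result (00 becomes 19); with other ciphers the change is less controllable, as in the talk. Either way, the HMAC check rejects the tampered blob.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream (the operation is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append an HMAC-SHA256 tag over nonce + ciphertext."""
    nonce = os.urandom(16)
    ciphertext = xor_crypt(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag before decrypting; raise ValueError on any tampering."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return xor_crypt(enc_key, nonce, ciphertext)


def flip_level(blob: bytes) -> bytes:
    """What a malicious server could do: flip bits in the last two ciphertext bytes."""
    tampered = bytearray(blob)
    tampered[-34] ^= 0x01  # second-to-last ciphertext byte, just before the 32-byte tag
    tampered[-33] ^= 0x09  # last ciphertext byte
    return bytes(tampered)
```

Decrypting the tampered ciphertext without looking at the tag silently yields a different access level, while `open_sealed` refuses it. Note that this only covers manipulation of a single blob; omission and rollback of whole files need the state-level checks discussed above.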
So the solution I came up with, and obviously it's not just me, it's built on top of a lot of work by a million other people, is tamper-proof journals. Essentially, instead of adding files, instead of adding contacts, every time I do something I add an entry. So for example, here I added Alice, then I added Bob, then I realized I had a typo, so I changed Bob, and then I deleted Alice, because we're not friends anymore. All of this is append-only, so you can't really tell which data is which, because everything looks like, actually let's jump to this, an encrypted blob. This is how it actually looks to the server: something happened, something happened, something happened. Everything is end-to-end encrypted, so there's no way for the server to know what happened.
The server also can't manipulate anything, because the UID of every chunk is a verification of its content and of the previous block. It's essentially like Git, just encrypted and signed. So if the server tries to remove an entry, I'll get an integrity error; if it tries to switch two of them, I'll get an error again. And because I can't remove anything, I only append, I can verify that nothing is missing. Say I added a contact here, and then I added a contact on another phone. The server could potentially just not show me the new addition; it could trim the log for me. But then when I try to add again from the other device, there will be a clash, and there's no way for the data to stay consistent, unless the server makes the histories diverge completely, but then it has to identify exactly which client is which every single time, which is a difficult task. Okay, so let's go on. So let's see how we protect.
So as I said, it's immutable: data can only be appended, and no modifications are allowed. Everything is verified by the client every time, like in Git, so any error you would get from Git, you would get the same error here if a server tried to manipulate your data. It can't be manipulated or faked; you know it's you. Again, the previous UID is also signed, so there's no omission or reordering. And again like Git, as you can see I very much like Git, everything is distributed among all the clients. This is something I didn't talk about in the hardened CardDAV case: let's assume the server was hacked and everything was deleted. That would be it. But if everything is distributed to the clients, and a client does not say, oh, it's missing on the server, that means it's deleted, I should delete it here too, but instead verifies it, then in the case of mass deletion the clients would just not do anything, and we'll be able to recover all of the data from the clients themselves.
Let's talk about the previously unsolved attacks, the ones that were not solved in the CardDAV case. Which data is accessed and modified: everything is done locally, so there's no access for the server to see. As for modification, the server knows there was a modification, but not which item was modified, because that part is also encrypted. Actually, that's not entirely true, because you can do some analysis based on the size of the message. For example, a contact is probably one kilobyte, and a calendar event is probably about the same size, but an image would be much larger, so you could do some analysis to figure out what's what. But as I said in the previous example, you can fake it, you can hide it, and you can create fake messages, although it's wasteful.
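The chained journal described above can be sketched as follows. This is a minimal illustration, not EteSync's actual format: it assumes each entry's UID is an HMAC-SHA256 over the previous UID plus the (already encrypted) content, and the names `Entry`, `append`, `verify` and `is_rollback` are hypothetical. The rollback check assumes the client remembers the last UID it has seen.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    uid: str        # HMAC over (previous UID + content), chaining entries like Git commits
    content: bytes  # in a real client this would be ciphertext


def entry_uid(mac_key: bytes, prev_uid: str, content: bytes) -> str:
    """Derive an entry's UID from the previous UID and this entry's content."""
    return hmac.new(mac_key, prev_uid.encode() + content, hashlib.sha256).hexdigest()


def append(journal: List[Entry], mac_key: bytes, content: bytes) -> None:
    """Append-only: a new entry always points at the current last entry."""
    prev_uid = journal[-1].uid if journal else ""
    journal.append(Entry(entry_uid(mac_key, prev_uid, content), content))


def verify(journal: List[Entry], mac_key: bytes) -> bool:
    """Recompute every UID; removal, reordering or edits break the chain."""
    prev_uid = ""
    for entry in journal:
        expected = entry_uid(mac_key, prev_uid, entry.content)
        if not hmac.compare_digest(entry.uid, expected):
            return False
        prev_uid = entry.uid
    return True


def is_rollback(served: List[Entry], last_seen_uid: str) -> bool:
    """A trimmed log still verifies, so also check it contains what we already saw."""
    return bool(last_seen_uid) and all(e.uid != last_seen_uid for e in served)
```

Dropping an entry from the middle or swapping two entries makes `verify` fail. Serving a truncated log is the one case the chain alone does not catch, which is why the sketch adds the separate `is_rollback` check against the client's own memory.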
Data omission and rollback we solved, as I said. So now just a few words about what EteSync is and how the journal is used there, as a real-world example. Essentially it's secure, encrypted, journaled personal information cloud sync, and you can use it on your own, obviously. It syncs your contacts and calendars among your Android devices or the web, and there's also a CalDAV and CardDAV proxy for the desktop, so you can just use Thunderbird or whatever you use. I use it everywhere. It's seamless on Android: you just use the same apps you always use, and it works like a Google account, essentially, it just encrypts in the background.
The journal format is very simple. You have the UID that I mentioned, and then on every change, again like Git, I snapshot the whole item, so the whole calendar event is snapshotted every time. So even if I delete it, I have a latest snapshot of what it was. Another nice benefit of this is the change journal. This is my shared reminders calendar: I know I need to feed the cats and go to the supermarket, and then all of a sudden I can see that someone deleted an item later. So I know that, for example, my lazy flatmate decided to remove the task that we gave him to do. You can really see who changed what and when it was changed, and obviously also recover lost data, because if an event was deleted, we can still find all the content that was there. What's also super useful is finding entries based on date. For example, I met someone today and added them to my contact list. Two weeks later I want to send them something, and I've forgotten the name and have no idea how to find it. I just go back: when was that weekend? Okay, it's one of those three, and I'll probably recognize the name by then. This has saved me a handful of times.
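The change-journal benefits above (current state, recovering deleted items, finding entries by date) all fall out of replaying the log. A rough sketch, with entries shown decrypted for readability; the `Change` record, its field names and the helper functions are assumptions made for illustration, not EteSync's schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    day: date      # when the change was made
    action: str    # "add", "change" or "delete"
    item_id: str   # which contact or event the change is about
    snapshot: str  # full content of the item at this point, even for deletes


def replay(journal: List[Change]) -> Dict[str, str]:
    """Rebuild the current state by applying every change in order."""
    state: Dict[str, str] = {}
    for change in journal:
        if change.action == "delete":
            state.pop(change.item_id, None)
        else:
            state[change.item_id] = change.snapshot
    return state


def last_snapshot(journal: List[Change], item_id: str) -> Optional[str]:
    """Recover the latest known content of an item, even after deletion."""
    for change in reversed(journal):
        if change.item_id == item_id:
            return change.snapshot
    return None


def added_between(journal: List[Change], start: date, end: date) -> List[str]:
    """Find items by the date they were added, e.g. 'that weekend two weeks ago'."""
    return [c.snapshot for c in journal
            if c.action == "add" and start <= c.day <= end]
```

Because every entry carries a full snapshot rather than a diff, recovery never has to reconstruct an item from earlier entries, at the cost of a somewhat larger log.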
One more thing, sorry, and then I'll finish: signed pages. As Caleb said in the talk before, one of the biggest problems with serving JavaScript is that you cannot trust that the JavaScript is really what it claims to be, because the server serves the JavaScript, which is the app, every time. So a malicious server could just serve you a malicious piece of JavaScript that sniffs all your passwords and steals everything, and you'd be none the wiser. Like Caleb, I was looking for solutions, and I decided to write a browser extension that essentially verifies the PGP signatures of pages. It's called Signed Pages. Devs sign the web pages, users add the expected public key for a page along with the page's URL, and then the extension verifies the signatures. As you can see, this is a page with a good signature, and this is a page with a bad signature. Obviously you also have external JavaScript and CSS, so you probably want to use Subresource Integrity for all of those. That's an existing mechanism in the browser that's safe and verifies all of that. So essentially, by verifying the HTML, the main entry point for your app, you verify everything, and that gets you almost the same level of security in the browser that you get with native apps. In the future, I'm working with Daniel from Airborn.io, which is like an end-to-end encrypted Google Docs, yeah, he's over there, and we're trying to do this for service workers as well, so you can have it in progressive web apps, the web apps that just have an icon on your phone, because currently those are not verified, since there's no browser there to run it.
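For the Subresource Integrity part mentioned above, the value that goes into a script or stylesheet tag's `integrity` attribute is the hash algorithm name, a dash, and the base64-encoded digest of the file. A small sketch of computing and checking such a value; the helper names `sri_value` and `sri_matches` are made up for this example, and this does not cover the PGP verification that the Signed Pages extension performs on the HTML itself:

```python
import base64
import hashlib

# Hash algorithms accepted in an integrity attribute
ALLOWED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value such as 'sha384-<base64 digest>' for a resource."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")


def sri_matches(content: bytes, integrity: str) -> bool:
    """Check fetched content against an expected integrity value, as a browser would."""
    algorithm = integrity.split("-", 1)[0]
    return sri_value(content, algorithm) == integrity
```

The developer computes the value once at build time and puts it in the signed HTML; if the server later swaps the script, the hash no longer matches and the browser refuses to run it.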
Yeah, so this is it. Just two finishing notes. Privacy is a sacred right, don't give it up, because once you give it up, you give it up for everyone: then we, the outliers who care about it, are singled out. And don't forget, oh, it's disappointing, no: you're the weakest link. This is a famous XKCD comic. In the end it doesn't matter how much encryption you use if someone hits you with a wrench to get your password. So if you have really important information, make sure to have something like double redundancy, so encryption with multiple keys. Thanks to cryptocurrencies, there's a lot of information about advanced crypto nowadays. So that's that. And useful links, if you want them: the website, where you can download EteSync, my blog, and Signed Pages. There's a lot of information there. And no time for questions, I assume? I have time for questions? Okay. So, yes. Question? Over there. By the way, just for good attribution: I took a few icons from other places, and obviously the XKCD comic. It's in the slides, if you download them you can see. So just for good measure. Back to this, yeah.
Thank you. So your journal appears like a single file?
Okay, so the way it's actually implemented is in a database. You just add an entry to the database, and all the server can see is the UID and the encrypted content.
Okay, so what about scalability? Did you tackle the issue? Have you thought about it?
So, I've never worked at a company like Facebook or Google that deals with millions of users, so I have no idea how it will behave at that level. But at the moment it's not even bothering my server. There's nothing there: it just uses normal SQL, you just add entries. And because I rely on existing technologies, it's not too dissimilar from Git and very similar to other things, I don't see any scalability issues on the horizon, essentially. Any more questions? Yeah, over there.
Since the data set is always appended to, do you take care of making it smaller after using it for a long time? Like if you add contacts, delete contacts, etc. After 30 years, I mean, I still want to have my contacts encrypted, so would the data set keep increasing, and the download on my mobile phone maybe become bigger?
Okay, so it's a good concern, and again, like in Bitcoin, there are solutions to that, to an extent, I think. You could download only chunks of it and verify those. But to be fair, it's contact information and calendars. I challenge you to even get to a gig, and by the time you get to a gig, we'll have enough storage. It's really not a concern; I don't see how we'll ever reach a point where it is. I have a lot of users that rewrite and change and add, and still we've never reached anything that even scratches the surface. I think the whole database is maybe 200 megs, actually not even that. So yeah, it's not a concern at this point.
A further question about the increasing log and not garbage-collecting anything: it's the same for new clients. For instance, if you have a new mobile phone and you want to sync all the contacts, it has to go through the entire log, right?
Yes.
Okay, so that could be a problem.
You heard the question, so I don't need to repeat it. Essentially, again, I challenge you. I don't know how many friends you have, but I don't think you'll get to more than 2,000, 4,000, 5,000 modifications, 10,000, whatever. It's still not a problem for a computer, and anyhow all of that data comes to less than a meg, which is again less than an image on a normal website. So it's really not a concern. With that being said, it's not implemented yet, but I plan on helping with journal trimming. Let's assume I added a contact and then I realize I actually don't want anyone to know I added it. What I can do is help you create a new journal. You can't change existing journals, but you can create a new one that does not include this data, so you can potentially trim it if you want.
I actually have one question. What if you lose your phone and you buy a new one, and then you cannot verify what's on the server, you cannot even access your data? What's the plan?
Okay, so luckily for me, most of the problems that are usually associated with end-to-end encryption are actually solved here, because end-to-end here means me-to-me. I don't need to do any secret sharing or verification of your identity. I use symmetric encryption, so as long as you remember your password, it's not a problem. You can install it on as many phones as you like if you remember the password, which you should, and write it down as well, so you can always recover it. That was the first part of the question. What was the other part, or was that it?
That was it.
Okay, yeah, so that's solved. Okay, question there?
Well, just a quick question: does it still make sense to self-host? I mean, is it secure enough to use a central service for this kind of software?
Yeah, so, you know, you should trust me, but you shouldn't. I wouldn't do anything bad with your data, but you never know. For example, if you don't use Tor, I can have your IP. I don't have access times, but if you don't delay additions, I can see when you added something. I don't know it's a contact, maybe you changed a calendar event, but I can see when you touch your phone, when you're awake, if that makes sense. So there's still some stuff. But no, I don't think you should self-host. I think it's too much trouble, and you're probably better off just using the hosted version. Yeah, question here?
So I think we are done. Thank you very much, a big round of applause for your presentation. And thank you for making this dev room a success. There have been many people and very great presentations, and you will find everything online. Now it's the end, but that also means we can stay and discuss.