OK, first, thank you for coming at lunchtime. I'm also suffering and I'm hungry, so I will try to be fast; I only have 20 minutes. I'm here to speak about privacy in practice for self-hosting. And when I say self-hosting, I don't only mean hosting your own stuff; I mean hosting for other people. Because when you host your own stuff, you usually do not have a privacy problem with your own data. I would like to ask people to not take too many pictures of me. If possible, zero. We are talking about privacy, so I expect people to respect that.

Quickly, about me. I'm a system administrator working at Red Hat, in the Open Source and Standards team. We help upstream projects with various things, such as community management, event management and design, and, in my case, cleaning up everything the developers did on the infrastructure when everything is broken. Which is quite an interesting topic, so I do a lot of security stuff, and I also try to care about privacy for communities. I have very strong opinions, as my co-workers can attest.

So, what is privacy? For me, privacy is linked, as I said, to security, which is linked to CIA: not the bad guys in the James Bond movies, but confidentiality, integrity and availability. Especially the part about confidentiality, where you want your data to be kept secret and, for example, not used by bad actors, which is something that usually happens when you start to trust Google, Amazon, Facebook, etc.

As I assume there are maybe some people who were not here since the beginning of the track and are just listening because I'm here, I want to quickly remind you why the GAFA are bad for privacy. First, it's a question of confidentiality incentives. On one hand, I know people there and I know they want to protect privacy.
On the other hand, the whole business model is getting private data and reselling it, so you cannot really get something completely private when they need that to make money and to live, like getting bread, getting food and everything. It's quite important for them, as it is for me. Sometimes they fight back against governments and sometimes they do not. And the main issue, I think the biggest one, the one that we tend to forget, is the lack of diversity in technology. When you are someone in San Francisco, it's easy to say, yeah, I want to share all my pictures on whatever Google decides to do, because nothing bad will happen to me. Well, when you are someone in some restrictive country, more restrictive than the current USA (I will not make a joke about it, right, not now), maybe you have a different point of view. But you are not working at Google, and maybe Google is not asking you anything, because you are just a user. So maybe they will not take care of anything you want.

And that's where self-hosting is quite important, because it takes privacy beyond the TLA. TLA here is not some control system; it means three-letter agency, like NSA, FSB, or DGSE, which is four letters, but you see the point. In order to do that, you need to think about the threat model, which basically means trying to decide what the bad guys, for a generic definition of bad, want to do. Most of the time, people say, yeah, there is a state-level adversary, there is the NSA and China, and everyone wants to get to me. Most of the time, no, unless you are a terrorist, and if you are, please do not ask "I'm a terrorist, what can I do?" during the questions, because they are recorded. At worst, what you can get is the cops. There are a lot of people who are activists, and sometimes cops just knock on your door saying, oh, you wanted to organize a protest about whatever social stuff, environmental stuff and everything, so we have to ask you some questions. Sometimes they go to the wrong person.
Sometimes they go to the right one, and everything. And sometimes, well, you are just attacked by your co-worker, that guy you decided to kill at Quake, your ex-lover who is an abusive guy, maybe your neighbor, because you said that his dog is really loud and you called the cops on him, or your family, because you are a BSD user and they are all hardcore Debian users and they do not need to know. So, yeah.

And when you think about it, a privacy leak can be anything. One of the most common examples is the IP address. People say leaking it is bad, but they never explain why it's bad. So, for example, let's say that you are on IRC. You connect; so, that's me connecting from some provider. I join a channel. And then there is someone else, Adrian, who may or may not be here; he was speaking one hour ago. And he connects from the same IP address. Based on that, you think, yeah, maybe they are in the same place. Depending on the circumstances, maybe they are at the same workplace, which is fine if you are working together. But maybe we are not working together, so why am I in Adrian's office? Maybe we are in the same home. Maybe we were connected during the night, and that's where you start to ask why I am sleeping at Adrian's house. And maybe, if he were called Adrienne, things would be different.

So that's the kind of stuff we mean when we say a privacy problem can look innocent. Like, I'm connected from my phone and there are 1 million people with the same IP address: okay, no problem. I'm connected from a specific IP: maybe there is something. And if people say, yeah, you can hash the IP: no, it does not help. For example, if people disconnect at the same time because the network is crappy, well, you have still found something.

So, let's go back to the basics. I will focus on the web, because everything is web and it's easier; I have 20 minutes. So, first, TLS. I mean, everybody can use Let's Encrypt. You have to use it.
There is no reason to not use it, or at least no good reason. So, you need to use good ciphers. That matters mostly if you want to protect against state-level adversaries and everything. Everything you need is on the Mozilla wiki; you can use a search engine for that. I also recommend taking a look at the crypto-policies project. It's a project aiming at having one single cryptography configuration for your whole system, with levels like strong, future, normal, weak, which is much better than what we have now, which is 20 different syntaxes for setting every single cipher and its order for every project, which is bad. You can take a look at HPKP, etc. There is a lot of good material out there; I'm not going to repeat it.

What you need to do is to discuss with your users, because some of them have older browsers. Sometimes they do not have a choice; everybody can remember the old Windows machines at school and this kind of stuff. It really depends on what they want to avoid. So, if you want to protect people, you need to pick some set of ciphers, and sometimes that blocks access for others. What I would recommend is to provide two views, one for insecure stuff and one for secure stuff, and then you can communicate with users without trying to track them. I mean, that's the whole point of privacy: not tracking people. Set a banner asking them to upgrade their browser, this kind of stuff. Because remember, the threat is not always a TLA; sometimes it's just someone at your school, sometimes it's someone living with you, and everything.

Another option is to offer onion services. For people who do not know what this is: it's .onion, it's on top of Tor, you do everything over Tor. I will not explain what Tor is; I hope that someone can do it for me later. Again, 20 minutes, etc.
It can help to avoid the IP leak. It requires a specific setup. Currently it's quite hard to scale properly; it's a research topic. There are a lot of specific attacks where people try to steal your key and everything, so if you want to be serious about it, you need monitoring and you need to spend a lot of time researching what is going on. I'm pretty sure that in a few years it will be easier, but for now it's not exactly there. But that's a great idea, because then you can give a talk about it next year.

Another issue is the use of external JavaScript. Again, it's the web; you don't want to start hosting 1 million versions of jQuery, but it turns out that it's an IP leak, so avoid it if you can.

Now we go to the biggest issue, which is the logs. That's an issue because logs are critical if you get attacked, but the problem is that tracing an attacker is the same as tracing a user. I'm not saying that users are attackers, or at least not on record; it's just kind of the same for that. There are several solutions. The first one, easy: no logs. No logs, no problem. It might be illegal; I'm not a lawyer, do what you want. Sometimes a fine might be okay, because it's not so expensive compared to getting someone sent to jail. It's your money, so you decide what you want.

Some people try to anonymise the logs, which usually works fine and reduces the risk of leaks. But when you have something like this in the log, you can anonymise the IP address as much as you want, you will see that there is still my name. So you need to reduce the information in the logs as soon as possible. For example, do you need the user agent? Maybe, maybe not. Do you need the IP, the referrer? One interesting trick is to use an ID for each user and keep the mapping of IDs out of the log. Everything will be "user number something", and if someone asks for the log, well, it turns out that if they do not have that mapping, because it's not in the log, they cannot do much. That protects against accidental leaks, for example copy and paste on a bug tracker, or automated copy and paste on
a bug tracker description, on IRC and everything. It might or might not frustrate you, which might or might not be a good idea; again, it depends. And it's also nice when a user name changes: sometimes you do not want to keep "a nice guy from some department" and get "Mr nice guy", or you change your name because you got married, this kind of stuff.

Obviously, you need ACLs for the logs and, if possible, separation of access, which means that you need to have more than one admin, which can be hard. If you want to avoid abuse, there are three steps: audit, audit and audit. If possible, no direct access to the logs. You can use something like Elasticsearch, but I cannot recommend the tool, because it's quite hard to install and it's not exactly good from a security point of view; so something like this, but not this. If you can get a public trail of who read the logs, that usually prevents people from abusing them: if it shows that you are looking at the same person every day, maybe people will not do it.

Then there is the issue of long-term storage. Either you decide that you do not do that, because who cares, or you decide to use aggregated logs. In the end, you need to decide why you need logs in the first place. Maybe keeping them for one week is sufficient; maybe you need one year because that is the law. So you need to think about what you want and what you need.

Another issue to keep in mind is backups. Obviously they need to be encrypted; I say obviously, but people do not do that. You need to test them; I mean, people have seen what happened to GitLab, need I remind you? One interesting question is what to do with backups when a user leaves, and that's quite complicated, because sometimes a user leaves but it's not the user: someone gets access to the password and decides to destroy the account. What do you do? If you do not keep backups, it's fine when people want to disappear. If you keep backups, when someone makes someone else disappear, you can save the day. Usually what people do is delayed cleaning: OK, we clean the backup
in 2 weeks, in case you change your mind. But that's also the kind of trick that Facebook is using, so that's not great. Whatever you do, leaks and errors do happen; there is no perfect choice.

Something else that I try to recommend is to use open infrastructure: you publish the configuration. That makes it harder to do something fishy, like suddenly deciding to collect more information; you cannot do that without being seen. In practice, not many people will read it, but at least it's better. It builds trust with users, and it allows people to fork and decentralize; this is the decentralized internet room, so that's good. There is more information: there is a website that started just 2 weeks ago, and if people have lots of money, you can go to SCALE in Southern California, where there is a whole day dedicated to that.

Another interesting thing to take into account is centralized authentication. For me, it's great: it's easier to audit and debug, that's why there is a smiling cat. It makes two-factor authentication easier, but at the same time it's easier to trace people, because the same people will connect to one central system and will have the same name everywhere. And most of the time it requires upstream work. I know, for example, of a group providing services that does not want to set up central authentication, for political reasons. So again, no perfect solution. You need to be proactive: look for ways to attack privacy, which basically means be a stalker, let's be clear.

You need to think about a data sharing policy. I would recommend looking at what Mozilla is doing; they have 6 principles. The first one is no surprises. Surprises are great, but only for Christmas and birthdays. You need to tell people what you do, and you need to make sure that they understand; basically, do not get a lawyer to write a document to explain it, that usually does not work. So it needs to be tested with users and discussed with users, to make sure that they understand that and not something else. You need to make sure that you give
them real choices; again, these need to be clear, and you need to discuss everything with users. You need to have sensible defaults: if you create a system for a blog and you create the same system for something private, which is supposed to stay private, it's technologically the same stuff, you write stuff and it's recorded, but the defaults will be different. Of course, you need to limit the data: data that you do not have is data that cannot be leaked, that can only stay private. And you need to make sure that the user stays in control, and that you can trust third parties: make sure that third parties are taking privacy seriously and everything, which is quite hard to do, I think, but it's up to you to decide.

And yeah, there is the whole importance of UX, which is by itself a whole topic that would span a one-hour discussion, so in short I will just remind you: discuss with users, and discuss with more than your friends. Because if you are in this room, you are likely to be more paranoid. You may think that, yeah, GPG is great because you can select what cipher you want, what size of key you want, but most people would not want to do that; they want a checkbox, "private", and that's it. So you need to discuss with real users, or more mainstream ones, because your friends, most of the time, are real; I mean, it's fine if they are not, I'm not judging anyone.

So what to do to avoid sysadmin stalking? To be honest, that's your worst-case attack, because that's someone who controls everything and who sometimes can be asocial, and let's face it, I'm a sysadmin. That's called the confused deputy problem; there is a lot of discussion about that. Again, I keep repeating myself, because when I was at university we were told that repetition is everything: audit what is going on, so that if there is something, you can go back. That goes for sysadmins too: sysadmins, even if they are volunteers, need to be aware that they will be watched by other sysadmins and everything, the whole "who watches the watchmen" stuff. I cannot speak Latin or I would have
said it in Latin. Again, prevent direct access. If you can use config management, do it. If you do not know config management, come tomorrow to Ghent, learn how to do it, and go back to that slide. If you want to be trendy, there is the whole serverless and container stuff: if there is no server to connect to, there is no problem, and if there is configuration that cannot be changed, because containers and everything, well, it cannot be changed and there is less risk of abuse.

If you have more time, you can take a look at various research topics where people do not give you data; they give you encrypted data and you can make computations on it. Something like ZeroBin, the pastebin, I think works like this: the server has no knowledge of anything, and everything is done client side. But you want that for more than a pastebin, because that's quite limited.

And to conclude: think about UX, discuss with users. I said that several times, just to make sure you understand. Communicate. If you can push security upstream, that's great, so I do not have to do it. And data not collected is data that cannot be leaked; it's quite easy, you can print it in your office. Think about it. Contact me; well, you can use whatever you want as long as it's IRC or mail. Thank you, and if you have questions, it's the right time. We have 7 minutes, if I'm not wrong. Yes, that's correct. Thank you. If you do not have any questions, I'm OK with people clapping for 5 minutes.

[Audience question, partly inaudible, about whether disk encryption is worth it when the data itself should already be encrypted.]

So the question is: is it worthwhile to encrypt the disk in a server if you do not log anything and if the data is already encrypted? It depends, because you can still have someone going to the data center, taking the disk, pretending there was a power cut, changing the data and then putting it back. That's unlikely to happen, but it's something that can happen. And the other problem is, even if we need to encrypt the data, sometimes you
just take software out of the box, such as, I don't know, a webmail, and most of them do not have encryption right now. You can add it, but that requires being able to code, it requires upstream to accept it and everything, so if you want to get something fast, maybe not. And no matter what you do, there is stuff that you cannot encrypt. For example, you want to make sure that people do not know who can access the server; there are SSH keys, and you cannot encrypt them, I mean, they need to be in the clear when the service is started, and this kind of stuff. So I'm pretty sure it cannot hurt too much to encrypt the disk. Then of course it becomes complicated, because you need to have a specific initrd and some specific process. I know that people at Red Hat are working on that; take a look at Nathaniel McCallum's work, he is likely giving a talk somewhere about it, so watch the recording. But for now, the people who are doing that are just using out-of-the-box initrd stuff where someone connects and decrypts. That works fine, but you have more downtime, so you have to decide if it's OK or not. Again, there is no perfect solution; you just need to decide whether it's worthwhile or not, and then change back, because that server is supposed to be disposable, and again, config management. It depends.

No one has any question that I cannot answer? I confuse people like this.

So the question is: if I do self-hosting in my house, I cannot prove to people that I am not doing anything with the server. I think you cannot prove it, because even if it's in a data center, if you pay for it, you can likely access it. I can say, yeah, I am hosting something in a data center from OVH, which is a famous European provider, and if I pay enough, if I do not take the cheapest possible option, well, I can get access to the rack and do whatever I want. And even if I do not have access at the hardware level, well, I am root; I can reflash the
BIOS, I can do whatever I want. So yeah, that's quite hard; you cannot really prove that. There are people working on it, with remote attestation and making sure that you execute unmodified code, with a TPM and everything; I think that's still a topic of open research. But in the end, if you want to delegate something to someone, you kind of have to trust that person to not fuck up the system, because they can just refuse to do something, and that would mean a lack of availability, which can be a problem for someone. So as long as you delegate and people can decide to not do anything, that's an issue. So you have to trust someone at some point, and I do not have a good answer.

So maybe one more question, because we have one minute. Two minutes? One minute. Yep.

So I tend to prefer... oh yeah, the question is what my preferred hardware stack is. Well, it depends on how many people I need to host. For work, we are basically getting expensive servers; we want lots of memory to run Jenkins and everything. At home, where it's coming out of my pocket, with my electricity and everything, I just get the cheapest possible boards: either a Raspberry Pi, because people are throwing them away and it cannot be cheaper than free, or a BeagleBone, because it's well supported by Fedora 25 and I can add a TPM for crypto acceleration and this kind of stuff. I do run a Tor node in my house, because I do not fill the fiber line just by myself, so I can use it for that, and it's running fine. There is nothing for remote access except a serial port, and I have full control. It's cheap enough, it does not have any Intel stuff, and it works well, except when it's broken in the upstream kernel, but let's not discuss that. And you can get a TPM and a crypto cape, so you can do all kinds of security stuff, which I don't do, because I do not have time. It consumes almost nothing, and it's powerful enough to run a Tor node, which uses a lot of CPU, so it should be fine for, I don't know, mail, spam filtering and everything. And it's cheap. Thanks.
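
Editor's sketch, not from the talk: the log trick mentioned earlier (use an ID for each user and keep the mapping of IDs out of the log) could look roughly like the Python below. The `Pseudonymizer` class, its method names and the `user-` prefix are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever separate, access-controlled store would hold the mapping in a real setup.

```python
import secrets


class Pseudonymizer:
    """Swap user names for opaque IDs before a line reaches the log."""

    def __init__(self):
        # Hypothetical in-memory mapping. In a real deployment this would
        # live in a separate, access-controlled store, never in the log.
        self._ids = {}

    def user_id(self, username):
        # Random IDs rather than hashes, so an ID cannot be reversed by
        # guessing user names; the same user always gets the same ID.
        if username not in self._ids:
            self._ids[username] = "user-" + secrets.token_hex(4)
        return self._ids[username]

    def scrub(self, line, username):
        # Replace every occurrence of the user name with its opaque ID.
        return line.replace(username, self.user_id(username))


if __name__ == "__main__":
    p = Pseudonymizer()
    # The printed line contains an opaque ID instead of the name.
    print(p.scrub("alice uploaded report.pdf", "alice"))
```

Someone who only gets the log sees stable IDs and can still follow one user's actions for debugging, but cannot name that user without the mapping. This sketch does nothing about IP addresses, user agents or timing correlation, which the talk treats as separate leaks.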