Hello everybody, thanks for coming. This talk is about logging, or data minimization, in Debian. So I'm Daniel Kahn Gillmor, DKG, and this is Matt Taggart. And so we're going to talk about some issues around privacy and security and the intersection between privacy and a bunch of other stuff in Debian. We're going to focus on logs. So just so you know where I'm coming from on this, I work for the American Civil Liberties Union right now. I'm a technologist in their Speech, Privacy, and Technology Project. And we are concerned about the amount of data that can be used against people in the world right now. So I'm coming at it from the perspective of someone who's very much interested in civil liberties and what we can do for that. And I'm part of Riseup Networks. We host email and other online services for activists, many of whom are in very repressive regimes and are trying to do things anonymously. And if they're outed, literally their lives are at risk. So we have to take some of these things very seriously. That's the perspective I'm coming from. And hopefully this is a discussion. If people have questions, please raise your hands. We're going to raise a couple of different ideas that we have about things we can do to improve the situation, but we want to hear what you all think is worth doing as well. So just for background, to explain the intersection with civil liberties and logging: we're living in a time of a sort of data flood. There's more machine-readable data than ever, and there's much more centralized data storage that builds up. That stuff is available to all different kinds of people for all different kinds of purposes. It's available to systems that do accounting, making sure that they know how much to charge you for data usage on your phone. It's used by law enforcement to find out things about what you've been up to.
One thing I didn't mention up here: it's also used for things like real-world traffic analysis. How busy is the road? How fast are the cars moving on the road? There's all kinds of data being collected. It's also used by marketers. And some of the data that is generated regularly is also used by the users themselves. But it's interesting to note that the folks who are the subject of the data are only one of a very large, wide group of folks who are using the data. So the data that we produce as we operate in this very digital world is usable by lots of people besides the people it's about. And I just want to remind folks that in Debian, as always, our priorities are our users. And the priorities of the folks who do the accounting systems and law enforcement and marketing may actually be very different from the priorities of the people who the data is about. So there's tons of data, and we can't address all of it right now. I mean, maybe collectively we can, but in this talk I'm going to focus on just a subset of the data, which is the logs that are produced by network services. There's lots of other things that produce things you might think of as logs, like your bash history, but we're not going to cover that in this talk. We're just going to try to focus on network services. So it's worth noting that there are good reasons that your network services produce logs. I don't know how many of you here are systems administrators in addition to being members of the Debian community. I see a few hands rising involuntarily; logging comes with most system administration duties. It's useful for debugging. It's useful for having an audit trail. I don't know how many people here develop user-facing software, but certainly logs are useful in usability studies about how a website gets used. It's useful for diagnostics. It's useful for analytics, to know who is visiting your website, things like that. And there's a lot of other things it's useful for.
And the data that's getting logged covers lots and lots of stuff. But in particular, there are some details that are more identifiable, that produce these patterns of information about someone which can be used by people other than that person. So IP addresses; logs of who logged into a machine; mail headers that get logged; cryptographic parameters that get logged. There's a whole bunch of different stuff that creates fingerprint-able trails in these data sets. And those fingerprint-able trails show user activity. They show what the people who are using these services do, and the user may have no idea that they show that. And beyond showing what someone did specifically, because of the sheer volume of them: you might not care if someone knows that you visited a given website once, but there are also usage patterns, which reveal something about the way that you live your life. Time of day: what time in the morning do you usually get up? If, say, the first thing you do when you get up is check your Twitter feed, then they probably have a log that shows that this particular user is accessing the data at this particular time. And then, because there are these user patterns, there's also this question of divergence. So: oh, you got up really late today, what were you doing last night? At the wine and cheese party, perhaps? It's possible to get information just from the patterns, information that doesn't have anything to do with the actual content of the network service that was being used. And then it's also worth noting that even if some of the data is innocuous on its own, it can be correlated with other data to build a much richer picture of what someone is doing and who they are. Yeah, and in particular, I think it's become clear that it's almost impossible to anonymize data sets.
Researchers will get some data and they'll say, oh, we anonymized it, to put it out there for people to use. And through statistical methods, very often people are successful in de-anonymizing it, or in using multiple data sets and correlating them. So even when you try to anonymize things, the data can still reveal quite a bit. So why does it matter that these pictures can be formed from the logs that we're generating? Well, the intelligence community wants to just gather as much data as it can. Law enforcement will tend to give a subpoena to someone, like a service operator, and say, we would like to see your logs; we believe that such and such is a problem, so let's see your logs around this time. When they do that, they get a huge chunk of logs. If the person who has the logs has them all, they hand them all over. So there's this collateral damage situation where anybody else who happened to be using the service has all their information revealed as well. Someone who's into compromising systems can get into a system and take the log data from the system for their own ends. And there's marketers and business types who want to use this log information, which is a slightly different scenario, because those folks are often operating from within the service that's logging. For example, someone who works for Comcast actually came up to me at HOPE earlier this summer, the conference in New York, and said, oh yeah, I work in the division that records every TV show you've ever watched if you use Comcast. And we know when you watched it, and we know how much of the show you watched, and if you put it on pause, we know how long you put it on pause for. And it's very useful for us to have all of this information going back indefinitely, because it lets us provide you with better recommendations for new TV shows to watch. So they're in a different situation than law enforcement with a subpoena.
And I said, well, do you think maybe you could just stop keeping some of those logs at some point? Have you considered a policy on that? He's like, ah, we could, but nobody's ever really brought it up before. In the future, we might want that stuff. Did you want to? I thought by law they have to delete it after one year. So the comment was: I thought by law they have to delete it after one year. I don't know of any US law that requires that, but I'm not a lawyer. So if you can find that law and point me to it, I would be very happy to hear about it. So who might be the target of some of this data gathering? A target in the sense of someone who can be directly harmed by it, as opposed to just feeling a little bit creeped out and not quite sure why. There's activists and dissidents, which Matt mentioned. If you use a web service and they have all this data about you and then the service gets compromised, that's information about you that is now not under your control. And attackers can use that, especially if it's tied to an email address. Users are notoriously bad about using the same passwords everywhere, so attackers compromise one service and they can go from there. That's why it's such a big deal when some of these big providers get hacked: it spreads like wildfire. And then companies may also have interests in knowing, for example, the usage patterns of their rivals. So this is not just, oh, we're sticking up for the radical revolutionaries who are under a repressive regime. You have good reason to not want to keep a bunch of data around in the event that you lose control of your own machines. That data is then in the hands of whoever took control of the machine. And there are a few other reasons why you might not want to log too much. There are performance reasons.
So I don't know how many of you, as systems administrators, have dealt with I/O contention as a result of massive activity. Your log files get real big, real fast. The disk fills up. And some kinds of network activity actually produce multiple logs scattered all over the disk. So we have this example here from Taggart; this is specifically a Riseup example. We had to split up the I/O of a bunch of these things because on our mailing list server, every time a single mail comes in, it hits the Postfix spool and the Postfix logs and the Sympa spool and the Sympa logs and the list archive. And it's got to read from the list archive in order to rebuild all the indices and write all those out. When those were all on the same physical spinning hard drive, there was a huge amount of contention and it was super slow. The only way we could fix it was to individually divide up all that I/O. And then one of the other things we did is mentioned at the top there: slowing down the rate at which the log files are flushed to disk helps, too. So there are definitely performance implications. And then lastly, people have devices that don't have persistent storage now. For those devices, you want to be able to make sure that they operate well without keeping a log. So let's bring it into Debian. What can Debian do about the situation, and what is our responsibility in the situation as a major distro? I want to point out that there are a lot of players in this ecosystem, and everybody has different responsibilities. There are the users, the people who use the network services, whom some of this data is about. There are the folks who operate the services. There are distributions, who provide the software to the systems administrators. And then there are upstreams. And we at Debian, I mean, this is old hat to say, but our priorities are our users and free software. I want to point out that the users and the systems administrators are both our users.
And free software, that's the upstreams. So we have responsibilities to all of the other players in this particular ecosystem. A user might have some responsibility, but the systems administrator has a lot more power. We at Debian are not about to override the systems administrators' needs, because the systems administrators are also our priority. But we do have a duty to the end user as well, to ensure that we can set some kind of reasonable norm about what kind of data we should be keeping. And we also need to make sure that there's a norm about, if you're a systems administrator, how do you debug your system when something's going wrong? We need to make that easy and useful, while at the same time respecting the needs of the users to not have large amounts of data floating around about them. So just for a second, because I think it's worth thinking about this, and people don't often make this simple realization: if someone is trying to get data about somebody else from you, there are a lot of different ways you can resist them getting that data from you. But the simplest way to resist is to not have that data, right? It's a stupidly simple thing to come to, but the easiest way to resist giving away data about somebody else is to just not have it. So I think if we can set some sort of standards and priorities as a distro, to say, here's what we expect the normal setup to look like, then that's going to have follow-on effects even where people legitimately do have needs for logs. Sorry, I'm getting ahead of myself. Go ahead. But we want to make sure that the assumption is less. Go back to the previous slide. One thing I wanted to say on that slide is that, as in so many other situations, Debian is in a unique position, in that we sit between the upstreams and the users.
And we have so much in our archive. We can sit down as a group and collectively try to come up with some reasonable ideas here, and help drive this upstream, and write some guidelines that apply both to Debian packages in the archive and as recommendations for upstreams, to say, hey, this is what we're adopting in Debian and we really think it would be a good idea if you did it as well. So concretely, what are some of the issues? One of them is how long you keep the data. Maybe you decide you want to keep all the data you've been keeping, but maybe you don't need to keep it as long as you've been keeping it. And our defaults govern what is probably a very large set of systems out there. We actually keep four weeks' worth of logs by default in the regular logrotate package, with some exceptions. And it's worth noting, so we've broken a couple out here: syslog we keep a week of, rotated daily; the dpkg logs are kept for 12 months; and for Apache we keep 52 weeks of logs. So there's a full year's worth of tantalizing data just lying around on the disk, and if someone comes to you and tries to get it away from you, whether by subpoena or by hackery, you've just given them that data. From the audience: won't this change a lot in jessie? Is this too loud? Won't this change a lot in jessie because of journald? So I do expect there to be some changes in jessie, but I don't believe that journald is going to change our logging policies. It may provide us with an easier way to adjust logging policies centrally, which might be very nice. But I don't think that the defaults are going to change unless we decide to change them. Yeah, so I'm part of the Debian systemd team, and we, by chance, just discussed this yesterday. We discovered that the current settings are only based on the size of the logs rather than an amount of time. So it's quite likely that we will change that to match the current retention settings.
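For reference, journald's configuration does support capping retention by both size and age, which is the combination discussed here. A sketch of what that could look like in journald.conf (the specific values are illustrative, not Debian defaults):

```ini
# /etc/systemd/journald.conf (sketch; values are illustrative)
[Journal]
Storage=persistent
SystemMaxUse=500M       # cap total disk space used by the journal
MaxRetentionSec=4week   # additionally drop entries older than four weeks
```

Whichever limit is hit first applies, which matches the "both size and time" suggestion raised from the floor below.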
And if people have opinions about whether the current settings should be changed, now would be a good time to talk about that. OK. Well, speaking for myself, I would love to see the settings reflect both the size and the time, whichever comes first, kind of a thing. So, great, thanks for being here. So, fixing things at the infrastructural level, at the logging daemon, is one of the things that we would really like to be able to do. Sorry, slide fail here; I do actually have journald here, and this is supposed to be a separate bullet. So there are specific applications that we can modify to log things differently, and then there's also this idea of doing it at the overarching logging daemon, approaching the roles separately, right? Do you fix it at the service? Do you fix it at the centralized logging? I think we're probably going to need to fix both, and address the questions in depth. And the fixes you see on this slide are thanks to a few people in this room who have been working for years to get these things fixed and upstream. For a long time they were carried as patches in Debian, but most of these things are upstream now, which is really nice. And we can do more. Yeah. For the debian.org hosts, for example, we don't log the IPs anymore. For debian.org, we don't log IPs. Yeah. Woo-hoo! Thank you, that's great to hear. Is that just the web you're talking about, so far? We're removing them and replacing them with 0.0.0.0. So, what was just said is that they're removing the IP addresses from the web server logs and replacing them with 0.0.0.0. Is this Apache you're talking about? So, are you doing that with the LogFormat directive in Apache? But wouldn't I need to take out... Yeah.
Okay, I'd like to talk to you about that, because you will still log IP addresses in your error logs that way. That's why libapache2-mod-removeip will remove things before they get to the logging layer. So there are also, I think, proposals we can consider that would affect Debian as a whole. And I think it's worth making some attempt at establishing norms, and I know I've said "establishing norms" a bunch of times: establishing norms within Debian will, I think, help to establish norms outside of Debian as well. And, like any other subject that has trade-offs, these are not going to be without their contention, but I think it's worth trying to have the discussion and figure out what we want to do. So I think it would be really nice if we could say, somehow, that Debian is private by default. "Default" is pretty obvious, but "private" is not, and it's going to take some wrangling to figure out what the details are. But packages could default to minimizing logging. And we can encourage packages to make it easy for a systems administrator to temporarily disable the data minimization for debugging situations. If a package can't do that easily, if it's a lot of work to go from a standard data-minimized approach to "turn up the noise, let me see what's going on, let me get a record in detail", then people are not going to want that; they're going to go ahead and just have the standard be "leave the noise in" and deal with having large logs. I'm curious: a lot of the things that I wind up debugging are transient failures, and maybe difficult to reproduce. In those cases, it's hard for me to actually go back and debug things if I don't have the information recorded in advance. Is there an answer to that, or is that just one of the costs of having more private logging? Phil might have an answer. Is this in response?
But how often are those actually useful? The logs that we are logging by default are often just not what you're looking for. The log suddenly stops and you don't see why. So I'm not sure that, in general, there's enough logging in the default settings for you to debug anyway, so it might not make much of a difference here. So one of the questions that's standard for me to get is: what happened to my email? Did it go through, for instance? And the logs really are useful to figure out whether the mail went through or not. So I'm wondering, A, how to anonymize them so that I can still solve the problem for the user, and B, how long to keep the logs for. I can tell you a little more about what Riseup does in this case. For our Postfix logs, we anonymize the IPs, but we leave in the usernames and the email addresses, the to and from and that sort of thing, and we can track things down. The only time we find that we ever need IPs is if someone's attempting a denial of service or something like that, but that's pretty rare. But we also rotate logs pretty aggressively, and at any given time we only have maybe 12 to 24 hours of logs available. Sometimes someone will file a help ticket and say, hey, I'm having this problem. And the cost in that situation is, if the problem occurred more than a day ago, we have to say, sorry, we don't have logs that far back; if it happens again, file a help ticket right away and we can jump on it before the logs have disappeared. Or you could say, try to resend the same message that you thought didn't go through; you can encourage people to trigger the thing now and we'll look at it as it happens. But that's the cost-benefit balance that we've chosen. And I'm not advocating that balance for Debian; we're just advocating less than 52 weeks' worth of logs, or something like that. Was there another?
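The Riseup approach just described, zeroing out IPs while keeping usernames and addresses so that delivery problems can still be traced, can be sketched as a small log filter. This is an illustrative Python sketch, not Riseup's actual tooling; it only matches dotted-quad IPv4, and a real deployment would also need to handle IPv6:

```python
import re

# Matches dotted-quad IPv4 addresses. A real tool would also need
# IPv6 handling and care around hostnames that embed addresses.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def anonymize_line(line: str) -> str:
    """Replace IPv4 addresses with 0.0.0.0, leaving usernames and
    email addresses intact so mail delivery can still be debugged."""
    return IPV4.sub("0.0.0.0", line)

# Hypothetical Postfix-style log line for illustration:
log = "connect from mail.example.org[192.0.2.15]: from=<alice@example.org>"
print(anonymize_line(log))
# -> connect from mail.example.org[0.0.0.0]: from=<alice@example.org>
```

Running this over log lines as they are written (rather than after the fact) avoids ever having the addresses on disk.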
For email addresses, another option is to actually hash them in your logs, so that if somebody says, my email address is foo@bar.com, you can look that up, but it's kind of hard to just read off what has passed through. And that minimizes the collateral damage somewhat. I mean, you can reverse hashes with enough time, but... Yeah, that only works for incoming mail, because for outgoing mail you have a list of the users in a database, and you can just build a rainbow table of all the users in the database. So, having done central systems administration for a university: one of the places where we tended to use a lot of this data was because we had a mandate to try to protect our users from attacks, in particular from phishing. And one of the places where we used IP addresses in the logs a lot was to correlate accesses to different services from a particular IP range where we thought that someone who had either successfully phished accounts or compromised accounts was coming from. So we used them a lot for security, and giving that up is a hard sell; that's one of those justified logging reasons. I think there's a lot of power in having logs degrade over time instead, because stripping out data right up front is great if you can do it and don't have those kinds of needs. But if you do have those kinds of needs, one of the things you can do is: you have all the IP addresses in the logs at first, and then after some period of time that's relatively short, maybe a day, maybe a week, you hash all the identifiable information. So you can still get correlations if you need to track down something that's beyond that period of time, but it's harder to pull the actual data back out of the logs directly. And then maybe after some longer period of time, like a month, you strip all the hashes out.
So now you still have the logs that you need to do statistical analysis of what services are being used, but you no longer have identifiable information. And then after some even longer period of time, you actually throw the logs away completely. And if there were tools in Debian that did that sort of progressive deterioration of the logs automatically, I think a lot of people would turn them on, and you would get a lot of privacy almost for free. I was hoping that you were describing something that you guys were already doing. Oh, man. Can you write that? I have two comments on that. One thing that Daniel and I talked about when we were preparing the slides was the Webalizer case, because one thing that people like is being able to run analysis on their web logs. And my understanding is that Webalizer and some of these analyzers now go through the logs, harvest the metadata, and put it into their own data storage. So as long as your Webalizer is pulling that data in and already doing that minimization, it's okay if we rotate the logs more often than 52 weeks. So, a couple of other proposals of things that might be useful within Debian. One idea is that maybe we can separate out which logs need to be persistent and which ones don't, and make it easy for a systems administrator to identify those. That can help with some of the performance issues as well, but basically we can put all of the logs that don't need to be persistent, that just need to stick around while the machine is up, in a tmpfs. And if we have mechanisms that make it easy to put that data in volatile storage, then as soon as there's a machine reboot, the data is gone.
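That progressive deterioration could be sketched roughly as follows. This is an illustrative Python sketch only; the thresholds and the IPv4-only matching are assumptions, and as noted above no Debian tool currently does this. The comment about rainbow tables applies here too: a keyed hash (HMAC) with a secret kept off the log host would resist reversal better than the plain hash shown.

```python
import hashlib
import re

IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def degrade(line, age_days):
    """Return the form a log line should take at a given age:
    full detail when fresh; hashed identifiers for a while, so
    correlation across services still works; stripped identifiers
    after that, for statistics only; and finally nothing at all."""
    if age_days < 1:
        return line                       # fresh: keep full detail
    if age_days < 30:
        # Hash identifiers: same address yields the same token.
        return IPV4.sub(
            lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:12],
            line)
    if age_days < 365:
        return IPV4.sub("-", line)        # strip: statistics only
    return None                           # old enough: delete the line
```

A cron job or logrotate hook could apply `degrade` to each rotated file based on its age.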
And I also think there's this idea that we could ask the systems administrator once, as a debconf question, what is your preferred logging level. That then provides a hook for multiple packages to go in and say, okay, they think that a week's worth of logs is all they're going to need for services that they haven't explicitly overridden. If that kind of information were available centrally, then other packages could pull from it and make appropriate decisions based on it. It seems like centralizing more of this logging policy into fewer controls would be quite useful. To that end, I wonder if we might want to have explicit Debian policy saying: if you're packaging software that can either log to syslog or to its own dedicated log, always default to syslog, and the sysadmin can configure a separate log if they want. That's not in policy right now, to the best of my knowledge; perhaps it should be. Ditto for journald. I think we deliberately avoided using the word "policy" in our slides. Our hope is that we get there at some point, but we need to lay the foundation first. One at a time, please. Okay. So, also, some countries have data retention laws, or the laws may have some unclear status right now. But maybe also pulling in data from lawyers, asking them, okay, what should the Debian defaults be for France? So then I can say, okay, France: and then I'm logging only what is strictly required by law to comply. That would be helpful. Also, one thing that we do on the servers I administer: logrotate has an option where you can call an arbitrary command as the compression step, and we use GPG. So the logs are stored encrypted, and you can't just go onto the server and grab lots of historical data without a key that is not anywhere on the server. Interesting. Maybe that should get more integrated somehow.
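The GPG-at-rotation trick relies on logrotate's compression hooks. A sketch of what such a configuration might look like; the log path and recipient key are hypothetical, and the private half of the key would live somewhere other than this server:

```
/var/log/mail.log {
    weekly
    rotate 4
    compress
    # Use GPG as the "compression" step, so rotated logs are stored
    # encrypted to a key that cannot be read back on this machine.
    compresscmd /usr/bin/gpg
    compressoptions --encrypt --recipient logs@example.org
    compressext .gpg
}
```

The current log remains readable for day-to-day debugging; only the rotated history is locked away.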
Yeah, I like that approach. So another thing that we noticed is that Debian builds all of its packages with the debug flags turned on, and then strips the symbols out as part of the build process. And it's entirely possible that your upstream might interpret the debug flag as meaning, oh, I'm doing a debug build, therefore I'm going to be more verbose. So it would not surprise me, although I don't have a specific instance of this, if there's software in the archive that in Debian is actually more verbose by default than when people build it themselves, just because we have the debug flag turned on during the build. So if you have a package that's producing logs, it might be worth looking around in that package's source code to see whether something like that is in play. So I think it would be nice to have a sort of framing document that describes what we're trying to do. It smells a little bit like the DFSG to me, and I'm not proposing this be on a par with the DFSG, but the idea is that there are things that we are looking for in software that produces trails of activity. This would be a document at a very high level, not saying you have to do these specific things, but saying, hey, these are the goals that we think packages should meet when they've got data about users. I'll just read them out for folks. Acknowledge that the data you produce could be sensitive, and that some data is especially sensitive; if you can identify that especially sensitive data and you don't need it in the logs, strip it. I think most packages are pretty good about, for instance, stripping passwords out of the logs that they produce, but maybe we need to think of more things than just passwords as sensitive. So, I don't know... Taggart, I think Colin has a... The other thing to be careful of, in light of stripping passwords, and I think OpenSSH does this and various other packages do it:
If you're pre-authentication, be careful about logging usernames. Be careful about logging usernames on failed logins, because it sometimes happens that people type them the wrong way around: the password in the username field, and vice versa. So if we could have a set of these things: minimize what you keep; minimize how long you keep it; make it easy for logging to become more verbose for the people who need it, in the instances when it's needed. And it might also be nice if there were ways, and I don't know of anything that does this right now, to be able to say to a tool: increase your logging for diagnostics, and while I'm telling you to do this, I'm also telling you that in 10 more minutes I want you to decrease your logging again. So you could say, I want to capture this stuff, but I don't have to worry about getting distracted in the middle of whatever I was doing and then coming back two weeks later to find that my logs are full and I've got a lot of data that I didn't actually want to have. So if you're a DD and you maintain a package and the package produces logs, think about this compile-time debug question; if you find an instance of it, I'd be happy to hear about it and see if we can take it out. I have a crazy suggestion: maybe we should have a version of logrotate that knows how to strip stuff when it rotates the log, and then you would be able to say, well, I'll keep the IP address or the email address or whatever it is for the first few days. So, what Lunar was suggesting was just that the rotate command runs an arbitrary compression filter, and you could have it be GPG, or you could have it be sed. So it sounds like that's something people could do. Also, can we move the logs from one place to another? Another thing that occurs to me is that maybe I want the full information in my tmpfs and some redacted version on disk. Yep, a couple more hands.
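The "turn logging up, and automatically turn it back down in 10 minutes" idea mentioned above is easy to sketch for an application built on Python's logging module. This is an illustrative sketch, not an existing tool (the speaker notes nothing does this today):

```python
import logging
import threading

log = logging.getLogger("myservice")          # hypothetical service logger
logging.basicConfig(level=logging.WARNING)    # quiet, data-minimized default

def verbose_for(seconds, level=logging.DEBUG):
    """Raise verbosity now and automatically drop back after a delay,
    so a distracted admin can't accidentally leave debug logging on."""
    previous = log.getEffectiveLevel()
    log.setLevel(level)
    timer = threading.Timer(seconds, log.setLevel, args=[previous])
    timer.daemon = True
    timer.start()
    return timer

# verbose_for(600)   # ten minutes of debug logging, then back to normal
```

A daemon could expose the same behavior via a signal or a control socket; the key point is that the revert is scheduled at the same moment the verbosity is raised.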
We were just going to say: if you store it to disk first and then later remove it, a backup could pick it up, and the data is going to be out there. So I'm totally okay with, or I think the idea is good of, being able to write it to tmpfs if you want to, but that should not be the default, as you advocate. And also, just to point out: even if you're careful about your backups, nowadays there are copy-on-write file systems, and to my knowledge there is no copy-on-write file system in the mainline kernel that has the ability to retroactively strip out data. So I don't know of a way to solve that. Well, it depends on whether you're assuming that someone is coming after your data, right? There are situations where someone comes to you and says, give us your data, and you can turn around and say, well, I don't have the data, because I've deleted the file; and while it may be true that it resides in some patterns on my hard drive, I'm not about to tear down my hard drive and rip that stuff out. And there may be situations, if you're being hacked, where the attacker doesn't have the technical capacity to do that. So deleting is better than not deleting, but I agree with you, there's still data left. To some degree you can also fix that by using secure erase, if the file system does the secure erase, right? That actually doesn't require file system support. Okay, I'd like to hear more about that, but there are a couple of hands. Sorry. No, what secure erase does is, it's a SCSI/ATA-level command where you discard any unused blocks in the file system. And any file system has to keep track of which blocks are used or not. So it's the file system that needs to make the decision. No, it's at the block level. Well, a file system knows which blocks are unused or not. Yes. So we should be looking at tuning at that level as well, in terms of making sure that the data is actually gone. Yes.
There's also, at least for Btrfs, the nodatacow option, so you could have a subvolume that you write all your logs to that doesn't do copy-on-write. Okay. So, with the caveat that I'm not a lawyer: what we were told by legal counsel at Stanford, when we were dealing with cases of subpoenaed logs, is that basically the way this works is that when you're served with a subpoena for logs, and this is mostly for civil, so criminal is a different story, but for civil subpoenas, basically the person who's subpoenaing the logs has to pay the cost. And there's a sort of customary cost for reasonably accessible data. So in the case of things like data that might be residing in blocks in the file system, where you have to do data recovery in order to get at it, in that kind of situation you basically get to hand a bill to the opposing counsel who's trying to subpoena the logs, for not only the cost of doing the entire data recovery service, but all the cost of downtime for your service while you pull the hard drives out, and so on and so forth, and basically you just get to start adding zeros. So the opinion of legal counsel was essentially that as long as it's hard enough that you can just start adding arbitrary zeros onto the end, pretty much people just go away. So logrotate also has a shred option now, since wheezy I guess, and we could turn it on by default, I don't know. Someone just said: only on some file systems. Just so folks know, we have, I think, seven minutes left. Instead of trying to sanitize logs after the fact, after storing them in persistent storage, what I've found easier to do, and I've been doing it for maybe 10 years, is to actually log twice. For the logs that I want to have a non-redacted version of, I log the non-redacted version to tmpfs for a small amount of time, and I also log the same messages to persistent storage, but redacted.
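The shred option mentioned here is a real logrotate directive; a hypothetical config fragment might look like this, with the caveat from the room that overwriting in place only helps on file systems that actually overwrite in place, not copy-on-write ones and not tmpfs:

```shell
# /etc/logrotate.d/example (illustrative fragment)
#
#   /var/log/example.log {
#       daily
#       rotate 7
#       shred          # overwrite rotated logs before unlinking
#       shredcycles 3  # number of overwrite passes
#   }
```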
So you don't have to find clever ways to remove data that you've already written to disk, and find ways to ensure that this data can't be recovered later, which is actually a hard problem. That sounds similar to what Ross was proposing, but maybe a more feasible way of getting to that point. Just on the question of what to do if you're getting civil subpoenas: I can't speak for every state, I can speak for the state of Washington and for the federal system. Most attorneys involved in high-level civil litigation right now are engaging in massive amounts of e-discovery, and they're very sophisticated at doing it. They usually send their consultants out to your device, and they'll say, you leave your hard drive where it is, they'll just dump it all out, and when you produce it according to a subpoena, you're supposed to produce it in its native format, as you store it. So the bills will actually not be an impediment. It's a good thought, but actually they're ahead of the curve right now, and they're accessing this data left, right, and center. All the more reason to not have it. One more hand up here.
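The log-twice approach just described can be sketched as a small shell filter: the full stream goes to a tmpfs path, and the persistent copy passes through a redaction step first. The function names, example paths, and the IPv4-only redaction are assumptions for illustration:

```shell
# Redact IPv4 addresses from whatever passes through.
redact() {
    sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/0.0.0.0/g'
}

# Log the same stream twice: full detail to a volatile (tmpfs) path,
# redacted detail to persistent storage.
log_twice() {
    volatile=$1     # e.g. /dev/shm/svc.full.log -- gone on reboot
    persistent=$2   # e.g. /var/log/svc.log -- survives, but redacted
    tee -a "$volatile" | redact >> "$persistent"
}

# Usage (illustrative):
#   some_service 2>&1 | log_twice /dev/shm/svc.full.log /var/log/svc.log
```

The design point is that the unredacted data is never written to persistent storage at all, so there is nothing to scrub afterwards.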
I'm quite intrigued by this idea of tidying the logs up after you've written them to persistent storage, which, if you're going to be subpoenaed six weeks later, is useful, but if I left my laptop on a train, for example, and I'm rotating my logs on a daily basis, today's logs are right there for you to read. And along the same vein, at least in the UK, I can't speak for anywhere else, if a warrant is issued to acquire data from you on a criminal basis rather than a civil basis, somebody doesn't turn up and say, give me your logs, please; they turn up and say, sit where you are, I am taking your equipment, and at that point you have no opportunity to do any kind of scrubbing. So you're much better off not having written them in that form in the first place, and I don't think we should forget that we're not necessarily talking about civil subpoenas; there are all sorts of other ways in which somebody might try and get hold of something from you. So I just have one very small point: we've been talking a bit about the arguments that one might have with one's managers, if one's a sysadmin, about data minimization. The situation in the US is pretty bad, but in the European Union there's the European Union Data Protection Directive, which legally mandates that you should not keep unnecessary data, and I won't go into the whole detail of that, but one way to look at some of this is to say that we should make it easier for people who are running our systems to comply with European law, as well as obviously going beyond that and doing what we think is right. I think that's a reasonable way to approach it. Perhaps you could set the defaults based on what language you selected at the beginning, so, like, if you selected English we could say, well, you're probably in the UK, right, or if you selected American English then, you know, we could say, well, you're in the US. I think asking it as an explicit separate question, right? No, I'm saying, like, we could ask it as a separate question but
set the default based on a guess from your language. Yeah, exactly. I think that has a lot of potential for confusion, especially because I know that there are people who run their systems in American English simply because that is the most widely supported, yeah, or C. Yeah, what does C have to say about the law? I also think, as a project, we don't want to get in the habit of trying to interpret the laws of all these different countries and figuring that out; we are not the police. But there may be someone who wants to do that thankless job, like tzdata: every time somebody changes a time zone, we have a way of formalizing it. I suspect that the law around data logging is much more complicated than the law around daylight saving time. And please, please, please don't be confused into the belief that existing geopolitical boundaries mean anything with respect to data protection, data privacy, and who's going to mine what. So Taggart and I set up a very rudimentary sketch of a wiki page, which is just wiki.debian.org/logging, and we want to start filling that in, and we want to start trying to come up with a set of guidelines that we think make sense for tools that are in Debian. I don't know if folks are interested in setting up a mailing list as well to discuss this and bat around ideas, or if we just want to change things through the wiki and then use existing mailing lists for discussion where it comes up. I just heard a whisper of debian-devel over there; that's quite the fire hose to read if you want to filter through it for questions just about what logging guidelines should be. If we decide to set up a mailing list, if enough people come up to me and say, yes, this should be a separate mailing list, we'll put the mailing list on that wiki page. But I invite you to go to the wiki page and try to add thoughts and observations and concerns that you have. And maybe we should establish some usertags
within debbugs that indicate where these policies, or these guidelines, sorry, I wasn't supposed to use the P word, where these guidelines are not being met, or where we think they're not being met, as a chance to sort of come to, you know, the usual rough consensus within the project. Later in the week there's an upstream guide BoF, so if you could come to that, we could discuss how we can advise our upstreams about the defaults for logging; that would be great. The other thing is, earlier in the week we were discussing the possibility of a Debian free services guidelines document; maybe we could add this sort of stuff to that document as well. Yeah, I think that would be good. Just to say, our wiki has a subscribe option, so you can subscribe to that page and you will know when there is a change. Do that, good call. All right, so I think we're out of time. Thanks, folks, for coming and talking about it, and I look forward to making Debian more respectful of our users' privacy and more protective of our users.