So, can you hear me okay? I'm MC and I work on Transparency Toolkit along with Brennan Novak and Kevin Gallagher, and basically what we try to do is watch the watchers. Back in May, we released a database of over 27,000 people in the intelligence community called ICWatch. These are people who talk about their work on classified programs on the public internet. We collected it using search terms like the code words mentioned in the Snowden documents, and today we're releasing an update to ICWatch that doubles the data in the database. That's already live if anyone wants to look at it. For the people who aren't familiar with this project and the research methods behind it, I'd like to go through an interesting example of the sorts of things that can be found in this database. So this is Lauren Russell, and she works at L3, a major intelligence contractor. But she started her career as an Army interrogator in Iraq. She says that the information she collected was used to capture dozens of people, and that part of her job was also to ensure the safe and humane treatment of hundreds of detainees, so that's good at least. But after a few years of that, she went and worked at a different company called Accelis in Afghanistan, and this job was quite different. It involved finding people to kill. She says that as part of this work she utilized F3EA methodology to conduct analysis on raw and fused HUMINT, SIGINT, and COMINT, helping to create 125 targeting support packets that were nominated to the Joint Prioritized Effects List for kinetic targeting. There are a lot of non-obvious terms and gibberish in there, and this is a pretty common problem when going through these resumes, so I want to break down how you would interpret that sentence. SIGINT, signals intelligence, is what the NSA does. It's collecting data from intercepted communications.
COMINT, communications intelligence, is specifically signals intelligence from communication data, so it's what the NSA does when they read your email. And HUMINT, human intelligence, is intelligence from human sources, so things like data gained from informers or from torture. The Joint Prioritized Effects List is a list of people the US military and its allies are trying to kill or capture in Afghanistan. And F3EA stands for Find, Fix, Finish, Exploit, and Analyze. It's a rapid intelligence collection and analysis methodology used for targeting, and we recently found out in the Drone Papers that this is often used for drone targeting. And kinetic targeting simply means attacking and removing targets. So looking at her profile again, she says that she utilized F3EA methodology to conduct analysis on raw and fused HUMINT, SIGINT, and COMINT, helping to create 125 targeting support packets that were nominated to the Joint Prioritized Effects List for kinetic targeting. Basically what she means is that, based on intercepted communications and information from human sources, possibly gained under duress from torture, she's deciding who should be killed or captured. So the intelligence community has long had an attitude of collect it all. General Alexander started trying to collect all the data that they could from every source. And one of the first projects to this end was something called Real-Time Regional Gateway. It's a massive project to store, combine, search, and analyze data from many different sources at once, from intercepted communications to data from drones to data from interrogations to even mundane things like traffic patterns and the price of potatoes. They started this program in 2005. The initial version was built by SAIC for use in Iraq, and these days it's mostly used in Afghanistan. It's not just the US though, because according to documents published in Der Spiegel last year, Germany is the third largest contributor to RTRG.
These sorts of collection and analysis tools are used for some programs that you might have heard of too, like Co-Traveler, the program the NSA has to figure out who is going places with whom. And there's a specific analytic tool that's part of RTRG called Sidekick that uses relative velocities to calculate this from many different data sources, so that they can calculate it for people across networks. Unfortunately, this is really computationally intensive, because they need to pre-compute all of the travel behavior for all the pairs of selectors. But it's feasible for them to do computationally intensive things with how this is built, because it's built on Hadoop and Accumulo for distributed data processing and storage. So they're quite serious about this. And the goals for RTRG are quite lofty. One of the creators, in an interview with Defense News, described their aim as being able to use intercepted communications and integrate them with signals for geolocation so that they can instantly find people and target them. And another counterterrorism official told the Wall Street Journal that RTRG literally allows them to predict the future through correlation, because it's the strongest correlation tool ever. So their goals for this seem to be twofold. First of all, to be able to kill or smite any potential enemies. And second, to be omniscient, to know everything that's happening at once and to correlate it and use that to predict what will happen in the future. And these goals sound a little bit beyond what you would expect from someone who's trying to simply protect people or stop terrorism. It sounds more like they're trying to become some sort of God who, by collecting and analyzing everything, knows everything that's happening everywhere and can just smite any enemies from above instantly. But the thing is, they aren't a God. They're people working on these things, and they're normal people.
And they have crazy resources and they intercept a lot of data, but they also use data that's freely available to anyone for a lot of their work: open source intelligence. This is a pamphlet from a startup called ZeroFox that uses data from social media to track ISIS. And tools like this are quite common. There's another tool called LM Wisdom that's made by Lockheed Martin. And they have a wonderful promotional video on their website explaining exactly how it works that I'd like to play for you. Social media content has the power to incite organized movements and sway political outcomes. It's an opposition terrorist organization in Iran. Monitoring and analyzing the massive and rapidly changing open source intelligence data, or OSINT, and turning it into actionable intelligence for decision makers is an imperative. Lockheed Martin's Wisdom Software Suite offers an advanced capability to collect, manage, and analyze vast amounts of open source data, enabling analysts to understand, measure, and anticipate real world events through social media. Think of Wisdom as your eyes and ears on the web. Wisdom is that tool that would allow you to do this at scale. Wisdom's advanced big data collection capability and data store automatically identify and harvest online social networking data of operational interest, as well as socio-cultural data from standard online open sources like newspaper feeds and structured databases. Wisdom's high performance analytic algorithms analyze the content in near real time, distinguishing noise from high value information, capturing trends, sentiment, and influence, turning open source data into predictive actionable intelligence. So that's what they're doing. And they're not just using this to target terrorists. It was recently revealed that they are helping Walmart use this to find employees who are organizing for better working conditions, find the main organizers, and fire them using data from social media.
So it's used for corporate purposes as well. And LM Wisdom wasn't even made for surveillance in the first place. I tracked down one of the people who created it. At the time, he worked for General Electric and was hoping to help NBC make tools so that they could figure out which sites to partner with to make their videos go viral. So it's not just governments that are using open source intelligence, because there are no barriers to accessing it and there are many applications. There are even many people-search databases that have information like people's addresses and phone numbers and relatives and how old they are. And these include many, many people, probably everyone in the US. They're used by many people for all sorts of purposes, from private detectives to people who are selling advertisements. So if this data is available already and it's used for everything from figuring out who to kill, to stopping unions from organizing, to trying to sell things to people, why can't we use it to understand surveillance programs too? Why can't we use it to understand human rights abuses? Why not use it for accountability? So we've started to build tools to do this. In the near future, we'd like to make it possible for anyone to make something like ICWatch or other databases in less than a day and without programming. And long term, our goal is to build software similar to what the intelligence community has: things similar to LM Wisdom, things similar to Real-Time Regional Gateway, so that people can collect all of this information in one place and analyze it. And I'd like to show a demo of some of the tools that we've been working on. It's possible that it just won't work at all, but we'll see. So this is Harvester. It's a tool for collecting data from online sources in an automated fashion. And you can choose different data sources. So say, here's Indeed. This is a resume website. And say you want to find anyone who mentions XKeyscore.
And for the sake of timing, let's just get people in Maryland and start collecting. It might take a second because it's still a bit rough, but it opens a browser, goes and finds all the people who mention XKeyscore in Maryland, and downloads all of their resumes in one place. You can kind of see them as they download, because this is being slowed down a bit right now. And since that's just for XKeyscore, it's fairly small. That's it. Takes a second to load. Still kind of rough. So we're hoping to add many different data sources to this so that people can collect data from sources online, as well as just take a pile of PDFs on their computer, point it at the directory, and it will load them in, OCR them, and people will be able to search through them in a searchable database. So while this is loading, why don't I walk through some of the rest of the pipeline? Our goal is to have tools for collecting data, loading it into a database, and then tools for matching data across various sources on the same person or the same company. So to take someone's resumes and social media profiles and everything and link them together, and then also link that to the companies they've worked for, the other people they know, the locations they've lived. As well as tools for extracting things from data. So to be able to go through a resume and extract all the code words mentioned, to be able to go through a document and extract all the companies mentioned, and to generate entities that way. Then tools for searching through data, with databases where you can run search queries and browse by categories, and tools for viewing data in network graphs and maps. Let's see if this is done. Right now it just shows the raw JSON. The connection between tools is a bit rough, but we should be able to index the data and load it into a search tool. Take a second. Hopefully this works. Oh, it's going. So it takes a little bit. Index.
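The extraction step in that pipeline, going through a resume and pulling out the code words it mentions, can be sketched roughly as below. This is only an illustration and not the Transparency Toolkit implementation: the function name `extract_code_words`, the `CODE_WORDS` list, and the sample resume text are all hypothetical, and the matching is a plain case-insensitive whole-word search.

```python
import re
from collections import Counter

# Hypothetical watch list; these terms are ones mentioned in the talk.
CODE_WORDS = ["XKeyscore", "PRISM", "SEDB", "RTRG"]


def extract_code_words(text, code_words=CODE_WORDS):
    """Count case-insensitive, whole-word mentions of each code word."""
    counts = Counter()
    for word in code_words:
        # \b keeps "PRISM" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            counts[word] = hits
    return counts


# Made-up resume text, for illustration only.
resume = "Analyst experienced with XKEYSCORE and SEDB; trained others on XKeyscore."
found = extract_code_words(resume)
print(dict(found))
```

A real pipeline would presumably also need to handle spelling variants and terms that double as ordinary words, which is the kind of noise mentioned later in the talk.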
So there's a searchable database of all the people who are working on XKeyscore in Maryland. I think that in doing this, free software and open data are really the keys, because we have far, far fewer resources than the intelligence community. We don't even have the resources that a company like Lockheed Martin has. We can't internally build all of this software and hope that we will anticipate every future use. So having people be able to take our data, take our tools, and adapt them to their own situations is absolutely key to actually ensuring that they're useful. And there are also a lot of open source tools that the intelligence community has released, like Accumulo. The thing that's used in Real-Time Regional Gateway was released by the NSA as open source. And there's Gaffer, which is a graph database recently released by GCHQ. So we can start to take those and possibly also build on those in some cases and use the same tools. And that's appropriate, because our goal is to enable people to collect and use information in the same way that the intelligence community can. But while I think that we should aim to collect it all and collect all the information that we can, I think we also need to be careful to avoid a lot of the mistakes that the intelligence community has made. Because some of those mistakes are quite bad and lead to people being killed for no reason at all. And it's quite absurd. The main one of these, I think, is dehumanizing people. Torture techniques are specifically designed to dehumanize people. When people are looking at data that they've intercepted, they're not looking at a person. They're looking at metadata. They're looking at numbers on the screen. It's not something that's easy to find a way around. So when I was working on ICWatch, I was grappling with this problem quite a bit. I decided to try to see who some of these people are and try to put faces to these issues.
So I started going to intelligence conferences. Many of these conferences are quite open; you can just go in. And I wasn't that out of place either. I just told people that I made tools to collect and analyze open source intelligence. There are many people doing similar things out there too. Like I met the ZeroFox people, who were one of the examples I showed earlier, at one of these conferences. They were actually very, very nice. And there were also some people who were quite interested in what I was doing. There was one recruiter from Northrop Grumman who seemed somewhat interested in hiring me. I looked her up later and found a bunch of job listings where she was trying to hire people to work on programs related to XKeyscore. It wasn't all good. I got kicked out of one conference. I got some strange requests. Like there was one guy who was trying to figure out how to use open data to help venture capitalists figure out what porn the founders of the startups they funded watched. I'm not sure that's even possible, but it was really weird. And he was asking me for help. I don't think I can help with that. Sorry. And of course there were some negative comments on things like Manning and Snowden, and some confusion. Like there was someone who was making insider threat detection software who was talking about how it would stop a situation like when Snowden leaked documents to WikiLeaks and things like that. So people don't actually know what's going on. But generally, most of them were decent people. Some of them were quite nice. Some of them were quite funny. And some of them really seemed to think that what they were doing is saving lives. So they're not evil people who want to hurt others, but they're not infallible either. They're human beings. Our strategy of looking at individuals scares a lot of people. But what you have to realize is that institutions are made up of people. And it's easier to just look at the institution.
It's easier to just look at an abstract program, just like it's easier not to think of the person who you just decided to kill in a drone strike as a person. That's why these things continue to happen. And I think that there's a lot of benefit to looking at people as people, both to avoid some of the problems that the intelligence community has, and because people's data trails are part of the data trails of institutions. If we're only looking at institutions, we're missing the part of the data trail that people leave. Of course, no one person is responsible for the wrongdoings of all of the intelligence community. So we shouldn't demonize any one person. But these are the people who go to work every day and perpetuate the actions of the intelligence community. So I think everyone involved is a little bit at fault. And the other benefit of looking at people as people is that we can start to understand them. We can start to understand what their hopes are, what their fears are, how they see the world, what upsets them, and what might cause them to change their behavior. And from that, we can start to maybe come up with alternatives. So let's look at some of these people and look at some of their stories. This is Jason Epperson. He works on intelligence collection for special operations. In his spare time, he enjoys coaching children in sports. He currently works at the US Special Operations Command, helping different agencies collect data, share it, and figure out what data they need, just generally helping them integrate it. He started his career back in 1998, also working on collecting data for special operations. Then later, in 2004, he went to work at the United States Central Command and the NSA Cryptologic Services Group. He was focused on tracking down high-value targets and individuals. And he claimed that as a result of his work, numerous high-value individuals were captured or killed.
And this is especially interesting, because he was working on this in 2007 when PRISM was launched. And at the top of his resume, he lists PRISM in his specialties. It's possible that that's kind of a generic word, but based on his background, it might not be. So I think it probably is actually PRISM. Then after he was working there, he went and started working on counter-radicalization efforts, things like boosting the capacity of Muslim faith leaders to win hearts and minds and establishing competing social networks to counter al-Qaeda ideology. And he's very clear in this job description that he's not killing people. He's just helping allies of the US figure out who to set Interpol notices for. But the most interesting thing about him isn't any of his jobs. It's this publication that he has at the bottom of his resume called An Examination of the Effect of Government Data Mining on US Citizens. This is clearly an area where he has a lot of expertise. He presented this at a conference back in 2010. I still don't have a copy yet. It's not easily available. I think it might be possible to get it either by buying it from the company directly or by going to the Library of Congress, which seems to have some copies of the conference proceedings. But it could be quite interesting, both because he was relatively high up, in command of nearly 400 people back when PRISM started, and because he was working with the NSA. It's possible that he had some role early on in the program, and this might provide some clues. And then also the little "data mining on US citizens" bit in the title is kind of interesting, because that's supposed to be the last protection. I think it's kind of a stupid protection, because most US citizens wouldn't find it very comforting if the Chinese government said, oh yeah, we have a mass surveillance program, but we only spy on people who aren't Chinese citizens. That's not really comforting to them, so I don't see why it would be.
But it's been the one thing that people have been repeating: we don't collect it on US citizens. And just seeing that in the title of a paper is like a tiny admission that maybe they do. So some of these profiles tell other interesting stories about people's lives. If you've seen any of my other talks, this is someone you've heard me talk about a lot, Solomon Varnado. He spent most of his life in the military and intelligence community, focused on signals intelligence and geolocation. He took down his resume after ICWatch launched. But I actually recently found another resume of his on another website that has additional information, like that on the side in the military, he ran diversity programs and a sexual assault prevention program and other things like that. I first came across this profile because he mentions a lot of interesting code words. This is probably the first known mention of XKeyscore, back in 2004, 2005. But these aren't the most interesting part of his resume. Later on, after he works on intelligence collection management, just standard signals intelligence collection, he goes and works for L3 Stratus. And there he says that he identified, collected, and performed direction finding of specified target signals using Penetrace, DisplayView, and SEGs. I wasn't sure what Penetrace was, so I found a definition very conveniently located in another resume that said it was an airborne collection platform for Penetrace. It sounds like it's some sort of signals intelligence collection platform. And the other interesting thing about this job is that he said that he called for external review of intelligence management processes, which is not something I see normally. And he was there for a fairly short time, only a couple of months, after staying at most of his other jobs for over a year. And then in his next job, he was also there for only a couple of months. He was working at Flurrybus International, also on drone intelligence.
This time, it's definitely drone intelligence from Predator drones, because he mentions AirHandler, which we now know more about thanks to the catalog released by The Intercept. It's a processing system for geolocation data from Predator drones. And the update to ICWatch includes data on all of the words mentioned in that catalog. And then he leaves the intelligence community entirely after that job. He goes and works as a used car salesman at this used car dealership. And it turns out, as I found from his other resume that I just found, he's actually quite a successful used car salesman. He's won a bunch of awards. He's one of the best salesmen in the region. So he's doing quite well. And he won a bunch of awards in the military too. So it seems like he's fairly committed to what he does. But that's quite a huge career change. It sounds like maybe he was starting to get upset with how some things were being done, and he couldn't figure out a way to fix it after calling for external review. So he just left. And then this is Michael Dial. Michael Dial is a pipe fitter and a plumber. This is him with his family. And yes, he's actually a pipe fitter and a plumber. But he's not just any pipe fitter. He has a security clearance, and he goes and fits pipes in secret facilities. As you might expect, he does a lot of pipe fitting for naval ships. And he also does things like go to embassies and other secret locations in Afghanistan, Iraq, Ecuador, and Serbia and fit their pipes. And then he also did some pipe fitting in Djibouti at some sort of homeland security facility, which coincidentally is also where many of the drone programs are run out of. So there are some interesting cases like this, where there are people like Michael Dial who aren't directly involved in intelligence at all.
But the information in their resumes still provides very interesting, useful details about where secret facilities are located and other aspects of the intelligence community. Because secret facilities don't just materialize out of thin air. They need people to build them. They need people to operate them. And so by tracking down these people, we can start to map them. And then there are other useful things, like we can figure out which company cleans the NSA. I'm sure that has all sorts of useful applications. So this is Helyana Costa. He lives in DC and he works for the DOD. This is him at his high school graduation back in 1988. He's been working in military and intelligence for nearly 20 years. Back in 2003, he worked on PSYOPs programs. Specifically, he worked on PSYOPs programs in Paraguay, Colombia, and Bolivia. These were in support of the DEA, the Drug Enforcement Administration, and the CIA. And there are a few other resumes in ICWatch that mention involvement in PSYOPs in Latin America for the DEA. It seems to be quite an extensive thing, especially since I didn't collect any data on this specifically, and there's just suddenly a bunch of people in the database on this. So it's maybe worth looking into a bit. And then after that, he went and worked on PSYOPs programs in Iraq. So it's kind of interesting. Then he went and worked at the DOD on human intelligence. The other interesting thing about Helyana Costa is that he's one of the people who deleted his resume after ICWatch launched. And that was how I found him. So after ICWatch launched, a lot of people were positively interested in it. But we also got a lot of threats. Which is really absurd, because all we're doing is collecting the information that people explicitly, independently, willingly posted online about their professional lives, we're not posting addresses or anything like that, and making it more searchable, just like Google does.
But a lot of people in the intelligence community contacted us. For the first few weeks, we saw a new response every day. Some of these were kind of interesting and revealed some of the nonsensical mindsets of people in the intelligence community. Like this guy, Alexander Arunovich. He sent me actually a very nice email, saying that he couldn't understand why he was in ICWatch, because he wasn't involved in surveillance, and he was working at a private company that had nothing to do with surveillance. So I looked at his profile. And I saw that he had worked at Unit 8200, the Israeli intelligence unit, which, OK, they have mandatory military service. It's not that weird, though he was there for several years, not just the mandatory portion. And this is the intelligence unit that spies on Palestinians. And then I looked at where he works now. He works for a company called Verint. And according to their website, they make software for analyzing data from wiretaps. So I think that has to do with surveillance. I'm not sure why he interpreted that as nothing to do with surveillance. But it's an interesting interpretation. I think it makes sense for him to be in the database. But of course, for any particular profile, there is some noise, so it's up to whoever is looking at it to make their call and do the research. And sometimes the people who complained also helped us find interesting details. Like this guy, Joshua Lively. He's one of the people who reported us to the FBI for domestic terrorism. He worked as a linguist at this company. I looked at his profile, and he mentions a lot of interesting code words in it. Some of them didn't make so much sense at the time. There's this thing called SEDB. And then a few weeks later, The Intercept released this article on a thing called Skynet. It uses machine learning to analyze travel data from telecom providers.
And SEDB is one of the databases that they use. And he coincidentally has a lot of the databases that are used in this in his list of skills. And as a linguist, he's proficient in the languages used in the region that's mainly targeted by this. So I'm not sure if he's involved in this particular program. But it seems like he's involved in something somewhat similar. So it's quite interesting. And generally, there are a lot of angry people in the intelligence community. Some were nicer than others and were just asking questions, or being like, can you please take my profile down? Some of them were afraid. Some of them were more violent and sent things like death threats. Our server started getting hit pretty hard, and ICWatch kept going down. We wanted to be sure that we weren't going to be compelled to take the data down in some way. And the easiest way not to be compelled to take the data down is to make it so you can't really take the data down yourself, and then people have much less incentive to go after you. So we moved ICWatch to WikiLeaks, which has been great. They have been wonderful helping with all this. So thank you, WikiLeaks. And as I mentioned earlier, a lot of people have taken down their resumes in response to ICWatch. Specifically, 1,030 people have, out of the original 27,000. And others have edited them or made them private. So as part of the update, in addition to doubling the number of resumes available, we also recollected all of the initial resumes. You can go on the site and see which ones were removed, which ones were made private, and which ones have been modified. All of that is flagged, so you can easily see how that's changed. Some of these reveal details that people might have wished that they hadn't posted in the first place. But they also provide useful updates on where people are working. And we can start to track people as they move from job to job.
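The flagging just described, comparing a fresh collection of resumes against the original one, can be sketched along these lines. This is a minimal illustration under stated assumptions, not the actual ICWatch code: each snapshot is assumed to be a dictionary mapping a profile ID to the resume text, the names `fingerprint` and `flag_changes` and the sample IDs are hypothetical, and detecting a profile that was made private (which would depend on how the source site responds) is not modeled.

```python
import hashlib


def fingerprint(text):
    """Return a stable hash of the resume text so versions can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def flag_changes(old, new):
    """Label each profile as removed, modified, unchanged, or added."""
    flags = {}
    for profile_id, old_text in old.items():
        if profile_id not in new:
            flags[profile_id] = "removed"
        elif fingerprint(new[profile_id]) != fingerprint(old_text):
            flags[profile_id] = "modified"
        else:
            flags[profile_id] = "unchanged"
    # Profiles that only appear in the newer collection.
    for profile_id in new.keys() - old.keys():
        flags[profile_id] = "added"
    return flags


# Made-up snapshots, for illustration only.
original = {"a1": "Analyst, Company X", "b2": "Linguist", "c3": "Pipe fitter"}
recollected = {"a1": "All source analyst, Company Y", "c3": "Pipe fitter", "d4": "Recruiter"}
flags = flag_changes(original, recollected)
print(flags)
```

Storing the old and new text alongside each "modified" flag is what would make it possible to see the specific change, such as a new employer.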
For example, there's this guy, Michael Acosta, from the original ICWatch. From 2011 to 2012, he worked at Guantanamo. He was primarily trying to find out about potential attacks on Guantanamo itself. He monitored various detainees and collaborated with the behavioral science team, and was trying to figure out if detainees were planning some sort of coup, I guess. And then he started working for the Air Force. Here he was working on drone intelligence and targeting, and he says things like how he was responsible for the production, maintenance, and upgrade of DGS-2 mission critical intelligence databases, which include high value target development folders, like the things used for JPEL targeting, regional threat briefs, mission storyboards, and mission target logs, which document FMV mission rollups. But the most interesting thing on his resume isn't any of those things. It's the thing that changed between the original launch of ICWatch and now. And that's that he moved and started working for a different company. He started working for this company called SOS International as an all source analyst. And he unfortunately had to leave the position that he had on the side coaching high school baseball, which he seemed to really like. And he clearly liked it, because right now he's looking for baseball opportunities in Germany. So he seems to be in Germany working for this company called SOS International that I'd never heard of before. So I went on their website, and they have a list of the cities that they operate in in Germany, these six cities, along with Guantanamo and a number of other sketchy locations. And based on Michael Acosta's past record of working at Guantanamo and on drone targeting and things like that, it sounds like this company is probably doing something quite sketchy.
And by tracking changes to where people work, we can start to find things like this that we might not otherwise think to look at, that might not otherwise jump out as interesting. But it's not just open data that we collect, because the same tools for collecting and analyzing open data are also useful for other data sets. Like we made a search tool in collaboration with the Courage Foundation for all of the published Snowden documents that allows you to search the full text of the documents, browse which code words are in each document, see documents that mention particular countries, and see the full PDFs and articles. And when the Hacking Team data came out this summer, we mirrored the data and became one of the primary mirrors. We had a torrent that was almost done and a server with a lot of space. We figured not a lot of people had that, so we put it up. And that got a lot of traffic. It got about 57 million hits in the first two days. And soon we realized there was a problem, which was that our server charged a lot for bandwidth. It cost us $48 every time someone decided to download the 400 gigabytes with wget. So that was interesting, but it's been resolved now. And it's hopefully made the data more accessible to people who don't have 400 gigabytes of hard drive space available or enough internet connectivity to download that. Then we've also made a search tool for all of the Hacking Team emails that has a search interface that lets you browse them like you would in a normal email client, with threading and a network graph so that you can see the connections between senders and recipients. So the intelligence community has a variety of collection disciplines: signals intelligence, open source intelligence, human intelligence, measurement and signature intelligence, imagery intelligence. And they have all these different sources that they're gathering data from.
And I think that we should try to duplicate this, because there are a lot of different sources that we can gather data from as well. We need to find ways to better collect data from all these sources and fuse them together. These are some of the ones that I've been spending a lot of time looking at. There's open source intelligence, things like ICWatch, where you're collecting data from purely public sources. But this is just part of the broader ecosystem that we can draw on. This is mostly information that people and institutions say about themselves publicly, either intentionally or unintentionally. It's really difficult to use because there's a lot of it, and it needs to be collected and matched up and pulled together in a browsable way for people to be able to use it. You can't really go and use it manually at scale; you can do it a little bit, but not nearly enough. So we're working on making this easier to use. The other sort of data is leaked documents, documents that whistleblowers send to journalists that they think should be public. These often pretty explicitly reveal corruption or human rights abuses or other issues. But they can also be used to collect more data. We used the published Snowden documents very heavily to find code words that we could use to collect the data in ICWatch. And once we start to collect data on secret things that until recently were not known at all, we can start to find data on unknown code words and unknown things that we might not otherwise recognize. And then there's data released by governments through FOIA requests and through open data initiatives. This, of course, can be spun, or things can be held back. So it's not ideal to use on its own, but it can be used in combination with the other two types. You can use it to provide context, you can use open source data to frame FOIA requests, and things like that.
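
The idea of using known code words to surface unknown ones can be sketched roughly as follows. This is only an illustration under stated assumptions, not how ICWatch is actually built: it assumes code words appear as all-caps tokens of five or more letters, the function name `candidate_code_words` and the `ignore` list are invented for the example, and the code words themselves are placeholders.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Assumption: code words show up as all-caps tokens of at least five letters.
CAPS_TOKEN = re.compile(r"\b[A-Z]{5,}\b")


def candidate_code_words(
    documents: Iterable[str], known: Set[str], ignore: Set[str]
) -> List[Tuple[str, int]]:
    """Count unknown all-caps tokens in documents that mention a known code word."""
    counts: Counter = Counter()
    for text in documents:
        tokens = set(CAPS_TOKEN.findall(text))
        if tokens & known:
            # Only documents that already match a known term contribute candidates.
            counts.update(tokens - known - ignore)
    return counts.most_common()


# Placeholder terms and texts, not real code words or resumes.
known_terms = {"ALPHAWORD"}
common_acronyms = {"SIGINT", "HUMINT"}
resumes = [
    "Supported ALPHAWORD and BRAVOWORD reporting for SIGINT customers.",
    "Analyst on BRAVOWORD and CHARLIEWORD.",
    "Used ALPHAWORD with BRAVOWORD daily.",
    "Maintained ALPHAWORD and DELTAWORD databases.",
]

result = candidate_code_words(resumes, known_terms, common_acronyms)
for term, count in result:
    print(term, count)
```

With these placeholder inputs the sketch prints `BRAVOWORD 2` and then `DELTAWORD 1`. `CHARLIEWORD` is skipped because its resume mentions no known term. In practice the candidates would still need manual review, since plenty of all-caps tokens are ordinary acronyms.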
So the goal of Transparency Toolkit is to make it easier to collect all these types of data in one place and to start to use this data in the same ways that the intelligence community uses the data collected from all their various collection disciplines. Except our goal isn't to kill people or be some sort of omniscient god-like being. We just want to build some sort of external structure of accountability to make it easier to uncover and understand things like surveillance programs or human rights abuses or corruption. When we can find the people and companies that are involved in things like surveillance, we can start to map who's doing what. We can start to request information about specific contracts, and we know who we can ask questions about particular programs. Then we can start to use the data to start legal cases against specific companies, and we can start to take more concrete actions than we would be able to otherwise if we were dealing simply in theory or in guesses as to what's going on. So open source intelligence lets us be more proactive and more direct with our techniques. It also lets us find some of this information earlier, because many of the programs mentioned in the Snowden documents are mentioned first in open data sources. And if we can start to figure out where these are and what they are, then we know what data we're missing, and we can start to go after it through FOIA requests or by trying to find it by other means. But all of this is a really, really big project, and it's not going to work if it's just us working on it. We need to work with other people. We need to work with activists who have ideas of how they want to use the data. We need to work with journalists to collect the data and write stories about it. We need to work with human rights lawyers to help them with their research and help them build legal cases based on the findings.
We need to work with NGOs and human rights researchers who want to collect and use open data in their work. And we need more people going through databases like ICWatch. This doesn't require any special expertise. You'll gain the knowledge that you need as you're going through them, looking up terms. It's not easy, but it can be quite interesting once you combine all of these obscure terms and say, oh, that's what they're doing. And oftentimes what they're doing is something entirely absurd, like reading all your email or killing people. We also need software developers to help develop software and help us figure out how all of these tools should fit together. So if anyone's interested in working with us to take on the intelligence agencies of the world and figure out what they're doing, please let us know. I think it sounds a bit insane, and I know that they have far more resources and far more experience. But if we keep ignoring the situation, or we continue as we are now, making scattered attempts to change things that aren't coordinated and are based on limited information, nothing is going to change long-term. So I think we need to collect all the information we can and figure out how to effectively combine it and use it for concrete goals. And I think we need to do this with free software and open data because, against such powerful adversaries, they're probably the best hopes we have. Thank you.
Thank you so much. Now we have the round of Q&A for anyone who would like to ask a question. Please come forward to the mics on both sides. I'll start taking the question from, yeah.
So I'd like to ask about documents which are scans, which are sometimes released as official open source information. What kind of workflow do you have, if you have any, for OCR on these?
Yeah, OCR is tricky and it depends on the document.
There's some open source software called Tesseract that's quite good, but it doesn't always work in cases where there needs to be more specialized parsing. I had to use something called ABBYY, which is not open source, and I'm really looking for an alternative. I used it for the published Snowden documents because we needed to extract the classification headers, and that wasn't working with Tesseract. But Tesseract works for most things.
Thank you. Do we have a question from the internet?
Yes, Rudy is asking on IRC: what would you recommend the NSA develop towards a future of social usefulness? For example, what value do databases from 2015 of people's cell phone sensors have in 2115? Could you give the NSA, maybe, some useful work?
Can you repeat that last part, sorry?
What recommendation would you give the NSA to develop towards a future of social usefulness?
Social usefulness. Well, probably the most useful thing they could do is stop collecting the data in the first place, especially the data that's being intercepted or illegally collected. There's probably some amount of useful tracking they could do, but I'm not sure that the tactics they're using to collect the data at this time are the best approach.
Thank you. So, next question, please.
Hello, thanks for the talk. That was one of the best ones I've seen this Congress. I was wondering what you think about the point you were raising, that we shouldn't make the same mistakes. I'm not totally sure that's possible, because of things I've seen in other communities: communities have their extremists, and they will abuse this data. And then that allows a political attack on you, because they say you made that happen. It's not true, but it will sell to people. So, how do you protect against that?
I think it's hard to entirely protect against it because we can't control the actions of other people, but people could also go off and use this data negatively by collecting it on their own, independently of us. I was actually quite impressed after we launched ICWatch. I haven't heard of anyone in the intelligence community complaining about threats that they've gotten as a result of ICWatch being launched. All the complaints have been theoretical. The only threats that I've heard of resulting from ICWatch are from the intelligence community to us. So I've been very impressed with the civility of the internet in that case. And I think that maybe framing it, actually bringing it down to the individual level and making it clear that these are people, makes it a little bit less likely that people will go after them in a vicious way.
Have you thought of creating some kind of usage guidelines? I mean, that's not going to change what anyone does, but if someone does something, you can then say that's against our usage guidelines, and it's a political defense against someone accusing you.
Yeah, I don't think there's any way that we can enforce something like that, but we do try to be very careful with how we're framing it. I've spent a long time in this talk saying that these are not evil people, they're normal people, and you should look at them as such. So I think being very careful with the framing and maybe developing some sort of guidelines is definitely a good idea.
Thank you.
Hi. First, thank you very much for this tool that makes it possible to fight back legally against people who try to punish or, yeah.
What I have to say before my question is, I've worked for the last three and a half years in, let's say, the field of IT forensics, and I worked with Maltego and stuff, so I know how much work it is to collect data and bring it into a good condition so others can read it or you can see a goal. And what I personally think is very important: this could be very sensitive data for people, and my question is, how do you make sure that this data, which you offer for download, will be kept safe? That's the first question, and the second is, did you think about verification? We are collecting a lot of data, and in a few years I, or another person, will want to see if this data was correct. So do you verify the sources, like with an MD5 sum or so, so you can say, okay, this fingerprint taken on this day at this time is correct?
So for the first question, I'm not sure why we would particularly protect it, because this is information people posted publicly themselves, so they've sort of said that they don't want it to be protected or secured, because they're posting it on the public internet. So I'm not sure there's really any reason to try to protect it when it's something that they've published very publicly. And on the second one, verification is quite tricky with some of the data, especially around the intelligence community, because all of these things are secretive and it's hard to confirm them. We can confirm them against each other. Now we have multiple resume sites in ICWatch, so sometimes we can find the same person's resume on another site and compare over time, and we can go find other profiles they have and try to combine as much data on the same entity as possible and have it over time.
What I did, I made a fingerprint when I downloaded the website, and then I can say, okay.
That it was collected then?
Yeah.
I mean, that's a bit harder to absolutely do that.
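
The kind of fingerprint the questioner describes could look roughly like the sketch below. It is an illustration only, not something ICWatch is known to do: the function names and the record layout are assumptions, the URL and page body are placeholders, and SHA-256 is swapped in for the MD5 sum mentioned in the question because MD5 is known to be vulnerable to collisions.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint_page(url: str, body: bytes) -> dict:
    """Record a content hash and collection time for one downloaded page."""
    return {
        "url": url,
        "sha256": hashlib.sha256(body).hexdigest(),
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(body),
    }


def matches(record: dict, body: bytes) -> bool:
    """Check whether some bytes are the same content that was fingerprinted."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]


# Placeholder page content and URL.
page = b"<html><body>Example profile page</body></html>"
record = fingerprint_page("https://example.org/profile/1", page)

print(json.dumps(record, indent=2))
print(matches(record, page))          # True: identical bytes
print(matches(record, page + b" "))   # False: content differs
```

A hash like this shows that a stored copy has not changed since the record was made. It does not by itself prove when the page was collected, since the timestamp is self-reported; publishing the records somewhere others can see them at the time is one way to make that claim checkable later.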
I mean, we have the full text of the webpage, and we have it all published on GitHub, so you can verify that it was collected then, but yeah.
We will take the questions from up there.
Hi, community extremist here. So I wanted to say something, which is that I think what Julian did for leaking documents, you're doing for analysis, which is really great, because transparency is not enough, you need action. And so I just wanted to say that I hope that everyone can give MC and Transparency Toolkit a lot of material support and maybe a round of applause. Yeah. Definitely the best talk at the Congress. I had a couple of suggestions. One of them is, I think it would be great if you could focus on American domestic police agencies, in particular collecting the images of police academy graduation photographs, to be able to move in the direction of facial recognition so that we can find undercover police officers that are in our midst. And I think it would be great if you could create a FOIA wizard, essentially, because everybody likes wizards, and who doesn't like UNIX. It would be great if you could create a FOIA wizard where you could say, I want to know about these terms, and then it would just generate the requests automatically, maybe by partnering with MuckRock, for example.
Interesting things where there's a kind of weight, where you realize there's a lot of people working on this classified program, and it's at this agency, and they have a contract with this company, and these are the people involved, and it would just automatically generate those FOIAs and then get people to sign up to put their name down and sponsor a little transparency, to say, oh, that's the FOIA I want to get behind, I'm going to check on it once a week, I'm going to do this thing through MuckRock. I think that would be a way to take this information in a legal manner and make it actionable. And I think there are lots of other interesting things you could do that are not about the law, but I'll leave that to the imagination of other people. It should be legal, but it doesn't need to be through legal channels like, say, FOIA. So thanks for the work that you're doing, MC, and I hope that you will expand it to basically all of the pigs of the whole world. And I would really encourage you to read Hannah Arendt's Eichmann in Jerusalem, because you described a fundamental thing, where these people aren't evil, but actually evil itself doesn't exist; these people are the banality of evil. They're people who have soccer practice and they have a dog and they like to go home and fuck their wife, and they're regular people who do drone strikes.
Thank you. I have a question on mic one.
How easy is it to add support for new databases or new sources of information?
It depends on the source and how that site is structured, but generally it's not too difficult. Adding support for new sources does require programming at this point, but it's not particularly complex programming, and we have some libraries that make some parts of it easier as well. If you're interested in adding a data source, we're more than happy to help with that.
Awesome. My favorite is the list of reports of when people were denied security clearance and why, and if their appeal was then, like, removed.
Yeah, that would be quite interesting.
Okay, if there are no further questions.
Yesterday it was said that we have to make sure that they know that we watch them, because someday they will get prosecuted. In some way I think you are doing exactly this, so this is brilliant. Are you already at the stage where you are thinking you can start concrete legal actions against some individuals that you are getting information on with your tools?
We've been working with some lawyers towards that, and we are looking to do more on this. So if you have any ideas for particular situations where this may be applicable, or lawyers that we should work with, let us know. But we're working towards that and making some progress.
Thanks. Again, question from up there, please.
I just want to say that you are a visionary who is more passionate than anybody I've ever collaborated with, and it's a total honor. Thank you.
Yeah, and just further on that, that's Brennan, who also works on Transparency Toolkit. He made the awesome UI for Harvester and LookingGlass that you saw, and helps with all of this.
If no one else is going to ask a question, I'd like to ask a question which I know the answer to but no one else in the room does, and I think it's very fascinating. I wonder if you could talk about lessons that you've learned from studying the South African resistance to apartheid, and maybe you could talk about the things that drive you to work on these things. For example, what inspires you to justice? For example, experiences at MIT. And maybe, I mean, if you don't want to talk about it, I'm sorry for asking, but if you do want to talk about it, I think you can inspire everyone else here to raise their fists with you in solidarity.
Yeah, okay. I guess it's been nearly three years now, so maybe that's okay to talk about.
So three years ago there was this case at MIT, well, everyone's probably heard of it, Aaron Swartz's, and he was being prosecuted for downloading documents from JSTOR. I was brought in, and we were trying to figure out MIT's role in the situation and whether we might be able to sway public opinion, a few people in Boston, I think some of whom are in this room. We were trying to help him, and eventually, partway through the process, he became afraid and decided that it would be more risky for us to help him, that the prosecutor might lash back, so we stopped. But one of the things that I did in this process was I sent out a survey to all of the professors at MIT asking their opinion on his case and whether they identified with his actions. I got a lot of responses to the survey. Some were quite nice, some were quite supportive, and some were very vicious, saying that he should go to jail and that he's a waste of humanity, and he works at this Harvard Center for Ethics, so how is this ethical, and things like that. They were quite horrible. Initially he had access to this database, and somehow over the next year, when we weren't doing much, he lost access to it, and he emailed me asking for access again. Back then I was on some stupid kick about research ethics and redaction, and I was like, well, can I give you the answers without the names? Which is stupid, because the names are the most useful part of that data. I kind of abandoned him, along with a lot of other people, in that. And I feel like if I had given him the names, that might have been something that could be used to find supporters within MIT or people who were rallying against him. I don't think it would have made a huge difference, but it might have made just a little bit. That was one of the things that really showed me the power of data on individuals and the role of individuals within institutions. And I feel like I really failed there, so I don't want to
do that again.
Thank you. Unfortunately, we need to wrap up because we are out of time. Thank you for attending this very interesting lecture, which was quite touching in the end.