I want to introduce Redbeard to you. He's going to talk about new data mining and aggregation techniques for all kinds of data, like DNA and pictures from the web for face recognition, and open source implementations of this. Give a big round of applause to Redbeard.

So, as was mentioned, my name is Redbeard. The talk that we've got going on here is going to be a survey of data collection techniques and how dragnets are increasingly being done on a larger and larger scale. The idea of a dragnet is that you mine a specific corpus of information about a group of people in order to narrow it down very quickly, without necessarily having to do the same due diligence of research that you'd otherwise have to. So really, what we're discussing here is what happens when data collection starts going haywire.

I'm here today representing the Institute for Anarchist Studies, out of the United States. We do a number of programs every year, including RAT, Renewing the Anarchist Tradition, and a few other things. Feel free to check out the website; there's all kinds of interesting things going on. We publish books, things like that. But moving on to the interesting stuff.

The main idea of what we're talking about here is paranoia. And really, why shouldn't it be? We've got so many different angles and vectors through which we are leaking data constantly, and there is someone there with the digital vacuum cleaner to pick it up as well as sell it, market it, and use it against us.
So we're really going to focus on two areas here: the before and the after. The before is the process of what companies are doing to begin to collect this data and the strategies that they're taking; the after is what you do once these strategies are in place. On the before side, we're going to cover data collection and who is responsible for doing these things. On the after side, we're going to discuss data processing as well as some of the countermeasures, and then towards the end we're going to get into the ramifications of everything: what's actually happening when the rubber hits the road.

So we're just going to jump into it and start talking about data collection. We're going to be discussing some facial recognition techniques, DNA profiling, and DNA collection. Again, there is so much to talk about in this sphere that it's going to be 10 miles wide (sorry, 16 kilometers wide) and about a centimeter and a half deep, so you'll have to bear with me. There's going to be more traditional biometric techniques, iris scanning and fingerprint scanning, a very brief talk about geoprofiling and geolocation data, and then we're also going to discuss surveillance cameras. Surveillance cameras, and specifically facial recognition, are pretty fascinating, and some of the stuff that's able to be done there is getting borderline terrifying.

So let's just jump into this and talk about who is actually responsible. One of the core focuses of this is going to be the infrastructure component. These are the companies that are making their business building various components to spy on you as much as possible, and one of them is Oracle. Larry needs a new boat.
So he's going to keep on coming up with all kinds of neat ways to collect as much data as possible. One of the newer ones that Oracle has produced is Oracle Data Mining. You'll see that there are lots of things in quotes throughout this, which is where I've found just genius marketing quotes on all of these things. Oracle Data Mining is a modification to the SQL Developer suite that allows a business logic individual to point and click their way through all of the data that's been collected and shoved into an Oracle database. Now, it's only as powerful as the intelligence that's been put in there. You're going to have to make sure that things are the right data types, and that the right pieces of information have been collected in the first place. But if that's been done (and don't worry, Oracle will give you consultants to ensure that this has been done right, at a very large premium), then they're just going to keep selling you software that allows you to look at this data in new ways and get at it faster and faster and faster.

Another company that we've got here is the Department of Homeland Security. This isn't something that folks would traditionally think of as an infrastructure provider, but they are. One of the main things that they're now doing is the Global Entry program. I'm sure that any of you who have come to the US recently have been greeted by our lovely individuals in Customs and Border Patrol, who are just itching to get your fingerprints and a photograph and all kinds of stuff. What is actually happening here is that they're working towards building the largest corpus of information on individuals held by a government.
That's one of the goals of this: to be able to say, we have this information on individuals, and we have gone to the effort of indexing it in such a way that we can quickly look things up. The argument for Global Entry has been that the terrorists who came after America on 9/11 used their real names on that airplane, and if we had just been able to look it up the right way, then none of that would have happened. Well, there are a lot of things that went wrong in that situation, and I don't think, personally, that having a dossier built on every single visitor who comes to the US is necessarily the best strategy. Again, could be wrong; I'm one guy. Oh yeah, and now they're building in facial recognition and iris scanning too, so when you get that photograph done, they're going to make sure that they get real close up in there. But we're going to be discussing some of this in a little bit, too.

Another company that's pretty interesting here is Digital Signal Corporation. Their whole deal is that they are marketing things to make the world safe. That's their big deal. They are delivering "the only precision long-range identity solution", and we're talking about facial recognition. We're talking about being able to take a high-resolution camera, take a photo of you, map it in the right way, collect the right data points out of it, and then store it in such a way that they're able to make sure that you are you and not someone else. Now, the fascinating thing about this is that one of the rollouts is going to be the 2014 World Cup, where their goal is to equip the military police, because again, this isn't just standard US police.
This is dudes who will be running around with machine guns while trying to keep people safe at the World Cup. They're going to give them, as the Tribune in London said, "Robocop glasses". Now, all that I imagine happening here is one of those guys kirking out like ED-209, telling somebody that there's 10 seconds to comply and murdering them, but this is just me kind of living out my 14-year-old suburban white G.I. Joe-watching fantasy.

Moving on. So we also have the FBI. We had the Department of Homeland Security, so why not have the FBI? Specifically, that's because the FBI runs this thing called CODIS. Now, CODIS stands for two things; they couldn't just make it one, of course. Specifically, the CODIS unit handles the Combined DNA Index System and it handles the National DNA Index System. It's your standard sort of thing, where CODIS is saying that they are going to be responsible for the master DNA database for law enforcement in the United States. But don't worry: they're going to make sure that selected international law enforcement has access to this too. So I realize that folks here in Germany were kind of getting skittish that they might not be able to be tracked by the US database. Don't worry, it's going to happen. We don't want to leave anyone out here.

And in the end, this is just infrastructure. These are just the components that we're talking about. It's just the stuff, at the end of the day.
It's just something in a box. You've got to have somebody to do something with it. So we're going to start out with the low-hanging fruit here, and it's going to be funny slash boring slash a big facepalm, but we'll just start out with America Online. The only reason that I bring them up is their 2006 data publishing situation, where they published a search corpus in which they had removed identifying information and replaced it with a unique key. Now, each person had a unique key tied to them, so if you did a vanity search in that three-month period, well, congratulations, you identified yourself. But there were some really fascinating searches in there that you could go through. You'd find someone searching for murder, and untraceable poison, and insurance money, and how to hide the body. While this doesn't give away anything per se about that individual, it does show you that it's trivial for these companies to go through and look at what one individual is doing. And this is just one search engine, and I'd go so far as to say that this is kind of the lowest common denominator search engine. We're talking 2006, and these are people who have logged into AOL, probably because they still think that they need it on top of their broadband connection, and typed something into the keyword bar. Now, this does not make for the master criminal who is trying to figure out how to hide the body and collect the insurance money. But on the flip side, it does show how a company with a bit more wherewithal in how they track things might be able to do something with that.

Now, I would go so far as to say that Google is well-worn territory. We can sit and hem and haw about it, but there are enough other people talking about Google that we can just skip on. And Facebook, you know, they're marginally interesting, but I'm much, much more interested in Face.com, and that's one of
their partners. Face.com's whole deal is that they are providing facial recognition APIs to other websites. Now, just like many data stream providers, they have a sharing agreement: we'll show you ours if you show us yours. The end result of this is that you're ending up with this hosted service that Facebook uses, and lots and lots of other websites. If you do a Google search for "photo tagger", you're going to find quite a lot of chaff, but at the same time there are some fascinating examples of people uploading like 25 photos of them and their friends, and it's like, oh, I see that Tom, Dick, and Harry are all in this, and it picks them out with little boxes over their faces. The reason that exists is because it's been crowdsourced. Your friends post your photo up there even though you don't have a Facebook account, and they tag your name to it. Or you do have a Facebook account and you don't remove the tag. And that being said, Facebook doesn't delete anything, ever. So even when you have been tagged, you can remove it publicly, but it's still there and it's not going away.

Amazon. Amazon's one is both neat and has been in the news quite a bit in the past week, and we'll get to the latest instance of this right at the end. They have records on 59 million active customers. That's people who shop at Amazon. And for folks who are not from the Pacific Northwest in the United States, you might not know that in that area Amazon will even deliver your groceries. So they know what books you read.
They know what movies you watch. They know even more now that you're using their streaming video service and rating things and giving them more information about what you like. But they also know that you bought 25 bags of popcorn, because you're a slovenly individual who wants to then sit and watch episodes of CSI back-to-back-to-back on their streaming video service. But it goes one step further, because recently they've expanded the analytics and demographic information on this by receiving US patent number 8,060,463. And while he doesn't have anything directly to do with it, I would like to give a shout-out to my friend John, who works for the US Patent Service. He works in video development (video drivers, monitor screens, things like that), but we often have talks about the ways in which he just wishes that his job could be easier, and it's not, by having to deal with bullshit patents like this. Which in this specific case is "mining of user event data to identify users with common interests". It's a great title for a patent.

Specifically, where the rubber hits the road on this one, the big example that keeps coming up is that I buy you a gift, and then I pay a dollar to have Amazon gift wrap it, and when I pick out the wrapping I just happen to pick something covered in crosses. And while Amazon doesn't understand that they wrapped it upside down and it's a bunch of upside-down crosses, they now think that you're Catholic. They're going so far as to make these kinds of assumptions because the information that they store on you is much, much more valuable the more complete it is. The better an image they can give of you to other marketers, the more those marketers are willing to pay for access to it.

I've got to say, I'm not saying anything during this talk that's really rocket science.
It's just going through and reading lots of boring white papers, and things on Slashdot, and terms of service on websites, and I really don't recommend that any of you go to that extent. But going back to one of the very first things that we discussed, that healthy dose of paranoia is what helps you, again, on the before side of this.

Now, moving on from Amazon, there's this company called Acxiom. Acxiom is a data marketing company of a different sort. They sell mailing lists. They sell personal information, and when I say personal information, I mean, if you purchase their premium service, results of your drug tests, results of searches on your criminal history, and your education data. Because after all, at least in the US, all these universities are hard up for money, and they will sell anything they can about you to try to augment their sports budget. This company even produces a secondary premium feed called the suspected terrorist watch list. This isn't anything maintained by the US government; this is just their opinion. When prompted about this, they said that this is so that you can weed out untruthful employees. If you think that your employees might be terrorists, I think you have bigger issues. But let's talk about untruthful for a second, because this isn't for untruthful employees. The whole purpose of this is so that they can sell these things to advertisers, point blank. There's nothing more than that. There are going to be a few individuals who are like, oh yeah, if we hire these people, the terrorists will win, and it's ancillary money. It's gravy. You're taking all of the data that you've already got, you're putting a slightly different spin on it, and you're finding a new way to sell it.

But really, when we're talking about service providers: ChoicePoint. Awesome, awesome. They're really great.
So as of a year and a half ago, they held 17 billion discrete records. Now, a record is a phone number for you. It's one specific address. It's the fact that you happened to buy something from Amazon. So 17 billion is one of those huge numbers that's really hard to comprehend. But that 17 billion is on 220 million people. So while this is spread globally, we're talking two-thirds of the United States, we're talking a bunch of countries in Europe. The entire population is, I mean, it's spread out. This is me coming up with things off the cuff to sound scary, because it is. There is no reason that this information should be worth that kind of money, but they're getting it for free, and they're getting it from weird places.

A lot of the time, in the US, not who you voted for but whether you vote is a matter of public record: whether or not you went to the polls. Now, if you're Joe Blow average person, this is data that's published by states in something called a voter file. It's going to list addresses, and it's going to list your party affiliation and the years in which you voted. Well, if you vote party ticket like most people, that effectively means that they have sold who you voted for. It's cut and dried.

They get a hold of DNA information from prisons, or from situations like the city of Santa Clara in California. The city of Santa Clara started taking cheek swabs from people who are suspected of crimes; not charged, but suspected of crimes. Companies like this get a hold of that information, buying it from federal, state, and local governments, and then they start aggregating it.

Now, ChoicePoint specifically was the auditor who was brought in in 2000, during the presidential election, when Florida had like half of a percent margin because of how badly they designed their ballots and everything else. I mean, depending on
who you talk to, there are a lot of reasons there, but in 2000 they nullified 94,000 votes. I feel like a fool right now because I should have done the due diligence on this, but I seem to recall that being something that did a fantastic job of closing the gap in deciding that specific presidential race, because it came down to the single state of Florida and this auditing process of which they were a part. So 94,000 votes were thrown out due to criminal conviction. But the issue is that only 3,000 of those had verifiable criminal convictions that made them ineligible to vote. That's not to say that all 94,000, or that 40,000 of them, did not have something criminal in their past which at one point would have made them ineligible to vote. But at the time of the election, they were only able to come up with 3% of those individuals whose votes were thrown away. I mean, that decided the election.

So it gets into these questions: given that you are now also collecting DNA information from people in prison, how is that biased? How does that allow you to very quickly say, well, I know that this person has red hair and a propensity to grow a pretty kickin' beard? It makes it very, very easy to start narrowing down who you think is involved and implicated in all kinds of different situations. Maybe you're using this information to try to market me the best conditioner possible, and if you are, then more power to you. I'll give it a shot. I'm the same consumer prostitute that most of us are. You just need to find the right way to frame it.

But we've been talking about service providers.
Let's go into some of the contractors. These are the individuals that are actually doing all this stuff. There's a huge global contractor called Accenture. Accenture is pretty much just a body shop. They will give you people to do anything that you want, and they will tell you that they will do whatever you need. If you're trying to write an application, then you'll probably get 50,000 monkeys with 50,000 typewriters, and if the contract is big enough, as in the case of the US government, you might actually end up with something. I mean, 10 billion dollars buys a lot of human time, and this dossier system became something that we discussed earlier, called Global Entry. So it's a web. And like I said, it's paranoia, and maybe these things really aren't connected, but I'm sitting there like in that awful mass-market Sherlock Holmes movie that just came out, where he's got strings going all over the place and Watson's tripping over them. There are moments when I'm like, maybe I shouldn't leave the house. And there are many other moments when other people tell me I shouldn't leave the house, or that I should put on pants, and things like that. It's all beside the point.

We've got more contractors to discuss. Code Focus is another great one. So, not to cast aspersions here, but I would say that it's a pretty commonly held belief that Canadians are relatively mild folk. Just throwing that out there.
This is somebody coming from the US. We have murders everywhere; you saw the Michael Moore movies, murders don't happen in Canada, all that kind of stuff. But Code Focus, they are a company based in Canada, and the call to action rose after the city of Vancouver's dearly beloved hockey team lost. By the way, I'm hesitant to even call them contractors. It's giving them too much credit. They sat down with the Facebook API for about 20 minutes, didn't even understand half of it, and kind of came up with a strategy there. And they said, the city of Vancouver is in the grip of crime. The hero that they need, who is Batman, is fictional and won't be coming. So they were going to rise to the occasion, and they produced the website identifyrioters.com. Now, if you have never seen this website, it pulls photos directly from Facebook, riot porn, and allows you to click on the individuals in the photo and type in their name, and it notifies the Vancouver Police Department. Oh, and since they're using Facebook's API, it's notifying Face.com and a bunch of these other places. And remember, you can always directly tip the Vancouver Police Department as well.

Code Focus is this company, and it's really hard to see, but they're very, very specific in saying that it was developed by them for the limited liability corporation known as Identify Rioters. That way, you know, when they potentially falsely accuse people. Oh yeah, one other thing, speaking of falsely accused: the website was up for about a week before they threw in the word "alleged" in front of "rioter". There's just a little bit of minutiae there. But yeah, some of these photos are marginally interesting. I do take issue with one of the photos, though. I mean, that guy.
He's just having a great time. This guy, and this has actually become kind of a meme photo. Let me just suggest one thing. I realize that he's a sports fan; some of us come from a social justice background, so we're familiar with different tactics. But let me just suggest that you don't wear those shoes if you're going to try to blow up a police car and there's someone photographing you. Let's just leave your face out of it: if you are wearing custom orange and blue neon Nike kicks, somebody is probably going to identify you. And this is crowdsourcing data entry. This is now literally like Accenture: 50,000 monkeys with 50,000 typewriters. Sorry, guys, each one of you has a typewriter and you're using it. And yeah, it's just one of those. We don't need to talk about these guys, do we? It's a bunch of clowns. Like, the contractor category? Yes, we'll talk about each figure.

So we've gone over a little bit of who's responsible, but at the same time we need to get into the data processing side of it. Now that some of this data has been collected, now that your friends have ratted you out on identifyrioters.com, and now that Oracle has produced their data mining utilities, what is actually being done with this?

One of the things is data storage. You now have a huge, huge, huge corpus of information. What do you do with it? On the one hand, you have to come up with customized engines for storing it. Yes, when I say customized engines, I mean database-type things. On the open source side, you can define custom data types within lots of different open source databases. You've got that, and you've got PostGIS. PostGIS is an extension to Postgres that allows you to directly store things like lat-long data types. So if you're working with mapping-type information, you don't have to store it as an int and then hope that your developers are going to do something with it.
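To make the lat-long point concrete, here is a minimal Python sketch of the difference between raw numbers and a typed coordinate. It is not PostGIS and it is not SQL; `LatLon`, `distance_km`, and `within_radius` are hypothetical names made up for this illustration, and the distance uses the standard haversine formula on a spherical Earth, which is an assumption of this sketch rather than a description of what any particular database does.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical-Earth assumption


@dataclass(frozen=True)
class LatLon:
    """Hypothetical typed coordinate, standing in for a database-level point type."""
    lat: float  # degrees, -90..90
    lon: float  # degrees, -180..180

    def __post_init__(self):
        # A typed value can refuse nonsense that a bare int column would accept.
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")

    def distance_km(self, other: "LatLon") -> float:
        """Great-circle distance to another point, via the haversine formula."""
        phi1, phi2 = math.radians(self.lat), math.radians(other.lat)
        dphi = phi2 - phi1
        dlmb = math.radians(other.lon - self.lon)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Return the points that lie within radius_km of center, in input order."""
    return [p for p in points if center.distance_km(p) <= radius_km]
```

With a type like this, "who was within 100 km of this spot" is a question the data layer can answer directly, instead of something every developer has to rebuild from two loose numbers.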
PostGIS builds in actual functions at the database layer, so that your SQL code can become more intelligent about ripping that information out as fast as possible. And it's familiarization with these data types, and actual real computer science. We're not talking about getting a comp sci degree and then being somebody who's just churning out code for seven hours a day and hoping for an hour-and-a-half lunch when you should really just have an hour. This is individuals who sit and model data types and try to come up with the most efficient ways possible to actually store all of this. PostGIS is one. DNAnexus is another. Now, DNAnexus, they're providing a cloud data storage setup specifically for DNA information. Yeah, good luck when that gets popped.

But the other side of it is that we're moving more and more into object-based storage. This is going to be ways that developers can directly access these things, abstracted out from the disk, and get at bigger chunks. We're talking about partition IDs at a 64-bit length, and object IDs within that partition at a 64-bit length, and all of this is built on top of the actual SCSI standard. They're in the process of ratifying the third object storage standard building on SCSI; they did one in, I believe, 2003, one in 2008, and they're coming up with this one now as well. Each one of these objects stores both the actual data that you're worried about and extensible metadata. So as you begin to build out these custom data types, as you begin to make the database side more intelligent, you don't have to put your blobs in the database. You can now store them on the file system and have them extracted out that way, and it links it all together. These are those things where developers and folks working
with Hadoop and MapReduce and all that get real excited about object-based data storage. But this goes beyond that, into mining of other data types as well. We're talking about this because we're talking about big data. I hate "big data", but, you know, big and exciting. Personal experience: I was working with a client not too long ago whose data set was one petabyte, and this is spread across multiple huge SANs. And in the grand scheme of things, this isn't even that big. This is just me being one guy, and I'm just some guy with a huge beard who's standing here talking gibberish on this front. Any individual or any organization, for example the CIA, who's hand-classifying five million tweets a day and probably saving a copy of those, is going to need something where they can easily dig through all of that.

Okay, so you've got these data types. So what do you do with it?
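Going back to the object-based storage idea for a moment, here is a rough toy model of it in Python. It only mirrors the shape described in the talk (a 64-bit partition ID, a 64-bit object ID within that partition, and extensible metadata stored next to the data). `ObjectStore`, `StoredObject`, and the method names are hypothetical, everything lives in memory, and nothing here implements the actual SCSI object storage commands.

```python
from dataclasses import dataclass, field

MAX_U64 = 2**64 - 1  # largest value a 64-bit unsigned ID can hold


@dataclass
class StoredObject:
    data: bytes                                   # the payload itself
    metadata: dict = field(default_factory=dict)  # extensible key/value attributes


class ObjectStore:
    """Toy in-memory model of object storage; an illustration, not a real device."""

    def __init__(self):
        self._objects = {}  # maps (partition_id, object_id) -> StoredObject

    @staticmethod
    def _check_id(value, name):
        # Both IDs are modeled as unsigned 64-bit integers.
        if not isinstance(value, int) or not 0 <= value <= MAX_U64:
            raise ValueError(f"{name} must fit in 64 bits")

    def put(self, partition_id, object_id, data, **metadata):
        """Store data plus arbitrary metadata under a (partition, object) pair."""
        self._check_id(partition_id, "partition_id")
        self._check_id(object_id, "object_id")
        self._objects[(partition_id, object_id)] = StoredObject(data, dict(metadata))

    def get(self, partition_id, object_id):
        return self._objects[(partition_id, object_id)]

    def find(self, **criteria):
        """Return sorted IDs of objects whose metadata matches every given pair."""
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )
```

The point of the sketch is the last method: once the metadata travels with the object, "give me everything tagged as DNA" is a lookup over attributes, and nobody has to know which disk or file the bytes sit on.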
Light field cameras go into a little bit of a gray area. Light field technology is something that was come up with at Stanford about 15 years ago. For lack of a better term, it's being able to take a photo at every f-stop at the same time. That means you're taking a photo at all depths of field. The way that they do this is they've got a huge lens setup, and, it's not the best description, but it's kind of like taking light at different phases and being able to capture them separately, so that you can drag through that depth of field by storing it in their specific type. So in this case you see we've got all of these Santas in the background who are jolly and crystal clear, and you can't see it that well, but in the same photo you can actually refocus on this guy's face. You have both macro and micro in the same actual photo. What happens when this starts getting used with security cameras? This camera that was used right there, you can purchase it today.
It costs 400 US dollars. So what happens when somebody goes, you know, that's cheap enough that I can use this kind of camera in a security situation? And now, if I can afford a hundred dollars for a two-terabyte disk, I can store all of these images, and then I can go back and play with them at need, if I can find a way to actually rip out the information that I'm looking for. So these are things to think about. But it's not even just that it's a total-focus camera. They're starting to use these things for 3D rendering as well. So now you have multiples of these cameras and you can actually produce a 3D image of a situation. Possibly that ends up with building a better image of a crime scene. Maybe that's being able to measure distances more accurately. There are a lot of different ramifications for it, but it's just one thing that you're able to do now.

If we're going to talk about algorithms, we can also talk about Palantir here for a second. Palantir is an organization based out of California. They will be very, very quick to tell you that they are not a data mining company. They produce algorithms. They work with companies like HBGary to do that mining. But the idea is that it's going to be presented as a by-hackers-for-hackers sort of situation. It's individuals who see what this tool can do and get real excited. It's neat. It gets back into that 14-year-old suburban white male neat factor. You know, wouldn't nuclear war be neat? Yes, it would, until you're dead. I'm not going to front, I'd like to see if I could hack it. Let's get real, though: I'm dead. Like, I can't hack it.
I'm dead. And individuals that think that the master's house is going to build a new master's house with these tools, and not end up getting it from behind, are kind of out of their skull a little bit. I mean, this is by hackers, for hackers, against hackers. The thing to keep in mind here is that this is the company that lambasted WikiLeaks and other data sharing services and actually produced presentations for the US government about how to effectively destroy them. Let's leave the politics of WikiLeaks aside. This is an organization who wants to destroy the public collection and collocation of information and keep it privatized in the hands of companies that are paying for it. It's reasonably cut and dried, and that's where my problem comes in with it. You've got awesome algorithms? Sweet. You're going to take a political agenda on it? Okay. You're going to publish these webs of trust off of Twitter? There's a fascinating article that Wired wrote, I guess about a year ago, where their sample image has one of the presenters from this year and last year at CCC just, like, dead center in the image. So clearly they're drawing a line in the sand that they want someone to see.

But the thing that is going to get a lot of people excited is just talking about countermeasures. Okay, now you messed up, you're in these databases; what do you do? Or, the infrastructure exists; how do you avoid it? People love talking about facial recognition, so we're going to spend quite a bit of time there, at least in comparison.

It's done via measurements. The facial recognition guidelines say what they are going to check for. They're going to look for mirrored images. They're going to look for a neutral expression on your face. When they do the registration photos, they don't want you smiling. They don't want you doing anything like that.
They want you as relaxed as possible, and part of the reason for that, as you'll see in one of my sample images later, is that they start modifying your face, trying to put a smile on you and removing it and doing goofy stuff, and this just gives them a clean data source to start from. They want you to not be wearing glasses. Again, they can put the glasses on you; it's harder for them to take them off. And they want diffused light. They don't want any kind of spotlights or anything like that on it. If possible, they'd love to have multiple angles. Well, guess what, all those photos that your friends posted on Facebook? Well, that gave them multiple angles, if they happen to correlate those together. And the thing that surprised me is that it's acceptable to have yaw, pitch, and roll of up to 15 degrees in each direction on the photos. They can handle that without a problem. It's only when you start getting outside of that that the civil service worker who's making barely over minimum wage goes, like, can you take the photo again? I mean, this is the way that it goes, at least when I had to renew my driver's license most recently. But even more than that, we should talk about, as far as facial recognition goes, what works and what doesn't work. So, complexion and lighting. That's one of those things where, you know, if you can change your complexion, you know, let's not front.
It's not gonna do anything. It's a lowest common denominator. They're going to be able to mess with hue, brightness, saturation, contrast, all that, to really get around it; it's generally easy to take care of that. I will say that as far as light and complexion go, there is a fascinating problem, and that is that for individuals of color, specifically individuals with dark skin complexions, because of the increased problem in defining contrast on their faces, it's harder for them to be detected by facial recognition software. So specifically, there's an inflammatory, maybe borderline funny, video from about a year and a half ago where these individuals claimed that Hewlett-Packard explicitly made racist facial recognition software, because the 640-by-480 webcam on the top of the computer, running software that cost twenty-nine ninety-nine, couldn't properly figure out the facial details of a very dark-skinned man in low light. This was, you know, unacceptable for them. So, painting Hewlett-Packard as racist on that. We'll talk about plastic surgery. I mean, this is one where, yes, plastic surgery will work. You rearrange your face, your face is rearranged. It's my man who's on top of it. Beards are pretty effective, at least as measured by a bunch of the papers that I was reading. But while they're effective, it's more funny how they deal with it. So right now they try to figure out whether or not a photo of you is a photo of you, if you have a beard or might have a beard, by pre-processing the images and doing lots of stuff to them. So they sample out facial features and then they start putting fake beards around them. I'm not kidding you. These individuals in Turkey, they published this paper on how to deceive a face recognizer, and it's, you know, if you want sources for any of this stuff,
I've got them. I've got them all on my laptop right now. It's pretty funny, but this is their sample image. They take that guy. Oh, man. We'll get to Jimmy. We'll get to Jimmy here in one second. But they specifically take this guy right there on the end, and then they squint his eyes, and then they put that mustache on him, and then they put sunglasses on him, and then they put the mustaches and the sunglasses, you know, trying to do their best Unabomber impressions. And this is what is actually being done as a technique to try to figure out if that is indeed the case. And I'm now going backwards. Okay. So we've got Jimmy here, and effectively what they're doing is they are putting my beard on Jimmy. Like, that's their strategy to try to see if we are the same person. I assure you we are not. Jimmy and I have discussed this at length. But the specific technology, the specific strategy, that was brought up here was CV Dazzle. So CV Dazzle is specifically an implementation of a previous technology, or strategy, depending on what you want to call it, called dazzle. It's camouflage. You're painting these geometric lines on something. In this case they did it on ships during World War One, and the idea is that it breaks up the ability to pick out specific components of it, kind of like a herd of zebra, where it's harder to target any one specific thing in there. All this work has been done by a gentleman named Adam Harvey out of Brooklyn. The research is fascinating. I mean, there's code. It's all going to be open source. He's rewriting it in Java right now, or C++, one or the other. And specifically they're referring to computer vision dazzle. So, a quote directly from him is that they want it to be an antagonistic technology.
They want this to, not per se, make facial recognition software more effective, but to clown it. They want to show that it's absurd. And I already realized that I forgot to fix one thing with these slides, so I'm gonna have to scroll on a little bit by hand. But the goal is, you know, the best part here, to be deceptively fashionable and functionally deceptive. Man, it's awesome. So let me really zoom out here, because these are all things that they tested that effectively beat facial recognition software. Having that haircut beats facial recognition software. So it gets to this point of, like, okay, so the machines can't find you, and corporations have determined that human labor is too expensive. What happens now? Nobody's gonna sit there with the mugshot book to figure out if that's you. They're not. And it's fascinating. I mean, here's an actual render out of it, directly from his research. Even just by putting that up there and taking two photos and comparing them, it's not able to detect it. I mean, this isn't, you know, different angles and stuff. But they also tested this directly against face.com's API and Photo Tagger. They posted 15 photos of this woman at different angles, yaw, pitch, roll, all kinds of stuff. No matching faces detected out of all 15 photos. So we discussed facial recognition, and we're gonna blow through iris recognition real quick. Contact lenses, they kind of work. I take that back. They reasonably work. Folks want you to think that contact lenses don't work nearly as well as they do. There's a bunch of different strategies, as far as measuring dithering, printouts, and stuff like that on the actual contact lenses, to be able to measure predictability. Printouts work. I mean, seriously. It's unlikely, but it really comes down to Nyquist-Shannon, which is, you know, if you can double the sample rate of the source that you're trying to generate, then you can match a data set.
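The Nyquist-Shannon remark can be illustrated with a minimal sketch, using only the Python standard library. The frequencies below (a 7 Hz tone, sampled at 10 Hz and at 20 Hz) are illustrative assumptions and do not come from the talk. The sketch shows that when the sample rate is below twice the signal frequency, the samples are indistinguishable from those of a lower-frequency alias; once the rate exceeds twice the signal frequency, the two signals can be told apart.

```python
import math


def sample_sine(freq_hz, rate_hz, n_samples):
    """Return n_samples of sin(2*pi*f*t) taken at rate_hz samples per second."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]


def max_abs_diff(a, b):
    """Largest pointwise difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical values chosen only for illustration.
SIGNAL_HZ = 7.0
ALIAS_HZ = 3.0  # |7 - 10| = 3: where a 7 Hz tone folds to at a 10 Hz sample rate

# Undersampled: 10 Hz is below 2 * 7 Hz, so the 7 Hz samples coincide with
# those of a sign-flipped 3 Hz tone.
undersampled = sample_sine(SIGNAL_HZ, 10.0, 50)
alias = [-s for s in sample_sine(ALIAS_HZ, 10.0, 50)]
aliased_gap = max_abs_diff(undersampled, alias)

# Adequately sampled: 20 Hz is above 2 * 7 Hz, so the same two tones now
# produce clearly different samples.
well_sampled = sample_sine(SIGNAL_HZ, 20.0, 50)
other = [-s for s in sample_sine(ALIAS_HZ, 20.0, 50)]
resolved_gap = max_abs_diff(well_sampled, other)

print(f"gap when undersampled: {aliased_gap:.2e}")
print(f"gap when sampled above twice the signal frequency: {resolved_gap:.2f}")
```

The same reasoning applies in two dimensions to images: detail finer than the measuring camera can resolve is not captured faithfully, which is the limit the speaker is pointing at.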
I mean, this is the idea of being able to hear audio. That's why it was thought that doing stereo 22-kilohertz channels will produce high enough quality audio that you cannot tell degradations in it. So to defend against that, they do a two-dimensional Fourier transform on the image. But every camera has a max resolution, including the one that they are using to measure this. If you can beat the resolution of that camera sufficiently with what you are photographing to then produce that output, I mean, it's done. And so, same with video. Ideally they're gonna be using videos that they can watch for slight micro-movements and things like that to be able to detect liveness, as it's called. But if you've got a high enough resolution camera, it's gonna happen. And data recognition. I mean, this is just knowing that your passenger information is being stored. If you're posting images, your EXIF data has all kinds of stuff in it. You may be sharing geolocation through EXIF and through Facebook. The Genetic Information Nondiscrimination Act of 2008 in the United States basically says, hey, it's safe to have your genome sequenced. You can have 23andMe or these other companies do it. You can't be discriminated against in employment. You can't be discriminated against in health insurance. But there's nothing to say that those companies can't voluntarily give that information up to law enforcement. Yes, I know. Fortunately the talk is just about done as well, so. There we go. So we've got, like, search history, purchase history. Now, man, there it goes, so I guess that's good timing for us to move on. Yeah, if you now have questions, raise your hand and I will come to you with the microphone. Yeah, so in summary, if you want data removal, that's your strategy. Hello.
Yeah. Well, the first part is more a comment than a question: that these companies actually believe that bullshit, that this is making the world safer. And I mean, it just helps in how you will debate them. What do you think? There are definitely companies that have true believers that think that what they are doing is making the world safer. I mean, I have no doubt that there are individuals within those same companies for whom all that they see is dollar signs. But within any of these companies there are true believers that think that, by God, they're doing something useful. The next question is: you said that there were 17 billion records of people with certain companies, but the only way I can imagine that came to them, if you believe the privacy policies other companies put up, is that there was some kind of black market of personal data. Not necessarily, man. I've worked for similar marketing companies to that, and they publish those types of statistics as a bragging thing. That's their marketing material, to be able to say, no, really, you should give us money, we've got information on this many people. And the bigger thing is how much of that has been deduplicated. I would suspect that those numbers are actually inflated for marketing purposes, to be able to more effectively sell their services, but that's only personal experience with seeing data sets like that. But originally they must have come from somewhere. Yes.
I mean, you get information like that from, like I said, the voter file. You'll get it from various nonprofit groups. Not a lot of nonprofit groups will have sharing agreements like that: so, I'll give you my mailing list if you give me yours, and my mailing list is more valuable to you if I've got phone numbers, addresses, click-through rates on emails, things like that. So the question was if he mis-tags photos that were, I mean, misinformation is always, that's what, yeah. So his question was, if he was to post photos, or human-like photos, of himself, potentially mannequin photos, things like that, is what I'm assuming, and then tag them on Facebook, would that be a successful misinformation tactic? And I certainly think that it would. It's going to become interesting when multiple people start doing that, because mannequins all have relatively similar faces. So you're gonna end up with a dozen bodies that are all, you know, the same two thousand people, and that's an interesting problem to try to attack. And I can't, off the top of my head, come up with anything really obvious that they would do to counter that, outside of just kind of setting up the equivalent of a spam corpus. You know, when you have a spam trap address at your domain, where anything that matches that immediately gets flagged out. It's a potential, but they would both flag out that image and potentially flag your account, just for, you know, questionable tactics, whatever. But do we have any other questions? Yes, we have some questions from the internet here. Okay. Yeah, I have one question from side through personal communication. He's asking about CV Dazzle.
Okay. His question is: isn't this necessarily a makeup arms race? A real system would be able to tell where the face is, because it's on top of a moving body, and this attack will only work on the current generation of algorithms. Especially if it's taken as a serious attack that terrorists might use, the next generation of face recognition will counter it, or they'll just make wearing weird makeup in public illegal. I mean, so that's a fantastic point. There are strategies like that now, where you can't wear a face mask into a bank. You can't cover your face when you go into various stores. And trying to legislate issues like that is going to be, at best, maybe doable on a local level. I would see it being a challenge, at least in the United States, to happen at the state or the federal level. I can't speak to any specific European laws about that. Now, as far as the ability to track movement and more effectively do that, I mean, absolutely. There are individuals who are paid to work on these algorithms, and it's going to be a game of cat and mouse, and I think that's part of the fascinating thing about it. Adam Harvey is looking at this as, you know, how effective can he make it?
You know, and it's no fun if there's not a chance of losing or having to do it again. So, I mean, it's part of a thesis project that he did and has continued working on, and he's actually looking for assistance in developing it further. If you go to CV Dazzle, or do a search for it and go to the website, he's specifically soliciting individuals for assistance. Yeah, and another question from the audience. Hi. What's the false positive rate on these facial recognition things? Because I've always not really worried, on the basis that when you do Bayesian 101, they say, you know, if you've got a very good test but you're looking for a needle in a haystack, you're gonna find a lot of needles. So is the false positive rate now at the point where it doesn't produce too many false alarms? I don't think that the false positive rate is necessarily that high. The false negative rate, though, on the other hand, is pretty high. Statistics are all over the place. It also comes down to which company's algorithms you happen to be looking at. I saw one where somebody took a random sampling, a corpus of a few thousand photos, and then pulled out 25, or a hundred or so, known photos of that, and the specific success rate that they had on the first run was 30%. So that's one company, one algorithm, one strategy. But I mean, this is kind of a capitalist situation where it's an evolution. Whoever is producing the best mousetrap is going to continue evolving, and the other ones are going to die off. I mean, there are specifically two predominant, or were as of a few years ago, two predominant facial scanning strategies. One was called Gabor filters and another one was called PHR. Too many white papers. Specifically, it's the same one that those sample photos came from, and they talk about the effective rates just of those two
different even sampling strategies on the face, and doing an n-minus-one comparison of regions around that. And what they're doing is building a kind of level of trust on there, where it says, I am x percent positive that this is the person, and it's based on, well, these areas match 20% of the time and these areas match 60% of the time. It's a probability assessment they're making. Okay, we have more questions from the internet. Okay. Yeah, here's a question from in J4n through IRC. He asks: you mentioned geographic profiling in your abstract, and the question is whether there are any countermeasures against that. So, as of right now it is a challenge at best. The best strategy that you've got is trying to turn it off, or misinformation. It really comes down to, with a phone, your cell towers are going to report roughly where you are, even if you've got your GPS turned off and are filtering it out. As far as IP location on the computer, that just comes down to proxies. I mean, all well-worn strategies on that. That's part of why I just gave it lip service and didn't cover it in too much detail. Probably a third of the people in this room could at least give you three things to do to avoid things like that. So, you mentioned the success rate when doing facial recognition on a corpus of a thousand images, and I can imagine that when you have a very large corpus of images, the fact that human faces look alike factors in. Is there any data on that, on how, when you for example have 10 million faces in your database, how many faces are really so much alike that you can't filter out who is who?
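The needle-in-a-haystack point and the question about a 10-million-face database can be made concrete with a rough sketch. Everything numeric here is an assumption for illustration: the per-comparison false match rate of one in a million is hypothetical, not a figure from the talk or from any vendor, and the model assumes each comparison against the gallery is independent, which real face data will not strictly satisfy.

```python
def expected_false_matches(false_match_rate, gallery_size):
    """Expected number of wrong identities returned when one probe photo is
    compared against every entry in a gallery of the given size."""
    return false_match_rate * gallery_size


def prob_any_false_match(false_match_rate, gallery_size):
    """Probability that at least one gallery entry falsely matches the probe,
    assuming independent comparisons."""
    return 1.0 - (1.0 - false_match_rate) ** gallery_size


# Hypothetical per-comparison false match rate: one in a million.
ASSUMED_FMR = 1e-6

for size in (10_000, 10_000_000):
    expected = expected_false_matches(ASSUMED_FMR, size)
    any_hit = prob_any_false_match(ASSUMED_FMR, size)
    print(f"gallery={size:>10,}  expected false matches={expected:.2f}  "
          f"P(at least one)={any_hit:.4f}")
```

Under these assumptions a matcher that looks excellent on a university-sized corpus of about ten thousand photos would still be expected to return several wrong identities per search against ten million faces, which is why results from small test sets say little about large databases.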
So I wasn't able to find things on really, really large data sets, at least that were published. It was a lot of individuals at universities who were going through and filling out all the paperwork and getting consent to take photos of folks and kind of working through this process, and the largest that I saw, at least in the university realm, was six to ten thousand photos. So it doesn't get anywhere near what a commercial company is going to be doing just on a daily basis. So it's one of those where I hate to punt it, but I really just couldn't find anything when I was looking for that. So, there are no more questions from the internet. Does anybody in the audience have any questions left? Nobody. So yeah, then thank you for your talk. Thank you very much.