Thank you, Joe. Hey, everyone. My name is Sam. I'm a software engineer in New York City; I work primarily on machine learning applications for a quantitative hedge fund. My talk is called Demystifying Clearview, where effectively I'm going to give you a demonstration of how you could leverage today's existing technologies to do the same kind of detection and recognition that Clearview currently offers.

The New York Times was the first outlet to actually publish an article about Clearview, which went into detail on the amount of information it holds and the kinds of people who have been using the application. Over the years, a lot of other outlets have come out with more information, and in fact there have been several hacks of Clearview's software that revealed who was really using it. Initially, Clearview said it was primarily law enforcement, but what we've come to find out is that a lot of organizations that were not law enforcement, and specifically individuals, primarily their backers and clients, were actually using the application for various reasons. A lot of those outlets have talked about the information Clearview has, which is roughly three billion images. They've also talked about the people that have been using Clearview, such as the FBI, ICE, Interpol, and agencies in some other countries as well, countries with questionable human rights records.

But instead of talking about those things, I actually want to focus on the technical aspects of Clearview, which boil down to the two methods they've described: how they went about collecting the data, and how they're currently using AI to detect and categorize people. They said web scraping, but in reality what they've been doing is breaking the law and also breaking the end user license agreements of a lot of websites, such as Google and Facebook. The difficulty is that it's very hard to stop, largely because to Facebook or to Google there's no distinction between a bot that's saving an image and a browser that's viewing it. And as far as the artificial intelligence goes, there isn't really anything revolutionary about what Clearview is doing. Part of the reason I'm going through the process of demonstrating this app is to show you that without any kind of new technology, without anything special, I can build something that pulls data from online sources and runs a few libraries on top of it to give you very easy tracking and recognition — the kind of tracking and recognition that ten years ago you'd only see in movies.

So going back to the very beginnings of facial recognition: the technology itself is extremely old. It goes back to the 1960s, when a researcher named Woodrow Wilson Bledsoe was doing a lot of work on which features of the face you can actually analyze to distinguish one person from another. Back in those days, a lot of the work was manual, which meant a researcher would have to manually tell the computer where the noses were, where the eyes were, and based on those coordinates the computer would figure out, this is one person, this is another person. The major technical breakthrough happened primarily around 2008.
And Google and Facebook — you might not see them as technical breakthroughs, but in a lot of ways they very much were, because they made it easy and somewhat fun to do two things that researchers back in those days could not: one, they made it easy for people to submit their information to a centralized place, and two, they made it easy for that information to be organized. Facebook and Google were different companies, but together they enabled a lot of the technology we have today, because without all those images from Facebook and without all that indexing from Google, it would be a lot harder to train a model to identify not just people but also objects.

For a bit of context on the size of the dataset Clearview collected: the FBI has been collecting facial recognition data since 1992. They were first required by Congress to report how much information they had and who they had it on, and back in those days the only database they had came primarily from booking photos and mug shots of criminals and other people who actually had a record. For a long time they didn't report to Congress about what they were doing with that technology, and we didn't really find out until 2008 that their database had grown to roughly 400 million pictures of people. The reason it grew to that size is primarily that a lot of state and federal agencies were sharing pictures, such as driver's license photos and passport photos, with the FBI, and in turn the FBI was sharing the technology with them, giving them a lot more insight into the U.S. population.

The problem with the difference between 1992 and 2008 is that whereas in 1992 the database was primarily of people who had a criminal record, who had a reason to be in an FBI database, by 2008 the majority of the people in that database had no record at all. That was very concerning to Congress, and I believe at that point the FBI started getting a lot more oversight into what they were doing and whose data they had. Beginning around 2008 they got a lot of pushback; a number of congressmen tried to put legislation in place to stop them from using these kinds of technologies, and for the most part the FBI's facial recognition capabilities effectively stagnated around that time, up until September 11th, after which we haven't really heard much about it. But within that context, in just a couple of years — Clearview started back in 2018 or so — Clearview has obviously surpassed what it took the FBI roughly ten years to build. And there's a very specific reason for that: the technology they're using somewhat mandates having that large a dataset. In the past, having three billion images to sort through to find one person would have been an impossible task, but because of the way the technology actually works, you have to have that many images to be able to effectively track and identify people.
So, like I mentioned, I'm going to walk through the development process with you: collecting the data, processing and analyzing it, and putting it together in a way that lets you carry out whatever application you're trying to build. I'll give you a broad overview of what the full process would look like if we were actually to go as far as productionizing it, but today I'm going to focus on the first four steps: first, showing you how the data collection happens and the kinds of things you do here and there to clean the data and fill in the gaps; second, the actual machine learning models you would build to do classification, or to distinguish one object from a different object; and lastly, a broad idea of how you can take these deep learning methods and combine them with, broadly speaking, regular product development and software engineering to get a specific type of capability out of it.

For me, the process of actually getting CCTV data was very transparent and straightforward, because the Department of Transportation has a website listing all the webcams currently running live on traffic within Manhattan as well as Brooklyn. I know for a fact that San Francisco has one too, but I'm not sure about a lot of other major cities. The gist of it is that this information is public and available to anyone who wants to get at it. And whereas the DOT has a fairly open API for the information I was after, the Clearview equivalent would pretty much be going to a site like Twitter, looking at the source code or just watching the network traffic, and pulling in all the images that get fed to the browser. As far as Twitter's end user license agreement goes, you're not supposed to do this, but as I've just shown you, Twitter doesn't have a very easy way of distinguishing a person legitimately visiting their site from a bot that's going to their site and scraping their pictures.

So the hardest part for me — or at least it was kind of fun, but the nuts-and-bolts part of pulling in the data — was figuring out how many cameras I wanted to target. Once I had the cameras I wanted to target, it was a matter of setting up processes such that as the cameras are recording, I'm able to capture that data and put it somewhere I can access later. That's easier said than done, primarily because I only have one laptop, I need that laptop for other things, and all this processing is very, very CPU intensive. The solution is to use a service like AWS. So what I ended up doing is creating an AWS account — and fortunately they give you 750 hours of compute for free — and once I had a cluster of four machines, I spun up each one so it would record a camera feed, and over the course of 72 hours it would effectively record every second coming back from a CCTV camera and save it locally so I could process it later.
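To make the recording step concrete, here is a minimal sketch of what one of those per-camera recorder processes could look like. Sam mentions later that he actually used Node.js for this part; this version is in Python for consistency with the rest of the examples, and the snapshot URL, camera ID, and polling interval are placeholders, not the DOT's actual endpoints:

```python
import time
import urllib.request
from pathlib import Path

# Placeholder URL: the real DOT endpoint and camera IDs will differ.
SNAPSHOT_URL = "https://example-dot-feeds.example.org/cameras/{camera_id}.jpg"

def record_camera(camera_id: str, out_dir: str,
                  duration_s: int = 72 * 3600, interval_s: float = 1.0) -> None:
    """Poll a CCTV snapshot URL roughly once a second and save each frame
    locally so it can be preprocessed and analyzed later."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    deadline = time.time() + duration_s
    while time.time() < deadline:
        ts = int(time.time())
        try:
            urllib.request.urlretrieve(
                SNAPSHOT_URL.format(camera_id=camera_id),
                str(out / f"{camera_id}_{ts}.jpg"),
            )
        except OSError:
            pass  # transient network error: skip this frame and keep going
        time.sleep(interval_s)

# One cloud instance per camera, e.g.:
# record_camera("cam-042", "/data/cam-042")
```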
And what you're seeing here is pretty much four compute instances on Amazon taking in data from the CCTV feeds and either saving it or processing it on those specific machines. Ultimately what you end up with is — well, not a real-time feed, but effectively a local recording of the CCTV cameras that you would otherwise have to go through the Department of Transportation webcam website to analyze.

Running the machine learning models on top of it gives you what you're seeing here: you have the original feed on the left side, and on the right side you have all these objects being tracked across the frames. Depending on the model you're running, you can actually track a lot more than just cars — some models track people, bicycles, motorcycles, and buses. But for our purposes I'm only tracking cars, because, one, it takes a lot less compute power, and two, as it tracks each object, the library also extracts the objects into individual detections.

The reason for that is, obviously, there are different things you could apply this kind of technology to. One application is speed limits within school zones; we already have cameras that do this. Another application that's a lot more specific to this kind of technology is something like an Amber Alert. The reason is that, as I mentioned, when the machine learning model pulls out each of the individual cars from these snapshots, it is able to order the cars within a classification that says this car is closer to that car, whether because of color, make, or model. Based on that information you can search specifically — for example, I'm looking for a red Acura — and simply by giving it that search term, or an example of what a red Acura looks like, it can tell you, across all the different traffic cameras throughout the city, where such a car has been spotted. That's great because, if you have any police officers in the area, you can tell them, hey, watch out for this car at Third and 23rd. So that's a very simple application; I'm not sure it's currently in use, but it's certainly something that could be implemented fairly easily right now with this kind of technology.

And just to mention, in terms of the development of this: it took me roughly three days to collect the data, one day to preprocess it, and another day to actually run the machine learning on top of it. That's just me doing this on my laptop plus a few AWS machines. So you can imagine Clearview, with several million dollars of backing, or the Department of Transportation, with their existing feed into this data — not even having to do the preprocessing I'm having to do — could certainly do this kind of stuff in real time.

Sam, you have five minutes. Oh, wow. Okay, great. So I guess, real quickly: there's a detection part and there's also the search part. The detection itself is fairly easy; the thing that concerns a lot of people is really Clearview's ability to search throughout different images and match up one person with another.
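On the detection side, since Sam names the ImageAI library later in the talk, here is a minimal sketch of what that detect-and-extract step could look like with it, restricted to cars only. This follows the older ImageAI 2.x API (the call names changed in later releases), and the model file and paths are assumptions:

```python
from imageai.Detection import ObjectDetection

detector = ObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("yolo.h5")  # pretrained YOLOv3 weights, downloaded separately
detector.loadModel()

# Only look for cars: less compute, and the per-car crops feed the search step.
custom = detector.CustomObjects(car=True)

detections, object_paths = detector.detectCustomObjectsFromImage(
    custom_objects=custom,
    input_image="frames/cam-042_1600000000.jpg",
    output_image_path="annotated/cam-042_1600000000.jpg",
    extract_detected_objects=True,  # saves each detected car as its own image
)
for det in detections:
    print(det["name"], det["percentage_probability"], det["box_points"])
```

The extracted per-car images are what a downstream similarity search (the "red Acura" example) would be run over.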
The reality is they don't actually do recognition so much as image searching. From screenshots of the app, what they've basically done is implement Google's image search; the difference between image search and what they have is obviously that they have a lot more pictures of people that they're able to match up together. And the danger of this is that a lot of people are not aware that these pictures are in Clearview's database, and oftentimes they're not publicly available pictures per se. For example, if a law enforcement officer using Clearview without being sanctioned by his department uploads a picture of your driver's license, all of a sudden Clearview now has that picture, which it shouldn't have in the first place, and it also makes that picture easier for other people to get: if someone were to hack Clearview, not only could they get Clearview's source code and Clearview's client list, they could also now get your picture, which Clearview just made a lot easier for everybody to obtain.

So in terms of how you can defend yourself against this kind of technology, a couple of different approaches have come up over the years. One of the primary ones was using strobes — infrared strobes, anyway — to interfere with a camera's ability to capture the pixels needed to match one person with another. But for Clearview's case, where they're taking pictures they're not supposed to have and uploading them into the database, the primary way you're going to defend yourself is by modifying your pictures to begin with. You can think of it as a filter that to a human being is not noticeable, but that, because of the way deep learning models are implemented, is able to trick them into believing one thing is another, or rather to confuse them about what exactly they're looking at. Deep learning models are actually looking at every pixel of the picture; they're not looking at the high-level abstraction the picture is trying to convey.
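The talk doesn't name a specific cloaking tool, but the idea described — an imperceptible perturbation that exploits the model's pixel-level view — is essentially an adversarial example. Here is a minimal sketch of the classic fast gradient sign method (FGSM) in TensorFlow, using a stock ImageNet classifier as a stand-in model; a real cloaking tool would target face-recognition embeddings instead:

```python
import tensorflow as tf

# Stand-in model for illustration; not what Clearview or any cloaking tool uses.
model = tf.keras.applications.MobileNetV2(weights="imagenet")
loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm_perturb(image, label, epsilon=0.01):
    """Add a small, roughly imperceptible perturbation that pushes the
    model's prediction away from the true label (fast gradient sign method)."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        prediction = model(image)
        loss = loss_fn(label, prediction)
    gradient = tape.gradient(loss, image)
    # Step in the direction that increases the loss, clipped to valid range.
    adversarial = image + epsilon * tf.sign(gradient)
    return tf.clip_by_value(adversarial, -1.0, 1.0)

# Usage (illustrative): image is a (1, 224, 224, 3) float tensor scaled to
# [-1, 1]; label is a one-hot (1, 1000) tensor for the class to move away from.
```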
And so the gist of this is that a lot of this technology is not new, and the defenses for it are there, but they're an inconvenience to you and me. As for why Clearview has been so brazen with what they're doing: they're using the argument that it's legal. But the reality is, when you look at laws such as the CCPA, which nowadays mandates that companies tell you what they're doing with your data — and specifically gives you the right to ask them to delete your data — it's very easy for you to go to Clearview and say, delete all my pictures. The problem, as I showed you a few steps back, is that there's a lot of information generated by these machine learning models. For example, these pictures as you're seeing them right now are just the regular RGB representation, but there are also intermediate representations that you and I can't see, because they're not in 2D or 3D space but in a multi-dimensional space. Those representations are saved on Clearview's servers, and Clearview uses them to train and optimize its models. So in reality, they might say they've deleted all your pictures, but they're still using representations of your face to train their models. To the extent the law considers that kind of derivative data to be your data, Clearview still has the ability to use your information against you.

And until people recognize the process and the technology Clearview actually uses to do these things, it's going to be very easy for them to circumvent the law, because you can't make a prima facie case that Clearview has your information just based on what we know about them — not even based on the fact that they're able to match up your face — because when you train a model, you do away with all the training data and can move on to an entirely new set of data. The whole gist of deep learning is that the application is able to predict or understand things it has not seen before. So once the model is trained, and you've saved the model weights and the model state and are now using it on a completely brand-new picture, you're not aware of the fact that within that model state, all these other pictures that belong to you are effectively still there.

Legally speaking, if you have a judge who agrees with you and a lawyer who can articulate this, then if you have a right for your data to be deleted, it means Clearview would have to retrain its models without your data to make sure they're actually not using it anymore. And that's prohibitively expensive: as I showed you before, I had to set up a cluster of AWS servers just to train this model, and because Clearview is training on three billion pictures, as opposed to the several hundred thousand I'm using, it becomes prohibitively expensive to keep retraining models to remove pictures you don't have the right to. But again, that's an argument that's going to have to be made, and an argument enough people are going to have to understand, before they can actually start exercising those rights. There are people currently making regular CCPA requests — give me all the pictures you have on me, delete all the pictures you have on me — but as far as asking Clearview to retrain its models without your pictures, that's something I've not
seen anyone go as far as doing, and I think that's really where the cost associated with this technology is going to come from — the cost that's going to make companies like Clearview a lot more reluctant to actually do this stuff.

Thanks — Sam, you need to wrap that up very quickly if that's okay. Thank you. Thank you very much, Sam. Are you done? Yep. You want to close it up with any final comments?

Yeah — the source code for this. This is actually a Jupyter notebook where I was going to demonstrate the distance metric you can use to compare one picture against another, because when you convert a picture from pixels into something a machine learning model can understand, you're pretty much converting it into a multi-dimensional vector, and then it's just a matter of figuring out which vector is closest to which other vector, which is a very simple mathematical operation. You can do this for any kind of picture; it's just a matter of processing it correctly so the pictures are the right size and the right color and so on. The source code itself I can certainly show you if you're interested in looking at it. Because I had to deploy this on AWS, I ended up using Node.js for a lot of the processing, and then for the actual machine learning model, in terms of object detection and object classification, I used a library called ImageAI, which is a high-level wrapper for the TensorFlow and Keras libraries and made it very, very easy to process a lot of this information. There are also a few other utilities I used, primarily FFmpeg, which is an open-source encoding application that lets you slice video into images and convert them between different encodings. I can show more details about that if you have questions, but that's pretty much the gist of it.

Okay, Sam, thank you very much for fitting us in in this unusual session at CSB Comp. Just to reiterate what we're going to do: we're going to share this in the general channel in Slack and give people the opportunity to watch it in their own time, and then if you're available tomorrow to maybe continue some of these conversations on Slack, you can do that by text, or we've got call functionality in Slack, so maybe you want to organize a session where you can talk through some of these issues in a bit more depth. That was really fantastic, and I'm really pleased we managed to fit this in.

Yeah, me too, thank you. This is primarily supposed to be a discussion piece, because the technical stuff goes very deep, but it's one of those things I've noticed a lot of people haven't really talked about, despite the fact that that's really where you're going to have the ability to make any changes, be they changes in the law or changes in Clearview's behavior. So I just wanted to go into a bit more detail about that, and also share how easy this actually is to do.
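As a footnote to the notebook Sam describes: here is a minimal sketch of that vector-distance comparison, assuming each picture has already been converted to an embedding vector by some model. The random vectors below are illustrative placeholders, not real embeddings:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Distance between two image embeddings; 0 means identical direction."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_match(query: np.ndarray, gallery: dict) -> str:
    """Return the name of the gallery image whose embedding is nearest the query."""
    return min(gallery, key=lambda name: cosine_distance(query, gallery[name]))

# Illustrative only: in practice these vectors come from a model,
# e.g. activations of a CNN run over each extracted car crop or face.
gallery = {f"car_{i}.jpg": np.random.rand(128) for i in range(5)}
query = np.random.rand(128)
print(closest_match(query, gallery))
```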