Let me introduce Whitney Merrill. She is an attorney in the U.S., and she just recently, actually last week, finished her CS Master's in Illinois. So, hello. You are hearing the translation of "Predicting Crime in a Big Data World" from the 32nd Chaos Communication Congress.

Hello, everyone. Thank you for coming. I know it has been an exhausting congress, and now you have to listen to me talk about big data and crime prediction. It's a bit of a hobby of mine; in my last semester in Illinois I took the time to look around this area and see what is happening there and what kind of information is being collected. I have about 30 minutes with you, and I want to show you what is going on in this field: What is predictive policing? What data are used for it? What similar systems exist? How much is it used at the moment? Is it used to improve society? And then I'll talk a little about how effective these programs actually are.

So, imagine that in the near future police officers wear cameras around the neck that record which cars and which people are recognized; everyone gets profiled, and the system points out who looks suspicious. At the start of the day, the officer is given a list of hotspots: for example, you are sent into a certain area at a certain time because there is, say, an 82% probability that something will happen there. If something does happen, it is immediately recorded in a detailed database that all police agencies are connected to, and the system points out that this might be Bobby Burglar, who already has a relevant record. The question now is: when I meet him on the street, should I, as a police officer, investigate him? Do I have information that I should take into account? Does he have a big bag with him that looks suspicious? Or am I only acting because the system recognized his face and showed me this information?

Another thought I want to give you is this quote, which is more about the algorithm side: if people are simply reduced to numbers, can the police then say that it isn't racism or biased profiling, because it's just an algorithm? I'll come back to that later.

So, let me explain predictive policing: the who and the what. First of all, predictive policing isn't new; what is new is that more technology is being added, and it is getting better and faster. Analysts in police agencies have been doing this for decades, simply by hand. It was about creating profiles to identify suspicious people. It also works on places, not just on people who might become criminals: you might see that a certain type of crime happens in certain areas, and also at certain times of day, based on when past crimes were committed there.

Now a short detour about machine learning, a very basic explanation of how an algorithm is trained. You collect data from different sources, you clean it up, and you divide it into three sets: training, validation, and test. The training set is used to fit the model, and the validation and test sets are used to check how well it performs. You also need a support level, a minimum amount of data, before the algorithm has enough information to make a reasonable statement at all. And at the end you get a confidence level: if the model says there is an 85% chance of something, that is an 85% confidence level.
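To make that machine-learning detour a bit more concrete, here is a minimal sketch in Python of the process just described: synthetic data standing in for collected records, a split into training, validation, and test sets, and a model that outputs a confidence score rather than a guilty/not-guilty verdict. The data, features, and thresholds are all invented for illustration; this is not the software used by any police department.

```python
# Minimal sketch of the training pipeline described above.
# All data here is synthetic; this is NOT any real predictive-policing system.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend "collected and cleaned" records: a few numeric features per person/place.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

# Split into training, validation, and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# "Support": require a minimum amount of data before trusting any statement.
MIN_SUPPORT = 500
if len(X_train) < MIN_SUPPORT:
    raise ValueError("not enough data to make a reasonable statement")

# Fit on the training set, check on the validation set, report on the test set.
model = LogisticRegression().fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
print("test accuracy:", model.score(X_test, y_test))

# The output for a new case is a confidence level, e.g. "85% chance",
# not a statement that the person is guilty.
new_case = rng.normal(size=(1, 4))
print("confidence:", model.predict_proba(new_case)[0, 1])
```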
What does that mean in practice? It means gathering as much information as possible so that the system can also handle the less obvious cases, and it means information has to be shared between agencies. Combining these kinds of sources for analysis is something police have done for decades; what has changed is the scale and the automation.

So what do these algorithms actually do? They don't say whether a person is guilty or not; they say there is a certain probability. It's statistics, a classic classification problem: a piece of software looks for patterns in data from the past. Those data sets are usually not filtered; nobody checks whether the individual entries are true or false, they are simply all fed in, and that is the basis of the data pool.

There are different types of data: demographic information, records of personal activities, data from all possible sources. One thing particularly shocked me: the radiation detectors of the New York police are so sensitive that they can tell whether you recently had radiation therapy. You can assume that everything you tell anyone in the USA will eventually end up in one of these databases, every government official can simply go and get the information, and you have zero control over it. There is a really nice overview of how these data are collected on the streets; I'll just skim over it here. There is a program in the USA that collects everything, pilot licenses, weapons information, and it is cross-referenceable between states and agencies. If it can be collected as data, you can assume it is sitting in a database somewhere.

Of course there are a lot of private companies that write all this software, sell it to different police departments, and maintain it, and the more departments in different states use the same basis, the bigger these systems get. HunchLab, for example, does hotspot targeting: the focus is not on individual people but on places and probabilities. HunchLab has a feature where, when an officer enters an area, they get a warning that, for example, burglaries are likely there, and the idea is to prevent crimes that way.

Another example is the system New York City built together with Microsoft after 9/11; New York City actually earns money when it is sold on. It is about collecting data from surveillance cameras: if, say, a man in a red shirt is being sought, the software searches the footage for people in red shirts and alerts the police when it sees such a suspect. There is another system from IBM that does hotspot targeting with a few additional features.

Even more noteworthy is the heat list, which is about individual people being tracked. It comes from Chicago, where I'm from; there are about 420 names on it, people who are supposedly 500 times more likely to be involved in violence than others. They are flagged, and then the police contact them. The people on this list are mostly young Black men, along with their acquaintances, who are pulled in through social-network data.
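The actual algorithm behind the Chicago heat list is not public, so what follows is only a toy sketch in Python of the general idea just described: people are connected through social-network data such as co-arrests, and someone's score rises with their proximity to people already flagged for violence. Every name, edge, and formula below is invented for illustration.

```python
# Toy illustration of a "heat list"-style score based on social-network data.
# The real Chicago algorithm is not public; names and scoring here are invented.
from collections import defaultdict

# Edges: pairs of people who appear together in some record (e.g. a co-arrest).
co_arrests = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"),
]

# People already flagged as involved in violence (the seed of the list).
flagged = {"B", "C"}

graph = defaultdict(set)
for u, v in co_arrests:
    graph[u].add(v)
    graph[v].add(u)

def risk_score(person: str) -> float:
    """Naive score: share of a person's direct contacts who are already flagged."""
    contacts = graph[person]
    if not contacts:
        return 0.0
    return len(contacts & flagged) / len(contacts)

for person in sorted(graph):
    print(person, round(risk_score(person), 2))

# Note how "A" and "D" get nonzero scores purely through who they know,
# which is exactly the guilt-by-association problem discussed in the talk.
```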
Some of them, for example, simply went to school with a handful of people, many of whom were in a gang, and so the others get labeled by association, whether that is fair or not. And of course you have no chance to change your own record, because you have no influence on what goes onto the list. The Chicago police visited these people and said: hello, I'm here, I just wanted to check in. That is not reasonable suspicion, but the police can in principle visit these people.

Then there is Precobs, which is used in Germany; officials went to Chicago to learn more about the tactics and the technology, and it is used, for example, in Hamburg. It is generally about repeat patterns in criminal activity, and the important thing is that you need a lot of data points, so crimes that rarely happen, or crime types without many examples, are hard to predict. So a picture is drawn of where and when crimes happen. Precobs is, by the way, a play on the precogs from the film Minority Report, the three mediums who foresee crimes.

There are other systems in the world that try to predict whether something will happen. One area is disease diagnosis: the algorithms are actually more reliable than doctors at predicting a diagnosis. Another is the NSA: they try to predict attacks on classified documents, basically who might leak them, so that the person can be kept away from the document in the first place. There the error rate is very high, among other things because a lot of people do the same kind of job.

Now to the USA and our laws. We are increasingly concentrating on individuals and not only on hotspots; it is not that hotspot systems are uncommon, but programs that target individuals are also being implemented and actively promoted. In the United States, suspicion is always assessed in the overall context. The police are supposed to weigh the factors: for example, if I know that this person is the pastor in the area, there may be an innocent explanation for what he is doing. Suspicion is also a matter of common sense, and a lot of data can shape the picture of a person: what exactly the person did before you stopped them and intervened, and in what kind of area that happens.

No court in the United States has ever put a fixed percentage on these standards. Roughly, you need probable cause for a warrant or an arrest, and a lower standard, reasonable suspicion, for a brief stop and search before you are actually detained. But I have a professor who says: at 30, 40, 50 percent certainty that someone has committed a crime, you will probably get through in court. Where exactly the line for these suspicion standards sits depends on the circumstances; I'm not entirely sure myself, but let's take that as a rough guide.

Is this a black box? Normally a sniffer dog smells something and alerts, and the police officer can use that alert to get a warrant. An algorithm is like a sniffer dog: you put something in, it searches, information comes out, and the question is how far the police can base their decisions on the algorithm's result. If they trust the algorithm, can its output alone bring them to a level of reasonable suspicion where they can stop and frisk the person?
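To connect that sniffer-dog picture to the question that comes next, here is a small, purely illustrative sketch of what it would mean to treat an algorithm's confidence as input to a legal standard. The numeric thresholds are only the rough, informal figures mentioned above (and the 60% discussed next); no court has endorsed any such numbers.

```python
# Purely illustrative: mapping an algorithm's confidence onto legal standards.
# No US court has fixed percentages for these standards; the numbers below
# are only the rough, informal figures mentioned in the talk.
REASONABLE_SUSPICION = 0.30   # enough for a brief stop (informal guess)
PROBABLE_CAUSE = 0.50         # enough for a warrant or arrest (informal guess)

def standard_met(algorithm_confidence: float) -> str:
    if algorithm_confidence >= PROBABLE_CAUSE:
        return "probable cause (maybe)"
    if algorithm_confidence >= REASONABLE_SUSPICION:
        return "reasonable suspicion (maybe)"
    return "no action"

# The talk's running example: the algorithm says 60%.
print(standard_met(0.60))
# The open question is whether a score alone should ever count this way,
# or whether the officer must also see something corroborating on the street.
```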
The big question: if the algorithm says 60%, is that enough to say a crime is happening? What does that mean? Can the police, without looking at anything else, just trust the algorithm? The answer is probably no; an algorithm alone cannot make a really sound decision. You have to update the algorithm permanently, because the environment variables change. And the prediction itself influences the data: if you were flagged as a suspect once, that feeds back into the next prediction. You also cannot first detain a person because the algorithm says 60% and then look for the justification afterwards; that would be illegal. There has to be a further factor, because an algorithm saying 60% probability seems too little to me; that's just not enough. Maybe you can additionally see a pistol-shaped bulge in his bag, or something like that.

And can the algorithm take the whole environment into account? No, probably not. Maybe the data cannot even be processed in real time; the algorithm doesn't have all the data, and the police probably don't either. You can see the algorithm's output, but the algorithm cannot know, for example, that someone is a politician or well known in their community. And what if the algorithm does take into account that someone is a pastor, but counts it twice, or gives it the wrong weight?

What are the other problems? The data may simply be wrong. There is no transparency about which data are used, how they are collected, how old they are, whether they are true at all. There can be noise in the data, and they are of course biased, because they are collected by individuals. For example, studies in the USA have shown that young Black men are searched much more often than white people. That means there is collection bias, and since more data are collected about minorities, there is a feedback loop in the system: more data collection produces more hits, which produces even more data collection and more mistakes.

Then there is the question of an acceptable error rate. That depends on the burden of proof, and it makes a difference whether you ever had a choice about being included. And what counts as a mistake at all? If you are stopped and searched and nothing is found, that is generally not treated as a Fourth Amendment violation, because formally nothing happened. But even a very low error rate, under 1%, which would be excellent for machine learning, still means a lot of people being searched on the street; it has a very big influence on reality. Think of that handful of people in Chicago: they are now recorded as suspects by the Chicago police. There are database problems too, and the bar for excluding evidence in the USA is so high that in practice it almost never happens. Data go in, a result comes out, everyone is happy; that is basically how the output from IBM or HunchLab gets treated.

Another open question in this feedback loop: how certain does the system have to be before something is passed on to the police, and how are these weights determined? In the USA it can already be considered enough to say: he drives a car, drug dealers drive cars, and he is in a database of drug offenders, so that is clear enough to investigate.
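Two of the points above, the error-rate point and the feedback loop, are easy to make concrete with a few lines of arithmetic. The population size, false-positive rate, and patrol effect below are invented round numbers, not figures from the talk.

```python
# Back-of-the-envelope illustration of two problems discussed above.
# All numbers are invented round figures, not data from any real system.

# 1) A "good" machine-learning error rate still touches a lot of people.
population_screened = 1_000_000
false_positive_rate = 0.01  # "under 1%" sounds small in ML terms
print("people wrongly flagged:", int(population_screened * false_positive_rate))
# -> 10000 people stopped or searched who did nothing wrong.

# 2) A feedback loop: patrols go where recorded crime is highest,
#    and more patrols record more incidents, reinforcing the prediction.
recorded_crime = {"district_A": 100, "district_B": 90}
patrol_boost = 1.2  # extra incidents recorded where more officers are sent

for year in range(5):
    hotspot = max(recorded_crime, key=recorded_crime.get)
    recorded_crime[hotspot] = int(recorded_crime[hotspot] * patrol_boost)
    print(year, recorded_crime)
# district_A pulls further ahead every year, even if the underlying
# crime rates in both districts never actually changed.
```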
There are also positive aspects to this whole story. PredPol is a company that offers such services, and when it was used in LA, for example, there was a 13% reduction in crime; in Santa Cruz, for example, a 25% to 29% reduction, and 9% for assaults. So yes, some police departments publish these success numbers, which are of course repeated by the people who sell the software, but basically the claim is that crime goes down. It is hard to say, though, because of the feedback loop: whether crime is really going down, or whether the system just changes how data will be collected in the future, we don't know. If police officers are sent into a community, that by itself changes the data collected there, and that is a problem.

A few final thoughts. Not everyone starts out equal in the eyes of the police. We need much more analysis, we need more guidelines, and we need more research into how these companies get their data, whether they buy it and sell it on; we should care where these data come from. They are bought from companies like Google and end up in police departments, and we don't really know how to handle that: there is no case law and no general rule about how much weight such data should be given. There is a lot more research to be done.

The next question: we have to think about a system for removing wrong data from these databases, because we don't have one. Data collected about people tend to stay there; there is no mechanism to remove wrong data. And police officers tend to believe what they are told: if an officer goes on duty and you tell him there is a 70% chance, he believes it rather than questioning it, and it influences how he acts. At the border there is essentially no Fourth Amendment protection; you can be searched without any suspicion, as we all know, and the data collected there still flow into these aggregated databases.

One last thought: if I wanted to commit crimes and attack this predictive policing system, could I simply commit my crimes where the system is not looking? That is a data collection problem, and it is also a convenient argument for the police not to tell you how the system works, because knowing could influence it.

Question from the internet: Is there evidence that data like the use of encrypted messaging, encrypted e-mail, VPN or Tor, combined with automated requests to the ISP to obtain real names, flow into these scores?

Answer: I'm not sure whether that already flows into the algorithms currently in use. I know that police departments are partly involved with such systems, and if someone is already a suspect in the USA, you have to assume it could be taken into account.
Question: You mentioned medical applications; there are things like Google Flu Trends. Are there any good examples where such systems are used to improve the community, for example for social work, instead of only for criminalization?

Answer: I can't give you an example; I'm not aware of any. The police are not social workers, and I don't see them sending a social worker instead of a police officer.

Question: Thank you for your talk. You, like some other speakers, touched on the fine trade-off that has to be made between false positives and the prevention of crimes. What I mean is: for the police it is a very different situation whether someone steals a newspaper or plans a terror attack; for a terror attack there is more justification to accept the cost of false positives. How is that fine-tuning done? And why is that configuration not in the program itself but left to the customer of the software?

Answer: I imagine that the police still have to interpret what the algorithm tells them; I hope that they do, and I assume that they judge and react differently than if someone steals a newspaper. This fine-tuning probably happens inside the police department, case by case.

Thank you. You have been listening to the translation of this talk.