the proceedings are still on their way. The talk will be given by Karin Ulrich and Hendrik Heuer. Karin works at the University of Amsterdam and Hendrik at the University of Bremen. Karin does research on machine learning, and Hendrik's academic work in Bremen is on human-machine interaction. Please give them a warm welcome.

Thank you very much. Good morning. Welcome to our talk on how artificial intelligence is impacting our daily lives. We would like to start with a quote from Kate Crawford, the founder of the AI Now Institute. She is a principal researcher at Microsoft and a professor at NYU in New York. At a symposium this year she said that humans fear that computers are too intelligent and will conquer the world, but actually computers are dumb and have already conquered the world. We would now like to go through a couple of examples. Some of you might know them, but we are aggregating them to paint a bigger picture of the phenomenon. And why are we talking about banality? Because some of the systems that have been deployed are actually pretty simple, and we will see during our talk how they impact our lives.

A couple of words about ourselves: Karin is researching Bayesian methods and deep learning at the University of Amsterdam, and I am a research assistant at the University of Bremen, focusing on human-machine interaction and especially on questions of trust.

We would like to start with an example most of you will know: the detection of spam. Sometimes it works, sometimes it doesn't. We would like to look at the technical setup behind it, and the distinction we want to draw is between imperative programming and machine learning. Imperative programming is what everyone would think of first, and it is very absolute: if the word "Viagra" appears in an email, then it is spam; otherwise it is a valid message. Machine learning is a different way of predicting. It is an iterative process: we want to predict whether or not an email is spam, and we do this by looking at a lot of examples that are labeled as spam or not. The system makes predictions, and a resulting error is calculated. If you change the parameters that led to this error, you can reduce and minimize it, which is a field of research in its own right. After that, you go back to step one, and in iterative cycles you change the outcome of the process. So this is not as absolute: unlike in the imperative case, the programmer does not have to specify explicitly what makes something spam.

We continue with medicine as a field and the detection of breast cancer. We have different features that distinguish benign and malignant cases, such as curvature and the number of concave points; some examples are shown above. If you take two characteristics, for example the number of concave points as the first and the size in pixels as the second, you can see that there is a separation between the cases, indicated here in blue and red. Mathematically, you can draw a line through this data set, and that is called a decision boundary. It doesn't need to be linear; it can be, and in practice is, more complex.
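To make the iterative training loop and the decision boundary concrete, here is a minimal sketch using scikit-learn's bundled Wisconsin breast cancer data. The choice of the two features and of logistic regression is our own illustration, not necessarily the exact setup on the slides.

```python
# A minimal sketch of learning a decision boundary, assuming scikit-learn.
# We pick two features of the Wisconsin breast cancer dataset for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
names = data.feature_names.tolist()
X = data.data[:, [names.index("mean concave points"), names.index("mean perimeter")]]
y = data.target  # 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting iteratively minimizes the classification error, i.e. it moves the
# (here linear) decision boundary until the error on the examples is small.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on unseen cases:", model.score(X_test, y_test))

# The learned boundary is the line w1*x1 + w2*x2 + b = 0.
print("boundary:", model.coef_, model.intercept_)
```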
This approach is analogous to what a doctor actually does: they learn, draw on their experience, and then decide whether something is benign or malignant. The big advantage of machine learning is that we can draw on a huge set of examples and get a better line of separation based on them. You can also see that there are a lot of red cases on the right of the picture, which means that we often make errors when predicting breast cancer. And errors matter: if I tell people they have a malignant tumor when it is actually benign, they may be put under enormous stress.

Before we continue, I want to talk about the power of big data, and what we want to look at here are personality traits. Data that is very easily accessible can be used to predict very complex personality traits. For this study, a lot of volunteers handed over their Facebook profiles with all of their likes, together with information such as their sexual orientation or their religion. The researchers then used a very simple model, a simple regression model, to predict these traits from the likes alone. With an accuracy of 95%, they could detect whether somebody was white or African-American. Sexual orientation could also be detected with very high accuracy, as could political affiliation, religion, experience with drugs, and also whether the volunteers' parents had separated during their childhood. Predicting experience with drugs is not as easy as predicting gender or sexual orientation, but we show this because it opens up a lot of possibilities for discriminating against people. What we want to show here is that we have to think about privacy, because you can infer a great deal from very simple data.

Using this model you can also find the likes that are most predictive of a specific trait. You can think about this for yourself: which of these pictures is related to intelligence, which of them to sexual orientation? The interesting thing is that this approach of looking at likes and drawing conclusions about the personality traits associated with them is comparable to the approaches that companies like Google are taking. Basically, big data is being used to profile individual people. This kind of research can also be used to identify, in a pool of voters, the people who are most likely to be persuaded to vote for your own party. This has been covered by the media in the run-up to the German election, but also in the context of the Trump election in the US, and we think it is a very plausible use case. We have also seen that this enables a kind of discrimination 2.0, for example if an employer can predict whether or not an employee is homosexual and then draws his own conclusions.
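As an aside on the likes study above: the underlying paper (Kosinski et al., PNAS 2013) reduced the sparse user-like matrix with SVD before running a simple regression. The following is our own simplified sketch of that kind of pipeline on randomly generated data, not the authors' code.

```python
# Simplified sketch of predicting a binary trait from Facebook likes, loosely
# following Kosinski et al. (2013): dimensionality reduction on the sparse
# user-like matrix, then a simple logistic regression.
# All data here is synthetic; the pipeline is the point.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
n_users, n_pages = 1000, 5000
likes = (rng.random((n_users, n_pages)) < 0.01).astype(float)  # 1 = user liked page
# A made-up trait that happens to correlate with the first 50 pages:
trait = (likes[:, :50].sum(axis=1) + rng.normal(0, 1, n_users) > 0.5).astype(int)

model = make_pipeline(TruncatedSVD(n_components=100, random_state=0),
                      LogisticRegression(max_iter=1000))

# Area under the ROC curve; the paper reported e.g. 0.95 for ethnicity.
# Here the number only reflects our synthetic data.
print(cross_val_score(model, likes, trait, scoring="roc_auc").mean())
```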
This discrimination can become even more concrete, as we can see in the legal system in the US, where it is being predicted whether somebody will reoffend after being sentenced. Here is an example. On the left there is a man; he had committed two offenses, robbing stores. On the right there is a woman; she had committed four minor offenses as a juvenile, under 18 years of age. Both of them were assigned a risk category: the man on the left got a three, low risk; the woman on the right got an eight, high risk. What actually happened in reality, and remember that what we are looking at is a prediction, is that the man on the left committed a new crime soon after being released, and the woman on the right did not. So in this case the prediction was wrong, and discriminatory in a way, because the woman on the right is a woman of color.

We know this framing from statistics, where we distinguish between type I and type II errors. A type I error here means that a person is predicted to be high risk but does not commit a new crime; people of color have a higher chance of falling into this category. A type II error means that a person is predicted to be low risk but does commit a new crime; here the probability is the other way around, and white people have a higher chance of falling into this error.

So how does this happen? We can assume that these were good programmers who learned their craft and know what they are doing. What you want to learn is a function that goes from X to Y. The problem is that X is not a sample from the whole population, but just the people who have been picked up by the police, so it is not representative. This can become a self-fulfilling prophecy: maybe black people are simply checked more often, and that leads to a bias in the data set. On the other side, the prediction target is also biased, because it is based only on people who were processed by the legal system, so instead of Y you really have a Y prime. If your legal system uses juries, this discrimination can be more or less extreme depending on the people involved. What you can do with machine learning is build a system that effectively hides this discrimination.

So let's come back to direct influence. These systems are sold as tools that merely help you predict crimes, but the problem is that the people who use these tools trust the algorithms. There is another example from Deutschlandfunk about Veronica Heller, who was working in the legal system and suggesting a punishment for the... The problem is that this is as much a social issue as a machine learning one: even if the system were fair, that does not mean its results will be interpreted fairly. People might still make errors.
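The type I/type II asymmetry above can be checked directly once you have predictions and outcomes per group. Here is a minimal sketch with invented arrays; ProPublica's actual COMPAS analysis works the same way on their published data.

```python
# Sketch: group-wise type I (false positive) and type II (false negative)
# rates for a risk score. The arrays are invented for illustration only.
import numpy as np

predicted_high_risk = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # 1 = flagged high risk
reoffended          = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # 1 = actually reoffended
group               = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

for g in ["a", "b"]:
    m = group == g
    pred, real = predicted_high_risk[m], reoffended[m]
    # Type I error rate: flagged high risk among those who did NOT reoffend.
    fpr = pred[real == 0].mean()
    # Type II error rate: flagged low risk among those who DID reoffend.
    fnr = (1 - pred)[real == 1].mean()
    print(f"group {g}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

In this toy data, group b gets all the false positives and group a all the false negatives, mirroring the asymmetry described above.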
And there are probably a lot of people here who build these systems; the bias in this kind of data is everywhere. Here is an example from Google: we searched for "three white teenagers" on the left, and on the right for "three black teenagers". Technology reflects societal problems; we have socio-technical systems here that expose a problematic view of the world, but can also reinforce it.

There is another example from statistical machine translation. You have to know that the Turkish language, unlike German, does not differentiate between grammatical genders. So if you use Google Translate on a sentence like "he looks after the children and she is a doctor", the genders do not show up in Turkish. But when you translate it back, the genders come out swapped the other way: the doctor is suddenly male and the person looking after the children is female. When we translate statistically, we calculate probabilities and just use the most probable result, so even very small biases can be reinforced, and you get the problems we see here. This can be explained technologically, but what does it mean for the world view you get from using a translation service like this?

Our next example, Facebook, shows how transparent or opaque this influence can be. Facebook is a social system where users both consume and produce content. Facebook always wants to show users relevant content and uses a lot of data to do so. In the example from this paper, people have 200 friends and like 80 pages, so if every third one posts something, there are potentially 90 posts to show. A purely chronological view is not very useful, so Facebook sorts the posts using an algorithm. The problem is that this process is very opaque: users neither know nor understand the algorithms, and if you talk to them, they think the algorithms are objective and independent. I am referring to a study from the human-computer interaction community: 62.5% of the volunteers in the study did not know about this news curation. There were 40 volunteers, selected to be roughly representative of the US population, and a quarter did not even know that the feed was sorted at all. Users are also angry when they don't see posts from close friends and family members. The interesting part is that the volunteers looked for the problem in themselves or in other people, but not in the algorithm that Facebook uses: they assumed their friends didn't share things with them because they were not very close. That's why we have this quote: "I always assumed that I wasn't very close to this person, so, well, I don't see their baby photos because I don't know them that well." But actually the person had shared the photos with everybody; it was the algorithm that decided not to show them to this user. So these systems, operating hidden in the background, have a huge impact on interpersonal relationships.

So what do these news systems actually optimize for? Other examples are Netflix or YouTube. If you look at ten videos and four of them are funny puppy videos, the algorithm decides "oh, he likes puppies, let's show him more puppy videos". But if I look at ten videos and four of them claim that refugees are criminals, then the system will show me more videos of that type, and it will change my world view. Such filter bubbles and echo chambers get created online, but they also exist in real life in your circle of friends, where certain world views are shared; online, though, they are less visible, because everything sits under the same YouTube logo.
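To illustrate the difference between a chronological feed and a ranked one, here is a small sketch. The scoring formula is a made-up, EdgeRank-style toy (affinity times content weight times time decay, a scheme Facebook once described publicly); the actual ranking is proprietary and far more complex.

```python
# Toy sketch of feed ranking vs. chronological order. The score imitates the
# publicly described EdgeRank idea; real feed ranking is proprietary.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    hours_old: float
    affinity: float   # how much we interacted with this author before (0..1)
    weight: float     # content type weight, e.g. photo > plain status (0..1)

def score(p: Post) -> float:
    time_decay = 1.0 / (1.0 + p.hours_old)
    return p.affinity * p.weight * time_decay

posts = [
    Post("close friend, baby photo", hours_old=10, affinity=0.9, weight=0.9),
    Post("acquaintance, status",     hours_old=1,  affinity=0.1, weight=0.3),
    Post("liked page, video",        hours_old=3,  affinity=0.4, weight=0.8),
]

print([p.author for p in sorted(posts, key=lambda p: p.hours_old)])  # chronological
print([p.author for p in sorted(posts, key=score, reverse=True)])    # ranked feed
```

The two orderings differ, and with a bad affinity estimate the close friend's baby photo can drop out of sight, exactly the situation the quoted study participant describes.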
These are problems that we have to solve, and this concludes my part on human-machine interaction, a field that is concerned with exactly these questions. I would now like to hand over to Karin, who will focus more on the technical side of things.

Yes, hello. We have now shown that algorithms are already impacting our daily lives today, and to what extent this happens once they are in place. We have also seen that they do not necessarily deserve the trust that is placed in them, and we would now like to show you in which cases it is especially not advisable to trust an algorithm. What is important here is something we call bias, and there are two kinds. The first is the bias of the data: our own values and value judgments, whether we are aware of them or not, and a population being misrepresented in the data. The second is the bias of the model: the assumptions that engineers build into the algorithm to make the predictions. In this process, the bias of the model can actually reduce the bias of the data; at the same time, it is also possible for the bias of the data to leak into the model bias. We will now show you four different examples showcasing some of these biases.

Machine learning has made a lot of progress in recent years, and the success of artificial intelligence has triggered a lot of new research. One study from this latest wave, done in Shanghai, claims that you can detect criminals by looking at their ID photos. We focus on this because it contradicts other research findings, which use socio-economic context to predict crime rates. The researchers took around 1,800 photos of Chinese citizens; about 700 of these photos depicted criminals, people who had been sentenced for a crime, and about 1,100 photos were collected from the internet, from sources such as LinkedIn. Let's look at a few data points to see what this study dealt with. Here we have six images: one row consists of pictures of convicted criminals and one of people who have not committed a crime. I would now like to take a vote: please raise your left or right hand depending on whether you think the top or the bottom row is the non-criminal one. So most of you think they can identify the ones who have committed a crime, and somehow we believe we can predict this from a couple of cues: for example, a collared shirt or a slight smile on one side, versus the harsh light shone into your face when a picture is taken of you in prison. On the other side, makeup or even Photoshop may have been used to tweak a picture meant for a résumé or an application. So the algorithm may effectively be detecting, for example, whether Photoshop was used on a picture. This is perhaps an extreme case of how the data shapes the outcome of an algorithm.

So that is data bias: the sample can be skewed, the data itself can be biased, and so can the amount of data the algorithm is based on. There is also a reporting bias in this case, concerning when and how data gets recorded at all. And the modeling assumptions are just as important in this context.
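Here is a tiny sketch of how such a spurious cue can dominate: we generate synthetic "photos" where an artifact of the data collection (say, harsh lighting) correlates with the label, and a classifier scores high by latching onto the artifact alone. This is our own illustration of the critique, not the Shanghai study's code.

```python
# Sketch: a collection artifact masquerading as signal. "lighting" encodes how
# the photo was taken (prison mugshot vs. polished web photo), which correlates
# with the label purely through the collection process, not through the face.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
label = rng.integers(0, 2, n)                  # 1 = "criminal" in the dataset
lighting = label + rng.normal(0, 0.3, n)       # artifact: tracks the photo source
face_features = rng.normal(0, 1, (n, 20))      # actual face content: pure noise here

X = np.column_stack([face_features, lighting])
X_tr, X_te, y_tr, y_te = train_test_split(X, label, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))      # high, despite meaningless faces
print("weight on lighting vs. faces:",
      clf.coef_[0][-1], np.abs(clf.coef_[0][:-1]).max())
```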
Before the hype, there was the no free lunch theorem: no single model works for every problem, so we have to adapt our assumptions to the specific problem at hand and work with limited assumptions. But a model whose assumptions do not fit can fail without ever signaling that it is inadequate, so it is our job as scientists to point out these cases and to check whether an algorithm is actually valid for a given problem. The success of machine learning so far makes it easy to forget these rules; people think models are now so flexible that there are no limits anymore. So I want to point out something that has not changed despite the deep learning hype, and that is goal setting. What is success? Hendrik already talked about the errors we measure and correct to train algorithms, but often we don't really know what exactly counts as an error, and that is up to the engineer to decide.

For example, YouTube knows how long people spend on the site, but maybe there isn't even a person in the room while the computer still has YouTube open; and a click does not necessarily mean that a person likes a video. Or imagine we have a data set of translations and we want to measure whether a text has been translated correctly. How can we test that? Word by word, or paragraph by paragraph? A word-for-word measure does not work across many languages, and paragraph by paragraph might not work either, because the algorithm may not have the right context. And what about synonyms? Depending on the data set, they might be counted as errors. So we have to make decisions. Google and the other leading translation providers translate sentence by sentence: they treat sentences as independent, and only the most probable translation gets computed.

Back to our example from the beginning: "she is a doctor", translated into Turkish and back again, becomes "he is a doctor". Now we can see how this happens: the algorithm does not know the context, so it simply produces the most frequent option. So this is not necessarily bias in the data, but bias in the simplifying assumptions of the model we use. And here is something else: if we change the period between the sentences to a comma, the algorithm is suddenly able to translate correctly, because now it has the context within a single sentence.

Another example of how bias can grow over time is active learning. We train an algorithm on a database, and when the algorithm is unsure about a data point, it asks a human expert; the answer flows back into the training data, so the algorithm learns and has to ask the human less often in the future. This works well, but what about a data point that is labeled incorrectly? The model assumes perfect data and a perfect human expert, but humans are not as perfect as we like to think. A wrong label simply gets absorbed, the cause of the distortion is no longer visible, and that leads to more distortion: distorted data keeps piling up, and the human assigned to the task doesn't even know about it.
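Here is a minimal sketch of the active-learning loop just described, with uncertainty sampling and a deliberately noisy "expert". The setup and the 10% noise rate are our own toy assumptions.

```python
# Sketch of pool-based active learning with an imperfect oracle. The model
# asks a human about its most uncertain points; a wrong answer is silently
# absorbed into the training set and keeps distorting future rounds.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X_pool = rng.normal(0, 1, (500, 2))
y_true = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

def noisy_expert(i, flip_prob=0.1):
    """The 'perfect human' assumption fails: 10% of labels are wrong."""
    wrong = rng.random() < flip_prob
    return 1 - y_true[i] if wrong else y_true[i]

order = np.argsort(X_pool.sum(axis=1))
labeled = list(order[:5]) + list(order[-5:])  # seed with clear cases of both classes
labels = [noisy_expert(i) for i in labeled]

for _ in range(20):
    clf = LogisticRegression().fit(X_pool[labeled], labels)
    proba = clf.predict_proba(X_pool)[:, 1]
    # Uncertainty sampling: query the point closest to the decision boundary.
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
    query = min(unlabeled, key=lambda i: abs(proba[i] - 0.5))
    labeled.append(query)
    labels.append(noisy_expert(query))  # a wrong answer is absorbed without trace

print("accuracy vs. ground truth:", clf.score(X_pool, y_true))
```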
Of course, Google, Facebook, and the other big players are not transparent most of the time, but this might be an example where something like that happened: an African-American woman, bottom row in the middle, was assigned the label "gorillas". This was huge in the media, but Google never quite explained how it happened. It might be that such labels were originally provided by human beings, who might simply have been trolling, but that alone is not an adequate explanation. Groups of people in our society are marginalized, and those marginalized groups are very often not included in testing procedures. We can also think of Microsoft's Tay chatbot, which was very quickly brought to write racist messages.

Then there is transfer learning: when there is too little data for an algorithm to learn its actual task, we train it on another, related task with more data, to help with the original problem. But this again introduces distortion and bias. For example, robotic arms, even a hall full of a thousand of them, are not that good at learning, so in modern robotics the systems are trained in simulations. That is not reality: the ambient lighting, and of course the idealized model of the robotic arm itself, are not true to reality at all. The same goes for the future of self-driving cars, with the promises of more space in cities and more fluid traffic; one can be quite skeptical about those promises, because self-driving cars, too, have been trained mostly in simulation and not in reality.

Data points can lie close together or very far apart. Generating new points between existing data points is called interpolation; generating points outside that range is called extrapolation, and that is much harder to do. Interpolation, going from one or more data points to another, can answer important questions, for example: what would the child of Kim Jong-un and Donald J. Trump look like? There are also manipulated videos; this one, for example, is an edited porn video with the face of the Wonder Woman actress inserted. You can imagine the possibilities, and you may not want to think too much about them. There are also cases where we really cannot afford a wrong assumption, but that is exactly what is hard about extrapolation, and modern science still has not found the right answers there.

What we need to remember is that algorithms make these assumptions, and this can even be exploited for hostile attacks on algorithms: you take a data point from a data set, say a traffic sign, and find a very similar data point that the algorithm treats as something completely different. The spooky thing is that this even works with printed versions of the same attack, and even if the attacker does not use the same model or the same data set.
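A minimal sketch of such an adversarial perturbation, in the style of the fast gradient sign method (Goodfellow et al.), against a toy linear classifier of our own making; real attacks on image models, including the printed and cross-model variants, work on the same principle.

```python
# FGSM-style adversarial example against a toy logistic-regression classifier.
# The "image" and weights are random stand-ins, not a real traffic-sign model.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)           # weights of a "trained" linear classifier
x = rng.uniform(0, 1, size=64)    # an input, e.g. an 8x8 pixel "sign"

def predict(x):
    return 1 / (1 + np.exp(-w @ x))   # probability of class 1

y = 1.0 if predict(x) > 0.5 else 0.0  # the class the model currently assigns

# For logistic regression, the gradient of the cross-entropy loss w.r.t. the
# input is (p - y) * w; FGSM takes a small step along the sign of that gradient.
eps = 0.1
grad_x = (predict(x) - y) * w
x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# The two inputs differ by at most 0.1 per pixel, yet the model's confidence
# moves sharply away from its original decision.
print("before:", predict(x), " after:", predict(x_adv))
```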
If you are interested in that, please stay for the next presentation. In conclusion, I want to point out that all modern machine learning systems evaluate correlations; this is not about causation. The mathematical treatment of cause and effect in such systems is still a very young field, and describing correlations is just one of the steps needed for it. Before we go over to the Q&A, Hendrik and I want to stress one more thing: we hope we have pointed out the limits of machine learning, but of course that should not keep us as a society from reaping the benefits of this new technology, in medicine, in biology, and elsewhere. This raises political questions that concern us all, and these are not decisions that scientists can or should make on their own. Who is responsible for decisions, human beings or machines? Is it the engineers or the corporations, and who is at fault when something goes wrong? How can we enforce existing laws when confronted with this new reality of algorithms, and how and why should they be regulated? Thank you very much for your attention, and let's hear some questions.

Thank you, Karin, thank you, Hendrik, thank you very much. We are now opening the round of questions; please line up at the microphones.

Microphone: Thank you very much for the talk, really interesting. I think this is a really important subject, especially because it has real impact on politics and mobilization and all those things. Hendrik, you mentioned artificial intelligence and its application to, let's say, political propaganda, Cambridge Analytica and so forth, and as a second topic the filter bubbles. In the context of Brexit and the Trump election, what options are there to actually oppose and counter the way opinions are echoed and amplified through the phenomena you mentioned, AI and big data? Do you have any concrete suggestions to counter those developments?

Thanks very much for the question. So that is about how we tackle these problems in the political sphere, and it is a huge question that I feel is going to stay with us for quite a while, with our friends and in corporations, whenever we get together. For me it is about how we can support people: how can we visualize algorithms, how can we make people understand what happens there? Our perspective is how we can open up this black box, and how a system can show how reliable it really is, and that is a lot of work. There is, for example, the Fairness, Accountability, and Transparency in Machine Learning community, a huge group of people from all sorts of backgrounds, from psychology and sociology and of course computer science, that deals with these kinds of questions.

Please leave the room as quietly as possible and use the right door. Microphone 5, please.

I have more of a short comment on machine learning. Regarding the gorilla example: isn't this really an issue of the cost function? For the algorithm, confusing the class "gorilla" with the class "human" is exactly as costly as mixing up any other pair of labels. This should be differentiated a bit more.

Good point. Of course you can defend against that sort of thing and say that this confusion is worse than that one, for example in Google Photos, where we want to have all these categories; defining that is also a task for human beings, it is not just about machine learning. But the problem with this bias, and that is what the example was about, is more complex and not just about the cost function. The point was also that there must have been a wrong data point somewhere, and that the bias then compounded in a kind of negative spiral when the algorithm trained on it; maybe there is too much simplification of reality going on. These are deliberately simple examples to illustrate the issue, and of course there are always blind spots. This might be one as well: Google might not be diverse enough, and they might not have tested their photo algorithm on enough data points before making it public.

Microphone 1: Thank you very much. I think this is a great approach, putting this topic on the agenda and making it more popular. You talked a lot about bias; bias is usually discussed in terms of neutrality and judgment, and you used it in the machine learning sense. In the context of political debates, are "bias" and "skewing" actually the right terms to describe political phenomena, or should this be described differently, with a different term?

You know, communications is not our expertise; there are people who can do that a lot better than us. People like us are doing PhDs in this field as computer scientists, on how we can deal with these problems, but it is going to be a long-term task to really get this out onto the street.

There is a question from the IRC: is human learning not comparable to machine learning? Aren't the challenges the same, or are they different?
Do you want to take this one? Well, that is a question that is not just about machine learning, it is also about evolutionary theory and psychology, and I don't know too much about that. The idea behind machine learning is to find mistakes, propagate them back, and then adjust and correct the model. I am not sure whether that is how we humans work, in the sense that our brain is optimized over time; I believe I have heard psychologists deny that. So the problem with this question is not so much how machine learning works, but how we ourselves work, and I think we are still trying to figure that out. The models of machine learning are very, very rough approximations of how we work. In earlier times there was the idea of the body as a machine, and now we have these neural networks, and I feel that is the model we currently apply to think of ourselves, so there is always some adjustment going on. And for us as human beings, all the assumptions we make, all the stereotypes that exist, might be useful from time to time, but taken in totality they don't apply all the time; that might be the case for machines as well as for humans.

Microphone 4: Maybe a bit naive, but are there metrics for the bias that we see every day on the net? Are there techniques to visualize those kinds of biases, so that people realize this is a highly complex topic? I am looking for ways for this complexity to be visualized, so people know about and become aware of these statistical problems.

There is research on this; the MIT Media Lab, for example, deals with this sort of thing. They have examples of what kind of gender bias individual people have in whom they follow on Twitter, for example people who only follow males. So that is one part of the puzzle, but this kind of media education has to be taken seriously: if we want to use all these algorithms, if we want to drive around in self-driving cars, we really have to take this seriously. There are always blind spots, and it is not possible to take all of them into account up front.

And for some more context on this question, from the algorithmic point of view: we don't distinguish between positive and negative distortion; for us there are just properties that let us draw conclusions. Certain values are simply proxies for other data: for example, where I live is often a proxy for where I am from, and there are other such data points. So if I know which data points are sensitive and should not be used to draw further conclusions, I can feed this into an algorithm and have the result computed accordingly, to actually see how the algorithm makes its predictions. What I am really interested in is not the distinction between good and bad, but rather a visualization of data sets and results: for example, that one result is based on a very small number of data points and another on a large number. I am looking for visualizations of such cases. Yes, there are approaches like that, and we can measure such things, but it is always a simplification; it is always the question of how simple we can make it. There is one approach that asks not "how do I have to adjust my hypotheses when I look at the data" but rather "how good is my data for my hypotheses", and that allows us to make different kinds of statements. For example, if I have an outlier data point, that changes the statement I can make, and then I have to be up front about it and say that this is a very vague estimate. But the research on this is still taking baby steps.
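A small sketch of the proxy problem mentioned in this answer: dropping a sensitive attribute does not help if another feature, here a made-up postal code, encodes it. The data is synthetic and only illustrates the mechanism.

```python
# Sketch: a "blind" model still discriminates through a proxy. The sensitive
# attribute is never shown to the model, but postal code correlates with it
# by construction, so predictions still track the sensitive group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5000
group = rng.integers(0, 2, n)                  # sensitive attribute (never an input)
postal_code = group + rng.binomial(1, 0.1, n)  # residential segregation: strong proxy
income = rng.normal(50 + 10 * group, 5, n)     # outcome historically tied to group
approved = (income > 55).astype(int)           # historical lending decision

X = np.column_stack([postal_code, rng.normal(0, 1, n)])  # no 'group' column at all
pred = LogisticRegression().fit(X, approved).predict(X)

print("approval rate, group 0:", pred[group == 0].mean())
print("approval rate, group 1:", pred[group == 1].mean())  # much higher
```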
Microphone 1, please. Thank you very much for this talk, and also for constantly translating for each other throughout. I am interested in your take on how useful it is to set milestones in order to evaluate the opportunities that these approaches, machine learning and so forth, actually offer society.

That is what we were getting at: it is a political question, and we don't want to answer it in place of the law. It is about society dealing with this, getting together and finding answers.

Yes, but at some point you have to find a milestone, a separation between good and bad, and you look at the data points on one side and the other, and some of those points are actually relevant and others are not. Say we have a point X, a milestone with a certain quality criterion attached, like "a self-driving car kills fewer humans than a human driver": are there values grouped around this boundary that you can draw conclusions from, or are all those points scattered around the decision horizon?

I think it is problematic to focus on such a single point, because many more environmental factors are involved, and drawing conclusions from that one point alone is probably too simplistic. And once again I want to point to the political question. Also, self-driving cars can never test crashes, at least not with humans, so these crash scenarios are only ever tested in simulations, and that is exactly the problem we have today: as far as I know, we have very few answers on which assumptions trained in simulated scenarios actually transfer into the real world. Even just making these assumptions explicit is very hard for us today, so there is still quite a long way to go.

Thank you again for the talk. The discussion connected to all of this is really: how can we find truth, and how can we make a computer find truth? We have the same problem in science: how do I design my sample, is there a bias in the sample? Have you ever dealt with this parallel?

Well, we are scientists, so of course we have to know for ourselves what we have found out. Yesterday there was this talk, "Science is broken"; it is always very hard to get the right sample size and to think about the effect size, and this is an epistemological question.

Aren't there reference values you can orient yourself by with regard to neural networks? Are there any values you can rely on, with regard to layers or parameters?

Well, this question maybe takes us back to the early 90s, to early neural networks, Boltzmann machines, Hopfield networks, things like that. For those you can say how many bits of data saturate the system, but for the non-linear systems that we use today it is not possible to say that anymore; it is just an estimate, not an exact quantity.

But can't you say that you need a certain number of data points to saturate your system, and that we therefore need some sort of prior validation approach?

That is what happens: you have a large training data set and a separate held-out data set, and in the evaluation you really have to check: did we actually learn something, or did we just learn to repeat our input data? But a precise answer here is still a PhD dissertation waiting to be written.
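The held-out evaluation mentioned in this answer looks roughly like the following minimal sketch; the model and data are placeholders.

```python
# Sketch of the check "did we learn something or just memorize the input?":
# compare performance on the training set against a held-out set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (300, 10))
y = rng.integers(0, 2, 300)        # pure noise: there is nothing to learn

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_tr, y_tr)   # unconstrained tree memorizes

print("train accuracy:", model.score(X_tr, y_tr))      # ~1.0: repeated the input
print("held-out accuracy:", model.score(X_te, y_te))   # ~0.5: nothing was learned
```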
These biases are not new; we have known them in statistics for decades. What has actually changed now with artificial intelligence? And in this context, do you know any studies showing that filter bubbles really have an impact that you can measure? I have heard a lot about them in the media, but I have never come across a scientific study showing that something is actually being amplified that wasn't there before.

I forgot the first question, please repeat it. Sure: what has changed about these biases with machine learning? Right, of course the biases themselves are not new, but our assumptions are new. Data science is hugely hyped right now at universities, and people go ahead and throw data into their systems, maybe to make money, without really thinking about these problems at all. Regarding the second question, I am pretty sure there is a lot of literature on echo chambers out there; I am not sure whether you asked specifically whether it is possible to test this scientifically, or how an approach to testing it would work, but I am pretty sure there is plenty of literature on echo chambers, I just can't think of an author right now.

Thank you very much for the talk again. I see that there are more questions; please approach the speakers directly afterwards with your questions. Thank you very much, everyone.