Okay, hello. Okay, let's get started. So, it's my great pleasure to welcome Professor Marc Mézard from the École Normale Supérieure in Paris for this Salam Distinguished Lecture. It's the first Salam lecture since I took over as director, so it's an honor for me also to introduce the speaker. As you know, this lecture series is an important part of ICTP's academic activities. Every year we have a very distinguished colleague lecturing here for three days on quite diverse topics. For example, in the last five years we have had Juan Maldacena, Alan Guth, Michael Berry, Brian Hoskins and Don Zagier, talking about topics all the way from quantum gravity and climate modeling to number theory and cosmology. This year we are very happy to have Professor Mézard, who will be talking about artificial intelligence, which, as you know, is a very hot topic nowadays. It will be a good opportunity for all of us to learn about the new exciting developments, what is hype, and what the threats are, because artificial intelligence poses all kinds of ethical questions from many points of view. So I'm very happy to welcome Marc. Before I introduce him, let me first thank the Kuwait Foundation for the Advancement of Sciences for its generous support. Unfortunately, their representative couldn't be present here today, but they have been supporting this activity for many years now, and ICTP is grateful for that support. So, Marc is a theoretical physicist. He did his PhD at the École Normale Supérieure, where he is now the director. His main field of research is the statistical physics of disordered systems, which is used in various branches of science, all the way from biology and economics to computer science and signal processing. In recent years, his research has focused on information processing in neural networks.
He's a recipient of many prestigious awards, like the Lars Onsager Prize of the American Physical Society, the Gay-Lussac Humboldt Prize, the Silver Medal of the CNRS, and the Ampère Prize of the French Academy of Sciences. He will be telling us today about artificial intelligence, deep networks, and exciting topics like that. So, welcome. Can you hear me? Yes. Thank you. Thank you, Atish. Thank you, everybody. It's a great pleasure to be back in Trieste and at ICTP. While preparing these three lectures a few months ago, I had some interactions with Fernando Quevedo, who was in charge at the time. He had asked me to give a first lecture which would be more like a public lecture, not so much targeting scientists, but rather the general public. So I have targeted this lecture at my wife, who is in social history, and I hope she will enjoy it. But I hope also that the many scientists in the audience, even if I will not go into the scientific details, will find the framework interesting, also in terms of where science can go on this topic, which is one of the major questions. I will start with a very short introduction about myth and reality, beginning with a few statements. Ray Kurzweil, who received the National Medal of Technology, given to him, I believe, by President Clinton: "What's actually happening is machines are powering all of us. They may not yet be inside our bodies, but by the 2030s we will connect our neocortex, the part of our brain where we do our thinking, to the cloud. We are going to get more neocortex. We are going to be funnier. We are going to be better at music. We are going to be sexier." These are promises, let's say. It's in the section of myth, so you understand what I think. "Artificial intelligence will make you smarter": that's a colleague of ours in San Diego. This is the enthusiastic side. Now I can go to the more pessimistic side.
Elon Musk is well known for making strong statements about artificial intelligence: "AI is a fundamental existential risk for human civilization, in a way that car accidents, airplane crashes, faulty drugs or bad food were not. They were harmful to a set of individuals in society, but they were not harmful to society as a whole." And we should pay attention to what Stephen Hawking had been stating; he was also extremely concerned about this: "I fear that AI may replace humans altogether. This will be a new form of life that outperforms humans." One should remember, of course, that Hawking for many years had been using a series of computer-assisted devices in order to communicate, to talk and to write, so he could himself witness the progress of all these devices. It is probably from this perspective, engaging with the recent successes in language analysis that I will talk about, that one has to understand this statement of Hawking. So these were two radically opposite sides, and I want to tell you a little bit about why all these statements are being made at this moment. After all, artificial intelligence has existed for more than 70 years, and it has promised many things many times. Why is it that we talk about it so much now? The reason is that there has been very impressive progress in the last six, seven, eight years. One can probably date the progress to the year 2012: a significant breakthrough in terms of new methods, new tools, new devices that make up what I will call a kind of technological revolution. Technological, I'm not saying scientific; we'll talk about that later. Probably the field in which this progress has been the most impressive is image analysis, and by image analysis I mean classification, detection of faces, segmentation and so on. Just to illustrate what has been done, and the progress over time.
A challenge has been prepared with one million images categorized in 1000 categories; it's called ImageNet. The challenge is: give me a computer program to which you present an image, and it will tell you in what category that image falls. So it might say: this is a star, this is a dog, this is a tiger, this is a pizza, et cetera. You have to find, for each image, in which category it falls. The challenge was started in 2010, and you see here the various competitors and the percentage of errors made by the various programs. In 2010, the best program was making errors on nearly 30% of the images, and all the other ones were worse; some of them were making 75% errors. The year after, small progress: the best program goes to 25%. And then in 2012 there is a breakthrough, and the best program goes down to something below 15%. That was the first time that deep learning was used to address this challenge, and you see there is a big gap: there was one team that had used deep learning, and all the other ones were more or less in the standard range of the previous year. Immediately, everybody caught up with the same method of deep learning, with variants, of course, and you see the progress, which has been very, very impressive, in this classification task, which 10 years ago seemed nearly impossible. So this was classification. Another aspect is image segmentation. You are given an image and you have to analyze it and be able to say: on this image there are cars, there are buildings, there are people walking, and so on, classifying all the parts of the image. If you are able to do that, of course, it opens the way to self-driving cars. That's what you have to do in order to have a reliable self-driving car: it has to be able to analyze the scene around it.
That's the first step. Then, from this analysis of the scene, it should also understand what the dangers are, what can happen, and so on. So this is not just a tiny thing, the self-driving car, or the self-driving truck: the first series of semi-automatic trucks are being put on the highways. Just to give one order of magnitude: at the moment, there are 13 million heavy trucks on the roads in Europe. So if you change the way they are driven, if you no longer need a driver, or if you need one driver for 100 trucks, there is a major question about the evolution of society and the organization of jobs. But in my opinion, some of the most interesting progress is in image analysis for medicine. There are several examples of it; I see new examples arriving every week. I have selected a few of them, which are all well-documented, published papers. This was a paper in Nature in 2017. These are images of skin lesions, epidermal lesions; the ones on the top are benign lesions, these ones are malignant, and you have to distinguish which one is which. Deep networks that had first been trained on ImageNet, a very large dataset, so that their architecture was more or less able to start analyzing images, were then trained specially on 130,000 clinical images, which had been classified by medical doctors. I will explain this procedure; it's called supervised learning. Using this training, the authors show that the artificial neural network achieves performance on par with all tested experts, so it can be very helpful for this analysis. And it's also something you can do with your mobile phone. So in terms of screening, in terms of detecting and telling someone, "look, you have this thing on your hand, maybe it is time to go to the dermatologist, it is becoming rather urgent," it can be quite useful.
Another one, which appeared recently, is about breast cancer screening, analyzing mammography images. Mammography is in general the first screening test that can be done, and its analysis is a long and painful process. It can now be done by a computer program, again a deep network. The authors organized a very large database of 1 million mammography images and trained the network on this database. Again, it's a labeled database: 1 million images for which the experts tell you, this is a suspicion of cancer, this is not. From that, you train the network. And you see, this is a measure of performance; I will not explain in detail what this measure is. But on this measure, this is the performance of the individual radiologists who do this routinely; that's what you get when you have this screening done. The computer program, the deep network, does very significantly better. In fact, they did the statistical analysis: in order to reach a performance better than the program's, you would need a pool of at least 14 radiologists and take the average of all their predictions; only then would you gain. So again, it's a very significant achievement. There are many examples like this. This one is for lung carcinoma; again, this was in Nature Communications, and the conclusion of the paper was: our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Again, they were on par with or better than the best experts. So these are some aspects of image understanding. There is much more than that; there is language. I don't know if you use automatic translators, but everyone who uses automatic translation can witness that it has made progress. It is still imperfect, but the translators are there.
Also, in the way that you use them: for instance, they are much better if you give them a paragraph to translate than if you give them three words. This is very significant; it means that they are able to get something of the context. There has been, again, a major breakthrough in automatic translation, in the sense that the experts have given up their previous approaches, the same as in vision. There was a lot of work before, but as soon as these new deep networks arrived, the activity shifted to these new devices, putting aside all the semantic analysis and so on that had been done before. So this is very clear: there is very significant progress. One can say that probably, it already exists but it will be very efficient in a few years, you will have some device on your glasses or whatever, and you go to China, and if you want to say something in everyday language, like taking a taxi or getting somewhere, people will understand you. That is something significant. A small warning, because when I say this, people always ask me about the translation of literature. The translation of literature is another field; it's not in the same range at all. If you look at this sentence, which is the first sentence of Proust's À la recherche du temps perdu: I put it into an automatic translator and got this translation into English. It is not bad. It is correct. It is not Proust. And then I went to see how À la recherche du temps perdu is actually translated into English, and I found at least two translations that I found interesting. Some people have translated the title as In Search of Lost Time, and others have translated it as Remembrance of Things Past. The two are very nice, the two are beautiful, the two have something of Proust. Which one is better? It's not a problem for an automatic translator.
It's a problem of literature. Translating literature is an act of creation, so I will put it aside; it's not among the things we are looking at. Games. I think it was already in 1997 that Deep Blue won against Kasparov. Some of us remember that, and it was a shock for people who were playing chess. At the same time, Deep Blue at that time was using computer power, exploring many possibilities deep enough; it was brute force in some sense. And at that time, I remember very well everybody saying: well, but Go, that's impossible, because Go has a combinatoric factor such that at each step you have many more possibilities, so it explodes much faster. A few years later, in 2016, AlphaGo, again a deep network, trained on many, many games played by humans, succeeded in beating the world champion. More interesting than that, the year after there was a new version of AlphaGo which trained from scratch. It did not have access to human-played games; it just trained against itself. It started from a random machine playing against another random machine, saw which one wins, reinforced it, iterated that many times, and built the world champion, with ways of playing Go that are very atypical, that the experts do not recognize. Let's say it was a new thing. So this is a success. Another one was poker, which was supposed to be unreachable because there is a level of psychology, in terms of bluffing and so on. And there is even StarCraft II, the computer game that our children love, which requires collaborating with other players. So again, that's another dimension of games: it's no longer one against one, but it has a cooperative aspect. In all of these, there are now programs which are very strong, beating the best humans.
So this was just a kind of teaser to show some aspects of what I call a technological revolution. Now I want to explain a little bit how it is done. It turns out that the principles are very easy, so I will try to explain them in very elementary terms. The whole idea is called machine learning. In machine learning, you have a machine, and the machine is an object, like a computer program: there is an input and there is an output. In machine learning, you will present some input and obtain some output. It will be what we call supervised learning. In supervised learning, there is a supervisor, and the supervisor presents some examples of the task. The supervisor will present hundreds of thousands of pictures of cats, saying for each of them: this is a cat. Hundreds of thousands of pictures of dogs, saying for each of them: this is a dog. And the machine will be trained; it will not be programmed a priori, like our computer programs that have been carefully designed. Instead, the machine has a lot of knobs, a lot of parameters inside, like all these cranks here, and you have to tune all these knobs so that the machine gets a successful result for all the images that you present to it. There are many, many possibilities, and the machine itself explores all these possibilities until it succeeds well on what we call the training set, the database on which we do the training. But once you have this machine, the real test is not that it does well on the training set; on the training set, you already knew the answers, because that's how you trained the machine. Instead, you present a new image of a cat, and you want to know if the machine succeeds in getting the result.
So you present a cat, you present a dog; if the answer is not good, you count an error, and you can compute the number of errors the machine makes in this phase, which is called generalization. This is the generalization error, or test error, and that's how you gauge how smart the machine is. So this is the paradigm of machine learning. It is very important because, in some sense, it completely shifts the activity of the programmer. Instead of building a program... you know, there were a lot of colleagues working in computer vision who were trying to identify cats in an image. Don't ask me why one wants to identify a cat, but this was always considered a kind of challenge, because a cat can be seen from many perspectives, and there are different cats, different colors, et cetera. It has been a major challenge. You could sit in front of the problem and say: okay, let me build a small mask; I will look for a small triangle, and another triangle, meaning there are ears like the ears of a cat. You can try to do that. But in the successful programs, you have given up all of this. You give the machine a lot of internal parameters, a lot of internal knobs that have to be tuned, and you let the machine search in this parameter space until it finds a solution. And strangely enough, it succeeds, and it generalizes. So now let me explain the idea behind these machines. Instead of cats or dogs, you can also analyze simple images like this one: handwritten digits. Recognizing a handwritten digit is something that, again, every child can do once you have taught them the digits. For computers, it was a big challenge; now it is solved. If you think of it in mathematical terms, in the standardized description of a handwritten digit you have, let's say, a square of 28 by 28 pixels.
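The train-then-test loop just described can be sketched in a few lines of code. This is an editor's illustration, not anything shown in the lecture: the toy 784-pixel "images" and the nearest-neighbour rule playing the role of the machine are invented for the example.

```python
# Editor's sketch of supervised learning: train on labeled examples,
# then measure the error on fresh, unseen examples (the "test error").
# The toy data generator and nearest-neighbour "machine" are invented here.
import random

random.seed(0)

def make_image(kind, n=784):
    """A toy 784-pixel binary 'image'; the class biases which half is dark."""
    img = [0] * n
    half = range(0, n // 2) if kind == 0 else range(n // 2, n)
    for i in half:
        if random.random() < 0.8:
            img[i] = 1
    return img

# The supervisor's training set: images together with their labels.
train = [(make_image(k), k) for k in (0, 1) for _ in range(20)]

def classify(img):
    """The 'machine': answer with the label of the closest training image."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    return min(train, key=lambda ex: dist(ex[0], img))[1]

# Generalization phase: fresh images the machine has never seen.
test = [(make_image(k), k) for k in (0, 1) for _ in range(10)]
test_errors = sum(classify(img) != label for img, label in test)
print("test error rate:", test_errors / len(test))
```

The point is only the structure of the paradigm: a labeled training set, a machine defined by the training data, and a test error computed on images held out of training.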
And each pixel is white or black. So there are 784 pixels, each of which can have two colors, which means there are 2 to the power 784 possible inputs. For each of them, you could say the output should be a number between zero and nine, a digit. Or maybe, if you have a completely crazy input, the output could be: this is not a digit; that would be another possible answer. So this is the task, and it's not easy. For those of you who are trained in science: going from a 784-dimensional input to this output, the number of functions that could do it is absolutely enormous, and you have to find one that does exactly what we do. The child does it, or we do it, using our visual system and a bit more than that; basically, using neurons. Neurons are built in such a way that they get signals from other neurons through dendrites, they accumulate the signal in the cell body, and if the signal is large enough, they propagate it along the axon and send it to other neurons. So they are tiny elementary processors which compute a kind of weighted sum of all the incoming signals. And that was the idea, already long ago, for creating artificial neurons. An artificial neuron, like this one, gets the signals from the previous neurons, computes the sum of all these signals, and outputs something which depends on this value; there is a nonlinear function applied to it. The heart of the computation is in the fact that each neuron sends its signal here through a synapse, and the efficacy of the synapse is not the same for all synapses. So a signal can be amplified when it arrives here, or it can be attenuated. Here, there are four parameters, four knobs, that tell you how this artificial neuron works. What matters is the value of these parameters, which are the weights.
How much weight do you put on this signal, on this one, on this one, on this one? Basically, you do a linear superposition of all these signals, and the coefficients are these four weights here. With that idea in mind, you can immediately start building a neural network that tries to analyze handwritten digits. And this was already done in the 50s, so it's not a new story; it's a very old story. It was done by Frank Rosenblatt. The idea of Rosenblatt was: I have all my pixels here; I put a neuron for each pixel, which is zero or one; I do a weighted sum over all the pixels, one sum for each possible output, and I pick as the answer the output which has the largest signal. So hopefully you should find, in this case, 784 weights telling you the coefficients of this linear combination, and this will give you the output. This was immediately recognized as an extremely innovative idea, and the proof of that is that it made its way into the New York Times in 1958. The first sentence was: the Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself, and be conscious of its existence. It was not "funnier and sexier," but it was in the same spirit. Unfortunately, it was soon recognized that the perceptron has extremely limited capabilities. Because the perceptron, here I talk in scientific terms, is a linear separator: it can do only linear operations, and there is a vast range of operations that it cannot do. For instance, it is not good at recognizing handwritten digits. So it's not sufficient. So, what is new since Rosenblatt's perceptron? First of all, we have now devised multi-layer networks, which I will describe. We have much larger databases, much, much more computing power, and a few minor improvements in the algorithms.
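To make Rosenblatt's idea concrete, here is a minimal perceptron in code. This is an editor's sketch, not Rosenblatt's actual machine: one artificial neuron doing a weighted sum followed by a threshold, trained with the classic perceptron rule. It learns the linearly separable AND function, but no setting of the weights can ever reproduce XOR, which illustrates the "linear separator" limitation just mentioned.

```python
# Editor's sketch (not from the lecture): a Rosenblatt-style perceptron,
# i.e. a single artificial neuron doing a weighted sum plus a threshold.
def neuron(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0          # threshold nonlinearity

def train_perceptron(examples, n_inputs, epochs=20, lr=0.5):
    """Classic perceptron learning rule: nudge weights after each mistake."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in examples:
            err = target - neuron(x, w, b)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# AND is linearly separable: the perceptron learns it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND, 2)
print([neuron(x, w, b) for x, _ in AND])   # prints [0, 0, 0, 1]

# XOR is not linearly separable: no single neuron can represent it,
# which is the limitation that doomed the original perceptron.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(XOR, 2)
print([neuron(x, w, b) for x, _ in XOR])   # never matches all four targets
```

The failure on XOR is not a matter of training longer: a weighted sum plus a threshold can only cut the input space with a single straight line.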
But in some sense, all of this was already here in the 80s. For those of us who were working in the field in the 80s, I think the major new ingredient is the availability of very large databases, which has been crucial, because you need very large databases to train the machine; and, for that, you need a very large amount of computing power. All this was not available in the 80s or 90s when people were working on these ideas, and it appeared gradually, up to a level that allowed this technological breakthrough. This is one example of a database; it's called MNIST. It is a database of labeled handwritten digits, so we know for each digit what it is supposed to be, and it has been one of the tools that has been used. It was a small one: it has only 70,000 images. But then we went to cats, and fortunately, in between, there was the World Wide Web, and people started to put images of their cats on the web. I don't know why they did that, but it helped the people who were programming, because it gave them a database. Okay, so now let me do the exercise. I want to build a neural network that analyzes handwritten digits. I present a handwritten digit here. The network has 784 input neurons, which are black or white depending on whether the pixel is inside the five or outside. I have not drawn all of them. Then I try it with a hidden layer of 50 neurons, and then I have the output layer. And I can train it: I take my 70,000 examples. You see that here I have many, many parameters, all these knobs of the machine that I have to tune. Each one is the value of a transmission: if this neuron is active, how much does it drive the activity of that one? This is given by one number, and I have to find it. So what I do is something completely stupid. I start with completely random numbers, and I look at what happens: I look at how many errors I make on the database, on the 70,000 images. I will make many, many errors.
Probably I will succeed only one time in ten, statistically. Then what I do is take this number here and try to increase it a bit, and again I run the whole thing. Does it improve or not? If it improves, I accept; if it does not, I reject. I do that many, many times, for all these parameters. It's a very time-consuming, and computer-time-consuming, activity. This is a bit of a caricature of how it is done; in practice it is done in a slightly more clever way, called stochastic gradient descent, but forget about the details: the idea is more or less this one. You try, you try to improve, and after a long training phase, what you get is a neural network that works. I've given you this example because it's something that the cleverest programmers in the audience can sit down and implement at the end of this talk. Very easy: you write a program, you look sequentially at each of the weights, try to increase it or decrease it, do this many times, and what you will get is a neural network that is correct on 96% of the images in the test. So it's not that bad. It's not enough to be efficient for the post office, let's say, but still. And if you go to the deeper architectures that I will describe later, you get something which is correct at 99.8%, making only 21 errors. These are examples of the errors that it makes; in each case, it's not so clear what the writer wanted to write. So, okay, it works. One problem is solved. Not a very spectacular problem, but still something that we didn't know how to solve, and you see it is solved by the machine searching in this large-dimensional space. The real success came from going to deep networks. And what does "deep network" mean? The name is quite literal: it just means that instead of taking three layers, I have my input, I have a first hidden layer, a second hidden layer, et cetera, and I pile them up.
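The "turn one knob, keep the change if the error does not get worse" training loop described above can be written down directly. This is an editor's caricature on an invented toy task (classify 4-pixel images by whether most pixels are dark), with a tiny two-layer network rather than the 784-50-10 one of the slide; real training uses stochastic gradient descent instead.

```python
# Editor's caricature of the naive training described in the lecture:
# nudge one random weight, keep the change only if the training error
# does not increase. The toy task and network sizes are invented.
import math, random

random.seed(1)

N_IN, N_HID = 4, 3
xs = [[random.randint(0, 1) for _ in range(N_IN)] for _ in range(30)]
data = [(x, 1 if sum(x) >= 2 else 0) for x in xs]   # label: mostly dark?

# One flat list of knobs: input->hidden weights, then hidden->output weights.
weights = [random.uniform(-1, 1) for _ in range(N_IN * N_HID + N_HID)]

def forward(x, w):
    """Input layer -> hidden layer (nonlinear) -> one output neuron."""
    hidden = [math.tanh(sum(w[h * N_IN + i] * x[i] for i in range(N_IN)))
              for h in range(N_HID)]
    out = sum(w[N_IN * N_HID + h] * hidden[h] for h in range(N_HID))
    return 1 if out > 0 else 0

def n_errors(w):
    return sum(forward(x, w) != label for x, label in data)

errors_before = n_errors(weights)
for _ in range(2000):
    k = random.randrange(len(weights))
    old, best = weights[k], n_errors(weights)
    weights[k] = old + random.uniform(-0.5, 0.5)    # turn one knob a bit
    if n_errors(weights) > best:
        weights[k] = old                            # got worse: reject
print("training errors:", errors_before, "->", n_errors(weights))
```

By construction, the number of training errors can only stay the same or go down; the surprise discussed in the lecture is that such blind knob-turning, in a cleverer gradient-based form, also generalizes.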
And I can have hundreds of layers. Each one takes the information from the previous layer, does this small sum of the information for each neuron, and transmits it through a nonlinear function, which is here. In a deep network, you can have hundreds of thousands of neurons, hundreds of layers, and millions of parameters. In this sense, you see that the training is trying to find a point in this very large-dimensional space of millions of dimensions, a point that will do well on your training set. And it works, and it works thanks, first of all, to the databases. Still, there are a couple of puzzles; this is a parenthesis for the scientists in the audience. I want to stress a few puzzles that people are really wondering about at the moment. The learning phase, if you describe it in terms of science, is an optimization problem in this one-million-dimensional space. In such a large space, this has every reason to be extremely hard. If the energy is the number of errors that you make, you would like to find a point with zero error in an energy landscape which has every reason to be extremely rugged. It's an extremely complicated optimization problem. Why does it work? This is not so clear, and it is one of the things that people are working on. Why isn't it what we call a spin glass? Are there flat regions? Are there many possible different solutions that are all equivalent? And what is the role of depth? Deep just refers to the number of layers; experimentally, deeper is easier. Why? A more shallow but broader network should in principle also work, but experimentally it does not go as far and is not as good. All this is being discussed. A second puzzle is the puzzle of overfitting.
In principle, you have so many parameters that, even if you have many data points, what should happen is overfitting of the data, because you have fitted the data with many, many parameters. You remember the sentence attributed to von Neumann, I'm not sure whether the attribution is true or not: with four parameters I can fit an elephant. And this is a curve by a colleague who has done the exercise of tuning a parameterized function with four parameters until it looks like an elephant; he succeeded, more or less. So normally, when you have more and more parameters, yes, you decrease the training error, but the generalization gets worse. That is what happens all the time in statistics: I have data, I give you a three-parameter fit, I give you a four-parameter fit; with the four-parameter fit I fit better, but the generalization to new data is worse, because I am overfitting, fitting the noise. Here, this does not happen, and that is another puzzle being discussed. Then, when you look at what is being done, it is very interesting. If you look at face recognition, for instance, you have an input layer which is a picture, and you look at what signal in the input triggers the activity of each neuron in the first layer. You find that in the first layer you have some kind of edge detection: this neuron here is sensitive to an edge in this region of the screen, et cetera. By the way, this is interesting, because you have these cells in the brain also. Then, if you go deeper in the network, you have neurons that are sensitive to this kind of pattern, which looks like an eye. And you go deeper and deeper, and in the last layer, the whole layer will react more strongly when your grandmother's face is in the image.
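The classical overfitting effect invoked a moment ago, many parameters fitting the noise, is easy to demonstrate on a toy problem. This is an editor's illustration with invented data, separate from the deep-network puzzle itself: a polynomial that passes exactly through noisy samples of a straight line has zero training error, yet generalizes much worse than a simple two-parameter line fit.

```python
# Editor's toy demonstration of overfitting: data = straight line + noise.
# A 12-point exact polynomial interpolation (many parameters) has zero
# training error but a large test error; a 2-parameter line fit does better.
import random

random.seed(2)
true_law = lambda x: 2 * x + 1                  # the underlying law
train_x = [i / 11 for i in range(12)]
train_y = [true_law(x) + random.gauss(0, 0.3) for x in train_x]

def interpolate(x):
    """Degree-11 Lagrange interpolation: fits every training point exactly."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit(x):
    """Two-parameter least-squares straight line through the same data."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(train_x, train_y))
             / sum((xi - mx) ** 2 for xi in train_x))
    return my + slope * (x - mx)

# Fresh test points between the training points.
test_x = [(i + 0.5) / 11 for i in range(11)]
mse = lambda f: sum((f(x) - true_law(x)) ** 2 for x in test_x) / len(test_x)
print("interpolation test error:", mse(interpolate))
print("straight-line test error:", mse(line_fit))
```

This is exactly the textbook statistics situation described above; the puzzle is that deep networks, with vastly more parameters than data, somehow escape it.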
So the network is collectively elaborating the information at larger and larger scales, which is something that is expected in some sense, but here you can see it. So all these are beautiful achievements, and now I want to stress why all this is not yet a panacea, and what the problems are. I will focus on three problems; there are many more, but I think these ones are essential. The first problem is the need for a very large amount of data, pre-processed and labeled. The second one is the fact that there is no understanding of what is taking place; I will tell you why later on. And the third is the fact that all this has no general intelligence: no reasoning, no representation of the world, no consciousness, no attention, and I should add, no causality. It's not at all an intelligent analysis of anything; it's a clever way of detecting some correlations. First, the huge amount of labeled data. I have emphasized several times the importance of large databases. It is impractical from a technological point of view. It's also insane when you think of the number of people who are employed in the world, in some countries specifically, just to label images, because labeling one million images takes time, human time. So it's very impractical, let's say. It also shows that deep networks are very far from mimicking the brain, and I will give you an example from the activity of babies. Here is an experiment that is being done at the École Normale on language acquisition by 20-month-old babies, in the group of Anne Christophe. They play with the babies, with some objects, and the objects have names: this object is called "a co-rabbit," this one "a ka-tractor," this one "a ka-book," and this one "a co-han." The baby plays with all these objects for long enough to get used to them and their names. And then, after that, you present the baby with two images, one on the left and one on the right.
And you ask the baby: look, there is a "co-bamool", do you see the "co-bamool"? Now I ask you: if you were the baby and I asked you this question, would you look at this one or at that one? Who looks at this one? Who looks at that one? Many of you are undecided; you are not very good subjects for experiments. But still, the vast majority looks at the "co-bamool", which is this thing here. And you are right, in the sense that this is what babies do. It means that they have somehow extracted the information that the prefix "co" is related to animals. This object does not exist, and it is not an animal; neither is the other one. But if one of them had to be an animal, it would be this one: it has two eyes, and so on. So the baby generalizes from two examples, not from 100,000 examples of what is and is not an animal. This already tells you: look, we are in a different world. Yes, we are using artificial neurons which mimic some aspects of what the brain is doing. All this is wonderful; we have a new paradigm, which is learning from data instead of learning from a set of rules given to you. But it is clear that the neural network is not working in the same way as the brain. So there is a challenge that we have shaped at the École Normale, and Emmanuel Dupoux in particular has been presenting it: to build a machine, maybe an artificial neural network, that learns a language on the basis of what a baby hears in its first year of life. Believe it or not, we have colleagues who have children and who are recording what the child listens to during the whole first year of life. And statistically, essentially all children understand a language after one year.
By the way, they understand it after one year in spite of the fact that, depending on the family and the culture they live in, the number of words they are exposed to fluctuates a lot. Some children hear many more words in their first year of life than others. Yet more or less all of them understand at one year. They do not all speak, because speaking is another problem, motor control, which is a different story. But understanding is rather stable. All this does not fit with what we do in machine learning, so that is an interesting challenge. What do we know about deep networks? This is the second problem, and this one I like a lot, because in some sense you have your deep network, it has a million parameters, you have found them, you have trained it, it discovers cats and dogs hidden in an image as well as you do, and you know everything about what it does. You can look at it and say: neuron number 50,300 is connected to this one and that one, and I know the value of each connection, I know the weight, I know all these coefficients, I have trained them, I have them in my memory. I know every single detail. It is what I call the neuroscientist's dream: you know the activity of all the neurons and all the synapses. And now I ask you: do you understand what it is doing? I mention this because of a paper I liked from two years ago, by two colleagues, a neuroscientist and an electrical engineer, asking: could a neuroscientist understand a microprocessor? They took up the challenge of analyzing a microprocessor, actually a very old, very simple one, the one of the Atari game console, using tools from neuroscience.
You measure the current from here to there, you knock out this small piece and see whether it still works, et cetera. They spent months on that, and they never understood how the thing was working. So this is a good example. And in some sense this is my excuse for working on all this, because I think that the collective processing of information, how it works, how the information becomes more and more collective as you go to deeper layers, is really a problem of what we call in statistical physics emergence. Emergence is the idea that the whole contains more information than the sum of the parts, and that a lot of information lies in the correlations between the activities of the various neurons. In statistical physics we know this very well. We know that water molecules can assemble into a solid, a liquid or a gas, and this makes a whole difference. They are the same molecules, interacting with the same rules, but depending on the circumstances they can form a global pattern that is completely different. The structure, the reaction of an iceberg is not the same as that of liquid water; every passenger of the Titanic learned that. And this, I think, is what is missing: we do not understand exactly this collective behavior. One consequence of the fact that we do not fully understand it is that we cannot give a guarantee, a guarantee that a machine will work in all circumstances. And this is worrisome. If you want to use one of these machines to analyze a scene, to analyze an image, and use it for driving your car, you would like to have some kind of guarantee. And here are some examples showing that there is no guarantee.
Among all the games our colleagues play, just to have something different from cats and dogs, one team was working on pandas and gibbons. So I give you a database of panda images and gibbon images, and you have to recognize which is which. You train the network on many images, tune all the knobs, heat the planet a little with all the computation, and you get a program that does it, and does it well. You present this image and it says: this is a panda. It is not completely sure, but still, it is a panda. Now you present this other image and ask the machine: what is it? And the machine says: oh, I am sure it is a gibbon. What has happened in between is interesting: another team of perverse scientists decided to design an image as close as possible to the well-recognized panda image, but perturbed a little bit by adding a bit of this pattern here. And this pattern was optimized, actually using another deep network, precisely in order to fool the program that was doing so well on pandas and gibbons. This is called an adversarial attack. And it is a real problem, because by eye you cannot detect at all that the image has been perturbed. Furthermore, you can do the same elsewhere. Here is another classifier, which classifies bananas, slugs, snails and oranges. You present this, and it is absolutely sure it is a banana. Good job. Then you present this, and it says: oh, this is a toaster. Again, the only thing that has been added is this small patch, and the patch has been very carefully designed in order to fool this program.
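The mechanics of such an attack can be sketched in a few lines (a hypothetical toy model of my own, nothing like the real panda classifier): for a linear classifier, the gradient of the score with respect to the input is just the weight vector, so nudging every "pixel" a small amount against it, the fast-gradient-sign idea, flips the decision while changing each input only slightly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A toy linear "panda vs gibbon" classifier (weights made up for illustration).
w = [2.0, -3.0]
b = 0.1

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    return ("panda" if p > 0.5 else "gibbon"), p

x = [0.5, 0.2]                 # the clean "image" (just two features here)
label, p = predict(x)          # -> panda

# FGSM-style step: for a linear model the score gradient w.r.t. the input
# is w itself, so we push each pixel by eps against the correct class.
eps = 0.2
x_adv = [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]
adv_label, p_adv = predict(x_adv)  # -> gibbon, despite a tiny perturbation

print(label, adv_label)  # prints "panda gibbon"
```

In a real deep network the gradient is computed by backpropagation rather than read off the weights, but the principle is the same: a small, carefully directed perturbation moves the input across the decision boundary.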
But now, if you have this sticker, you can put it on any image and fool the machine. It is obvious that if such a sticker becomes available, one that makes a stop sign in the street be recognized as a green light, there will be people crazy enough to put it up everywhere, because there are enough crazy people on Earth that some will want to play this game. So it is a concern, let us say. The third problem with the present situation is that there is no general intelligence, no reasoning, et cetera. The deep networks we have at the moment are very smart machines that solve very specific, well-posed, well-defined problems with a simple measure of performance. If you play Go, you know who the winner is, and the rule that defines the winner is very clear; it is still a complicated game to play, but the objective is perfectly well defined. Let me add an appendix to this analysis. It corresponds to a debate of recent years, initiated by Chris Anderson, then editor-in-chief of Wired, in an article he published in Wired in 2008: "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." Faced with massive data, the traditional approach to science, hypothesize, model, test, is becoming obsolete. The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. Now you in this audience are worried, because you are all out of a job. So I decided to look at this; it is much more serious than bananas, cats and dogs. We are talking about the scientific method. Are we going to give up on Galileo, et cetera?
What is happening? I actually wrote a refutation of this argument recently; you will not be surprised that I tried to refute it. And I analyzed it with a small thought experiment. When you ask questions touching on the philosophy of science, it is always interesting to play with thought experiments. The one I had in mind was the following. Imagine that you want to train a neural network so that when I throw an object, it tells me where it will land. Simple enough. I do not want a model; I have read Chris Anderson's article, so I just take a video camera, film a person throwing objects, make 50,000 recordings, and feed them to my deep network. And maybe, after all that, a well-trained deep network will give good predictions about where the object lands. First of all, it will need to have registered the right parameters. If I have no idea that the speed of the wind matters, for instance, or the humidity or the temperature, which affect the friction law, and I have not put these parameters in my database, maybe it will never make it. But imagine I have put in many, many parameters, including the age of the person throwing the object, and the network will use the good ones and do a good job. So maybe it is as good as what I would do as a physicist, or as engineers do, using Newton's equations. Which one is best? Now you have two methods: one based purely on correlations, taking images and deducing from those correlations; and the other based on basically a couple of equations: Newton's second law, the drag, the laws of friction. The two ways of approaching the subject are completely different.
The scientific model has an extremely compact representation, and it is a representation that can be decomposed. You start with the main element, which is gravity; then you add the effect of friction; you add the Magnus force for rotating objects; and so on. From all these ingredients, put together, you get the result, and you can predict where the object will fall. This is scientific intelligence. It is expressed in terms of models and equations. I like this idea of composition: it is a composition of elementary laws, each of them relatively simple. And it is applicable in very different contexts. With this model, I can throw an object on the Moon: I change one parameter in the whole analysis, and I will predict very well where it lands. So once you have this compact representation, you can apply it to completely different contexts. Whereas with the training through the video camera, there is no way other than going to the Moon with the camera and making another 50,000 recordings. And there is no way at all that anyone with the neural network would be able to understand that the same law applies to the gravitation of the planets. I am not saying it is trivial to guess that, but at least one person did it long ago, and it was successful. So this is a very big difference, and it tells you something: the deep networks have no reasoning, no representation of the world. They have no notion of causality, at least not yet. No consciousness; this is a very big issue, which I will not develop here, but it is clear that they do not have it. And no attention, in the sense of paying attention to something.
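The "compact model" side of the thought experiment can be sketched in a few lines (my own sketch, with made-up values for the drag coefficient and the initial velocity): Newton's second law plus a linear drag term, integrated with a simple Euler step. Moving to the Moon only means changing the single parameter g.

```python
# Minimal composed model: gravity + linear air friction, Euler integration.
# Transferring to the Moon is a one-parameter change, unlike retraining
# a network on 50,000 new Moon videos.
def landing_distance(v0x, v0y, g, drag=0.1, dt=1e-3):
    x, y, vx, vy = 0.0, 0.0, v0x, v0y
    while y >= 0.0:
        ax = -drag * vx          # F = m a with linear drag (m = 1)
        ay = -g - drag * vy
        x, y = x + vx * dt, y + vy * dt
        vx, vy = vx + ax * dt, vy + ay * dt
    return x

earth = landing_distance(10.0, 10.0, g=9.81)
moon  = landing_distance(10.0, 10.0, g=1.62)  # same laws, one parameter changed
print(earth < moon)  # True: with weaker gravity the object flies farther
```

The point is not the numerical accuracy of this toy integrator but the compositionality: gravity, drag, and (if one wished) a Magnus term each enter as a separate, reusable ingredient.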
So the conclusion of all this is that we are infinitely far from general artificial intelligence. I would even be ready to say that, in spite of all the technological progress I have shown you, the analysis of skin cancer, of lungs, of mammographies, all these spectacular and very nice results, we have not made a single step in the direction of general artificial intelligence. I think we are at the same place as we were a few years ago. This is just to take the right distance from what has been done. And I know that, as a conclusion, people always ask me: so what is going to happen? Predicting, and in particular predicting the future, is difficult. Just to remind everybody of that, I have put here a few statements by famous people; pick your favorite, but you see miserable failures all over the place by people trying to predict the future. Having shown that, I will now try to predict the future myself, but you understand that you should take it with a big grain of salt. You can adopt a positive view. If you do, you will say that all these technological breakthroughs help to achieve better diagnosis in medicine. This, I am really sure, is taking place.
Case-law search: looking at jurisprudence, at everything that has been done, and trying to find the best precedent on which to build your argument. Optimizing the movement of people and goods along railways, freeways, et cetera. Devices that help people get fast access to relevant information, automatic translation and so on, customer support. Robots that can help, for instance, elderly people; that is clearly the case. Identification of new pathogens, development of new drugs; and in quantum chemistry, predicting quantities with these methods is very efficient and very much used. So there are many positive possibilities. A big transformation of many jobs can also certainly be foreseen. A more concerned point of view is that all this will lead to a large destruction of jobs on a timescale too fast for society to organize and adapt to the new situation. The question of timescale is very important for the organization of society. I told you that the breakthrough was made in 2012; there are many, many companies using it right now, and it is very simple. I gave you the principles; everyone can do that in some way. Well, no, precisely not everyone, but many people can do it relatively easily. In each of the cases I mentioned, there can also be many drawbacks. I will not address all of them, but you see that at the same time as you get smart customer support, you also get customer profiling, so everybody becomes profiled, and this creates a lot of concern, at some point even for individual freedom. That is just one example. And I have a major concern for the present, not for the future, for what is taking place now.
George Soros gave an interview in which he stated: I want to call attention to the mortal danger facing open societies from the instruments of control that machine learning and artificial intelligence can put in the hands of repressive regimes. We know that there are rapidly expanding information systems in some countries which monitor the population, and when they are put in the hands of a repressive police they can be extremely dangerous. It is not an anecdote: several countries have adopted them in recent years. This morning I added this sentence; I read in the New York Times that the London police have announced that they will begin using facial recognition, with video cameras, to spot criminal suspects as they walk in the street. This means that there are video cameras in the street recognizing everybody, so they can know absolutely everything you are doing. I think we have to think twice about what is happening there, now, not in five or ten years. And needless to say, I will not elaborate on this, but there have been many examples of "smart", with many question marks, political manipulation of information in elections in many countries. So I would say there is a very urgent need for control mechanisms, for the elaboration of ethical rules, for the development of a global vision of the possible impacts on our societies; and all this needs strong resolve, coming from the scientists and reaching the political world. I have just this concluding sentence from Henry Kissinger: "Philosophically, intellectually, in every way, human society is unprepared for the rise of artificial intelligence." And the title of the article is "How the Enlightenment Ends." So this is not a very positive conclusion.
Still, I would like to summarize with a few take-home messages. There has been spectacular progress on very well-defined tasks with a clear objective. These systems can help humans in sophisticated, repetitive tasks; medical diagnosis is one of them, but there are many more, and this can be extremely useful and will free us from a lot of repetitive activity. They can also be misused: awareness of the dangers, regulation and ethical supervision are absolutely necessary. And, by the way, all this is based on having access to and using very large databases, so it is in some sense an extremely monopolistic scheme for the evolution of society, where a few companies are the leaders because they have access to the largest databases, and because they are the leaders they then get access to even more data, in a feedback cycle. So this is something very serious. We are extremely far from general artificial intelligence. And the last point, last but not least, which I like a lot: we know everything about this machine, and we understand very little. I think that is a good conclusion, for scientists at least. Thank you very much. Okay. Thank you, Marc, for a beautiful talk, accessible to everybody, I think. So, are there any questions? I am sure there are many. Well, thank you very much for your talk. I agree with virtually everything, and I think I will take home the message that we should interact more with our colleagues in the neuroscience department, because that is the way we can learn about our innate capacities, which are not represented in these algorithms. Plus many other complicated things, like the segmentation problem you showed us, which is the binding problem in psychology. So the bottom line is: we have to learn more about ourselves. Yes, I think I fully agree. We have to learn from neurobiologists.
We also have to learn, and in some sense I think the next horizon, a possible horizon, will be to try to mix these methods of artificial neural networks and deep networks with knowledge-based approaches. In the end, a lot of work has been done on language analysis, for instance; take language: we know a lot of things about language. All of this has been, not thrown away, but let us say put aside temporarily, in order to adopt this completely blind attitude. Mixing the two is certainly a very interesting challenge, but it is very non-trivial. So this is for language; and then of course there is a lot to learn from neurobiology. I have a very simple question: in the part where the machine was learning the digits, is there any particular reason to use a 28 by 28 grid, or is it just a random number? No, it just happens that the first database of this kind, the MNIST database, was standardized, with a standard shape of the digits, on a 28 by 28 grid. 28 by 28 already gives you a resolution good enough that you can identify the digits easily yourself. And it is not that large; it is not like an image of a cat, which immediately has a million pixels. Let us say it is in between. I have a question concerning language. Can these networks also be used to classify the complexity of a language, if that is possible to quantify? I do not know. I am sure it can be done, or at least tried; maybe it has been done, but I am not aware of it. The only caveat is that you need databases of comparable and very large size if you want to make this comparison, and that is the only difficulty in doing it,
especially if you have in mind rare languages, or languages for which there is no very large written corpus, digitized and available for analysis. Marc, hi. Thank you, this was an excellent talk. I will play devil's advocate with your warning: is this technology any different from any other technology that could be abused? Take the example of radio and television, which were used systematically to misinform. You did mention the risk of a monopolistic scheme, but in this case we contribute to it; we have some control, because we are the data. Well, first of all, we are the data, but very few of us at the moment control our data or limit the access of the big companies to it. All this is very recent, and we have not been protecting ourselves, or rather protecting society, I am not speaking of protecting ourselves as individuals, but of protecting society by limiting access to the data a bit. So yes, you are right: technology can always be used in a bad way. Still, I was very interested in an article, I think I gave the reference, yes, this article by Feldstein in the Journal of Democracy in 2019. I will caricature his argument a bit, but it was the following. If you are a dictatorship and you want to maintain order in the country, you need a strong police. If your police is too large, there is some probability that people within it will want to take power themselves. So the good thing is to have a police which is small, very faithful to you, and very efficient. And then you start adding a face detector here, et cetera; all this makes repression much more efficient. So yes, you are right that technology has always been used to manipulate information and so on.
But this has reached a scale far beyond what was done previously, I think, and in such a short time that we have not yet developed the antibodies. The fact that there is propaganda on TV, we are used to it in some sense. The fact that messages on social networks have been targeting specific subpopulations in order to induce votes and so on: we are only now starting to be aware of it; two or three years ago we were not, so we did not yet have the antibodies. So I think we, collectively, as mankind, have to be very much aware of it and act on it. Just two very quick ones. The first question is: has the deep neural network really taken over as the monopoly of machine learning, or is there still a niche for things like genetic algorithms for certain kinds of learning tasks? And the second question, from your examples, the "co-" words and the toaster misidentification: it seemed you were saying that because neural networks break everything down into pixels, the approach still lacks the ability we have to pick out the salient features in the way the data or information is presented to the algorithm, the fact that something is animal-like, or the panda's features. Is that a fair way of putting it? Is that where the advances need to be made? Yes and no. For the first question: what has been spectacular is that in various fields, various topics, there has been this very spectacular breakthrough of deep networks, and so there are many people working on them. At the same time, there has been an incredible development of software: elaborate software, high-level languages, libraries for high-level languages.
You can program all that in 15 lines of code, because the software is sophisticated, and this has been evolving very fast. It does not mean that there is no research going on on other topics, but it has shifted a large amount of resources towards this field. As always, I am pretty sure that the other approaches will also, at some point, get their moment of glory, and maybe there will be some cross-breeding between the two. As for your other question, the answer is slightly subtle. Look at this image here; it is a cartoon of a receptive field. Again, you take a neuron in one of the first layers and ask: if I want to find an image that maximizes the activity of this neuron, what is this image? You find this thing here for this neuron, this image for that other neuron, this one for a third neuron, and so forth. These are just edge detectors, very simple, low-level things. If you go deeper, you find that the image that triggers the largest activity starts to look like a face. So my answer to your question is mixed. I think there are relatively high-level signals being used deep in the network, and there have to be, because in the end the network is able to say: this is the picture of the grandmother. At the same time, all this does not mean that it has a representation of the world. It is a purely visual analysis, an analysis of lines and so on. The notion of an eye does not mean anything to this machine; it does not know that an eye is used to see. And this presents a major obstacle to developing it in other contexts. The fact that we know an eye is used to look at something means that if the eye does that, there is another object.
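The edge-detector picture can be made concrete with a toy convolution (again a sketch of my own, not the lecture's figures): a two-pixel difference filter slid along each row of a tiny image responds only where the intensity jumps, exactly like a first-layer neuron whose receptive field is an edge.

```python
# A first-layer "neuron" as an edge detector: a 1x2 difference filter slid
# across each row of a tiny 4x6 image (left half dark, right half bright).
img = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

kernel = [-1, 1]  # responds to a dark-to-bright transition

def convolve_row(row):
    return [kernel[0] * row[j] + kernel[1] * row[j + 1]
            for j in range(len(row) - 1)]

response = [convolve_row(r) for r in img]
print(response[0])  # -> [0, 0, 9, 0, 0]: activation peaks at the edge column
```

Deeper layers combine many such low-level responses into larger patterns, eyes, faces; but, as said above, this remains a purely visual composition with no notion of what an eye is for.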
We immediately induce a lot of behavior from that. So the limit is really this: these machines do have some ability to integrate the data, but it is purely visual, and they do not have a representation of the world; they do not conceive of an object as something with its own coherence. Is that clear? Yes. One of your conclusions was that there is not yet any progress towards general artificial intelligence. At the same time, many companies advertise that they do artificial intelligence, and the German government wants to invest billions of euros in artificial intelligence. Are all those people just confusing machine learning with artificial intelligence, or is there more to artificial intelligence? No; the adjective "general" was very important. I said that there is no progress in general artificial intelligence. By this I mean a machine which would precisely have a representation of the world, which would know what an object is, that there is time, that there is causality. In all these respects I think we have not made progress at all. We have made progress in one aspect of artificial intelligence, using machine learning, but it is one tiny aspect, not at all the general one; it is dedicated to very specific tasks: recognizing melanoma in an image, playing Go better than the world champion. So it is tricky. If I had asked you 30 years ago whether playing chess is an intelligent activity, most people would have said yes, it is intelligent: you have to plan, and so on. If you ask now, I would bet that fewer people would say it is intelligent, because the machines are doing it, and we have learned that it is not as creative as we may have thought initially.
So we are also gradually shifting our definition of intelligence towards more and more creative activities, which I personally like. May I ask whether there is any application of artificial intelligence that is not machine learning? Again, I am not among the people who say that all the earlier work should be thrown away, not at all. Many, many things have been done. Think of the automatic processing of language: beautiful things have been done in terms of understanding semantics, rules, and so on. There is a corpus of knowledge there, which has been put aside for the moment because, in terms of technology, this other approach is outperforming it. But I am pretty sure that at some point we will have to mix the two. I have a question on the case of the panda and the gibbon. In order to fool the network, do you need to know all the parameters, or can you just have the same database? No, you need to train: you need access to the network, in order to present images and see what it does. And it is, let us say, a hard computation. So if you do not have access to the box, you cannot do it. Sure. There are questions from the online audience, so maybe Sandra will ask on their behalf. Thank you. We have about 80 viewers on YouTube and Facebook following your talk. Of the questions, the most pressing one seems to be: who should be responsible for regulating the risks of artificial intelligence? I am not going to answer this question fully; as a citizen, I have some answers. What I think we should do, as scientists, is warn the rest of society, explain what is taking place, and also open the box, as I was doing today, so that people understand what it is.
I mean, it's not magic. Then, personally, I think there are several kinds of things that should be done. In the same way that ethics committees for the life sciences have existed for a long time, there must at some point be committees about the use of data. That is certainly one aspect, but it is not the only one; the problem is very plural. There must also be some activity about banning certain things. One development that I really fear is that of autonomous decision-making warrior robots. Starting from the analysis of scenes and images, the next step, which is not that difficult if you think about it, is a robot that analyzes a scene and is trained to shoot: a robot that makes the decision by itself. I think this step is extremely dangerous, and there could be a ban, as there have been bans on chemical weapons and so on. We know that these bans are not absolute, but still they exist: when a country transgresses the ban, it is something noticeable. So that is another aspect. And then there is the whole question of the evolution of the job market in different professions, which is yet another. Just to give three aspects.

Okay, one last question.

So you mentioned here, and I think it is the central point, the difference between what deep neural networks fit in the first place and then generalization. I think that even human thinking has some element of these deep neural networks. Let us take a very simple example. There are some politicians about whom we say that they are excellent tacticians, and others about whom we say that they are strategists. And what is tactics? Tactics is essentially like this deep neural network, because it is testing reality locally.
And they could be excellent in the short run, but then lead the subject to the opposite outcome and bring disaster to the country, let's say. What I am saying is that a strategist is someone who looks beyond the circle. There is a circle of convergence: if you simply sum up a series, like z to the n over n, you do not know what is happening beyond the radius of convergence. But if you know that there is an underlying analyticity, that there is analytic continuation and things like that, then you do know what is happening beyond. To me, this is some analogy between tactics and strategy.

Yes. Maybe what you say has something to do with what I was calling a representation of the world. In some sense, one thing characteristic of human intelligence is that you are asked, or you ask yourself, one question, and you answer another question which is somewhat different. We do that all the time in research, and we know it is part of human activity: because you have a representation of the world, you know what is what, and you can displace the question. You are not focused on only one simple answer.

Okay, so in view of the time, maybe we should stop.

My comment is about the question asked by somebody: what else is there in AI besides machine learning? Well, machine learning is basically categorization, nothing else. It takes a lot of examples, makes some kind of abstraction, and then categorizes objects. AI, over the last 70 years, has been a lot of things: knowledge representation schemes, deductive systems, and so on and so forth. This has, in the last few years, all of a sudden become a powerful technology, as you said, and I agree with that, because of the computing power. Because if the computing power was not there, it was nothing.
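The questioner's series can be made explicit; inside the unit disk it sums to a logarithm, whose analytic continuation is defined far beyond the circle of convergence:

```latex
\sum_{n=1}^{\infty} \frac{z^n}{n} = -\ln(1-z), \qquad |z| < 1.
```

The right-hand side is analytic on all of $\mathbb{C}\setminus[1,\infty)$, so knowing the underlying analytic structure tells you the function's values where the series itself diverges: the strategist's view beyond the radius of convergence, in the questioner's analogy.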
I remember that in the 80s and 90s nobody cared about it. So that is the answer to that question: in AI there are a lot of things, not just machine learning.

Okay. I am sure there are more questions; I actually have a number of questions myself, but I will keep them. There will be some drinks and refreshments outside. But before that, following the ICTP tradition, all the diploma students are encouraged, I would say required, to stay here and ask their questions to the speaker for a few minutes, and then they can join everybody else outside for refreshments. Thank you.