[Audio check] Can you hear me? Ok, not well... So, today's lecture will be a bit different from the last one. We'll try to spend some time understanding what reinforcement learning is and how interesting it can be; we'll try to get a sense of how it works, so it will be a general discussion. The format today is quite peculiar: it is very interactive, so it's an experiment. I've done this kind of thing in different contexts with different audiences; let's see how it goes. Later on you will have to engage 100% with the mathematics and everything you have to do, but for today we can have interaction and discussion, ok? So, let's watch a short clip, ok? I ask you to pay attention to this clip, and then we'll see — there is something very surprising in it. [clip plays] This result was recently published in Nature; it is a paper that appeared as a preprint in October 2017. I want to draw your attention to a couple of sentences. I'll read them for you, because the font is probably too small. So, this is an artificial intelligence program called AlphaGo Zero, which mastered the game of Go without any human data or guidance. And I want to read this pair of sentences for you. How does AlphaGo Zero do it?
It achieved a remarkable improvement in many of the relevant numbers with respect to the first AlphaGo, the one that defeated Lee Sedol. So: about 5 million training games rather than 20 million, 3 days of training versus several months, and a single machine with just 4 tensor processing units, rather than many machines, et cetera. The new version, AlphaGo Zero, without any human guidance or human data, defeated the former version 100 to 0. Ok, so, who has an idea, a general idea, of what this result is and what this paper is about? Think about it yourselves. Ok, very good. Now form small groups, say 5 or 6 people. Make sure that in each group there is at least one person who raised their hand just now, whom you elect as facilitator for the following session. So once you form this group — please move, make a mess, change places, switch places. The purpose of these small groups is to confront each other and ask questions. So the outcome, the output of this short 5-minute session, will be questions: what do I know, what do I think I've seen, what didn't I understand, what are the challenges, why is this thing relevant — questions. You don't have to come up with knowledge or answers. If you have some knowledge you can help facilitate the discussion, but you have to come up with questions. Ok? 5 minutes from now. I will move around. Are you organized? No, no, no, you have to organize yourselves. We don't know anything — that's the point. So you don't, nobody, nobody. Matteo. No, there are two or three in a row; when they're done, stay with the microphone. They have no idea yet how to guide the discussion. So he's going to facilitate the discussion. Is there — question? Question, excuse me, may I have your attention a second? Excuse me, are all your groups set? Is there any group lost in space without any guidance? You are? Good. You're alone.
Ok, please form a group over there; I will provide a facilitator for you. No, no, I am. You are a facilitator, and then gather someone else over there. Please, please, your attention. Ok, so if you have joined some other group, don't disperse, because you will be asked again to gather in the same group as before, right? So try to find your chairs, move around, try to organize yourselves so that you're not shuttling back and forth all the time. Now we will go through all the groups asking for input or feedback from you. So let's start from — there was a group here. There's one spokesperson per group; of course the others are free to add more information. Sorry, just a second. I'm asking the technician if we can have a microphone, so you can be recorded as well. Can we have a microphone, please, or shall I just use mine? ... Is it on? Ok, now it's working. Ok, could you please? Ok, so from what we understand, this is different and more exciting than a computer learning how to play chess. Computers have known how to play chess for a long time now, but chess is different from Go. How? Chess is, I guess by my understanding, some kind of deterministic game: you have a set of initial conditions, some moves that you have no choice but to play, and by following those rules you can get to your final desired outcome, that is to say, checkmate or stalemate or whatever — I think those are the only two options for winning. And a computer used to be able to learn how to do that by just repeating, repeating, repeating — practice, practice, practice — learning all of the possible outcomes and then just playing them. It couldn't do that with Go. Now, I've never played Go, I've just learned, apparently, the rules of Go; in some way it is more of a kind of Markovian game. Does that seem correct to people who know about Go?
Ok, so yeah, and the whole point of playing Go is that you can create some kind of bridge from one side to another, by placing your color wherever you want and being able to connect the dots in whatever pattern in the end, and that's the condition for winning. Ok, so does all this description make sense to all of you, or is someone sort of lost in space again? Let's go back to the simpler statement: it's about a computer program that is able to play a difficult game. Ok, so that's the baseline of the discussion. And it's able to do that at such a level that it largely outperforms humans at this task, especially in this second generation. Ok, so, anyone from your group wants to add something, some comment? No? So please pass on the microphone to the next group. Yeah, the thing we observed is that this computer learned without any human data. I mean, it excels at something far beyond humans, without any input from humans. Ok, so this is already an interesting point of difference. The first version of this program, AlphaGo, the one that defeated Lee Sedol, who was the champion of this game, was heavily trained with examples: the computer received many, many, many Go matches, and looking at these examples it sort of built its knowledge. In the second generation, AlphaGo Zero, the recent one, there is no such input. The computer is just given the rules of the game, so that it doesn't make moves which are not allowed; all the rest is just self-teaching. What does that mean? Can you speak up? Self-play, ok, self-play. Self-play: repeated games, starting from a very low level and then going up all the way. You see, it took only a few days to go up to a level of enormous mastery. Ok, next group? Yeah, the questions that came to mind were precisely what data the machine uses to learn how to play, because there's an exponential number of moves, so there's no way of learning all possible moves.
And as regards the reinforcement, the only reinforcement is whether you win or lose at the end of the match. That's very good. This is already pointing to some key issues that we will look into in great detail. There are two big problems arising here. The first one is how to represent the board. What does the algorithm see? It cannot just have a list, a table, of all possible configurations: there are way too many. There are already too many for chess — it's an astronomically large number of configurations. So it doesn't work that way, and that's the first problem. The second problem is that it's difficult to build upon this process of self-play, because the outcome actually comes only at the end, when you win or you lose. So the signal that tells you you've been doing well or badly comes only after the end of a very long process which, as when any of you plays a game, requires planning, requires forecasting. How does it really do this? These are good questions. Please pass on the microphone. Who has the microphone? Oh, please. Let's just take one question, then you will move the other way. Ok, so in our group we said much the same things: that with just a process of self-learning, the program was able to teach itself to play Go and beat any machine or human in the same field, with the rules of machine learning. So the question is actually, yes: how can one devise an algorithm which is able to create and gather data from itself, and then, without any influence from the external world, build that data into a winning algorithm? Ok, so that's basically a slightly rearranged version of the same question: how can these things actually be done? That will be the purpose of this two-week course: to understand the basics — not the nitty-gritty details of how this is coded, but the ideas that lie behind it and that allow this progress to be made. Please move the microphone to the top. Thank you.
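The two themes just raised — learning purely from self-play, with the only reward arriving as win/lose at the very end — can be made concrete in a toy sketch. This is nothing from the actual AlphaGo Zero paper, which uses deep networks and Monte Carlo tree search; it is an invented, minimal illustration: tabular learning on a trivial subtraction game ("take 1 or 2 stones from a pile; whoever takes the last stone wins"), where the terminal reward is propagated back over the whole game.

```python
import random

# Toy self-play with a single terminal reward. All names and parameters
# are illustrative, not from any published system.

ACTIONS = (1, 2)

def legal(pile):
    return [a for a in ACTIONS if a <= pile]

def train(episodes=20000, alpha=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}  # Q[(pile, action)] = estimated value for the player to move
    for _ in range(episodes):
        pile = rng.randint(1, 12)
        trajectory = []              # (state, action) pairs, players alternate
        while pile > 0:
            acts = legal(pile)
            if rng.random() < eps:   # exploration: occasional random move
                a = rng.choice(acts)
            else:                    # otherwise greedy w.r.t. current Q
                a = max(acts, key=lambda x: Q.get((pile, x), 0.0))
            trajectory.append((pile, a))
            pile -= a
        # The player who made the LAST move wins: +1 for them, -1 for the
        # opponent, propagated backwards with alternating sign.
        reward = 1.0
        for state, action in reversed(trajectory):
            q = Q.get((state, action), 0.0)
            Q[(state, action)] = q + alpha * (reward - q)
            reward = -reward         # switch to the other player's perspective
    return Q

def best_move(Q, pile):
    return max(legal(pile), key=lambda a: Q.get((pile, a), 0.0))
```

After training, the table recovers the known optimal strategy for this game (always leave the opponent a multiple of 3 stones), even though no intermediate feedback was ever given — only the final outcome.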
Is it possible to extract this learning from the machine, or is it hidden in a set of neurons and so on? And is there any symmetry or, let's say, different scales in the game of Go? So the question — correct me if I summarize it wrongly — is: do we understand how the machine thinks? Yeah, is it possible? Is it possible for us to decode what it's doing? Because then for physics it could be something... Ok, so I can provide an answer in very rough terms, to a degree. We don't understand in full what the machine is doing. Much of the process by which decisions are made is obscure to us, also because it takes place in some abstract space which we don't have the intuition to understand. But the same could be said of many of the actions that humans make: it's very difficult to understand why you came to a certain decision. It's a complex thing. Really, sometimes we don't understand the language that the machine is using, and this is potentially a problem. So there are ongoing projects on what is called transparent AI development: the development of artificial intelligence which is understandable by humans. And it's an interesting issue, a little bit more on the futuristic side: building up safe artificial intelligence, one that we can understand and dialogue with, without having machines that are just big black boxes doing things that are not really understandable to us. To a degree we can do it, not all of it. That's a provisional answer. As regards the question — do we understand what they do, and can we decode it — I would just draw your attention back to this initial part of the movie, in light of this question. This was a meeting between Go players and machines. Let me say that the machine has transformed the way humans play; the machine is now teaching something to humans. [clip plays] "That's a very surprising move." "I thought it was a mistake." So this is a very nice short answer.
Let me just play it again to explain the context. During the game of the first machine, AlphaGo, there were two expert players who were commenting: looking at the board, at the moves the machine makes, and reproducing them. And look at the expression. This man is not a world champion, but he is one of the greatest experts on Go in the world. He looks at this move and is really surprised. And actually what he does — it's not shown in the video — is reproduce the move on the board, here on the magnetic demonstration board. He places the stone, looks at it, and then takes it back again, because he can't believe his own eyes that AlphaGo has made this move: it was totally surprising, a move that no expert player would have played in that context. So the machine is doing something that not only do we have a hard time decoding, but we have no intuition for why it's doing it. And this is transforming human play at this level. — Just one thing that came to my mind: what do we teach the machine when we say that we have taught it, like in the previous version, some set of rules? The rules we teach even to the second version. But what was the extra information that was there for the first version? For the first version, there are many games which have been played — this is very usual for chess — many, many situations, et cetera, and we just feed them into the computer. So the computer knows the development of many, many games, and knows who won in the end, right? So there is a strongly — in the jargon — supervised part of the learning which takes place, and which is absent from the second version. But then, as someone said, there are exponentially many possibilities; we can only teach a finite number of... That's what we were saying before: there's no hope, and these machines don't do that. They build a compressed representation, much as we do.
When we play chess or Go, we don't just take one configuration and say, ok, from that configuration I compute, I plan, et cetera. You have an overall view, a strategic view, which of course wipes out many, many details and leaves room for many, many errors, but it is a sort of compact, compressed representation of what's happening on the board. We have one — sometimes we can explain what we see on the board, sometimes we can't — and the machine has another. And we will discuss how to construct this efficient representation of a world which is too large to be encompassed in a single table. — I just have some questions. The first one is: is there any special rule for the corners or not? For the corners? Oh, now you're asking a question about Go. I don't know. I don't know the rules of Go, I admit. Ok, ok. Please, do you know? Sure, sure. Can you pass on the mic? He can answer for you, I hope. Ok, so I learned to play Go when I was young; I reached amateur dan level. And these guys, these professional players, are professional nine dans, which is the highest level of Go. So I just want to make a remark on what you were just talking about. I think one really important thing AlphaGo brought to us is that it makes a lot of moves that were nonsense to us, to what I had learned of Go. So the move you just showed — from a Go player's perspective it was ok; it was unexpected, but I can tell what it's doing right there. But a lot of moves made by the later versions of AlphaGo were just totally nonsense. For example, one of the favorite moves of AlphaGo is called the 3-3. When I was learning Go, my teacher told me that that is the worst move you could possibly make in the beginning of the game. That's the favorite move of AlphaGo. So I really think that AlphaGo is the single most important event in the history of Go; it has brought so many new things to us.
And all the professional players now are studying the moves of AlphaGo, and they are trying to, you know, form a completely new understanding of the game. Thank you very much. So there was a technical question you can answer in my place. What was the question again? About the rules of the corner. And some other questions — I don't know if I'm able to ask them or not. Are they all about Go? Yes. Ok, so you can address them directly to him, and probably separately, because the goal of all this is not to learn how to play Go — I would fail miserably at teaching that. It's about taking this as an outstanding example of what machines can perform at a human level, and how they do it, et cetera. Ok, so you can address him directly. Sorry, what's your name? Chris. Chris — you can ask Chris all the details. Ok, thank you. Any other questions? Ok. Ah, one more question. You said that in 30 days it was able to beat the former champion, right? With 30 days of training. That's what I read. Yes — I already forget. But it was 3 days. Ah, 3 days. Ok, so is there a limit to that? I mean, if it keeps on training itself, is there any limit up to which the machine can learn? That's a hard question; I don't know how to answer it. One should probably provide a metric for this. So, is it improving? Ever improving, in what sense? I can't hear you. The question is, you have to provide some metric to know whether it keeps on improving — like a ranking, an Elo rating. — I see most of us have heard about machine learning, but the question that is really important for me is this: AlphaGo Zero is a machine-learning platform that can learn by itself — it played with itself, given only the rules. How? What about learning physics? Could it publish something? Because it's really important for science: nowadays we have seen a lot of phase transitions found with machine learning, which we hadn't seen before.
So if we give it, for example, all the rules, can it publish anything? Can it really learn in the sense that we do? Can it learn anything, or are there limits to the way it can learn? Because if it can publish, if it can learn like that, what's the purpose of us being here? These are existential questions, and of course they are very important — I'm not putting them aside — but they require an entirely different discussion. First of all, there is the question of what it means, for us, to learn. Again, with all these questions, first we have to face the general question of what intelligence is, before asking what artificial intelligence can do. So what is intelligence? Is it the ability to just collect information and come up with answers? Is it something more creative? Is there a gap between what machines can do and what we can do, or is it a continuum? Everybody can have their own opinion at this stage, because we don't know, right? So these are all legitimate questions. On the specific point of condensed matter: yes, these things are already on their way. People are extensively using machine learning to classify materials, find materials, explore — this is something that already exists. On the more general side, it's an interesting question. If enough of you are interested, we can organize another discussion like this on more existential or social issues — no, I mean it, I'm serious, this is something I'm very much interested in. If you are interested, we can find some time later in the week, or next week, some afternoon, to meet again and discuss all together the social, political, legal, ethical and existential issues around artificial intelligence. But for now we keep to the more mathematics-related part. Any other questions? — Concretely, how does this algorithm work? Is it like neural networks, or...? If I tell you now, the course is over — that's exactly what we're going to do in the next two weeks, ok?
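The "compressed representation" discussed earlier — the first of the two big problems, a world too large for any lookup table — can be sketched in miniature. A table with one entry per position is hopeless (a 19x19 Go board has at most 3**361 configurations, a number with 173 digits), so instead a position is mapped to a short feature vector and evaluated with learned weights. The features and the linear form below are invented for illustration only; AlphaGo-style programs learn their representation with deep neural networks instead.

```python
# Minimal sketch: evaluate a board from a handful of features rather than
# looking it up in a table of all positions. Features here are invented.

def features(board):
    """board: list of rows; 0 = empty, 1 = our stone, -1 = opponent stone."""
    ours = sum(cell == 1 for row in board for cell in row)
    theirs = sum(cell == -1 for row in board for cell in row)
    return [1.0, float(ours - theirs)]   # bias term + stone-count balance

def value(board, weights):
    """Linear evaluation: dot product of learned weights and features."""
    return sum(w * f for w, f in zip(weights, features(board)))

# Why the table is hopeless: upper bound on 19x19 Go configurations.
positions_upper_bound = 3 ** 361
```

Two boards that differ in detail but share the same features get the same value — that is exactly the "wiping out of details" in exchange for a compact, generalizing view.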
Any other questions, comments? Yes, please — then we'll move to the back after. So, about the surprising moves: can we consider some of them as a way to deceive the opponent, a human? A way to deceive — oh yeah, that's interesting. Might be. In fact, there is an entire emerging notion of adversarial learning, ok? Again, as I said, I'm not a Go expert, so you should ask Chris about this, but it might well be that, for the benefit of a long-range reward, the algorithm is doing things that are apparently negative. This is something that happens very often, especially when the rewards are very much delayed, as we will discuss. — One of the biggest differences between AI algorithms and animals is that AI algorithms usually need a large amount of training data to learn, while animals need just a few experiences. But here we are dealing with something which trains on self-produced data. So can this way of self-teaching be an answer to the problem of learning from a small amount of data?
First of all, I would say that the comparison between AI and animals is a bit unfair, because animals have been around for millions of years of training. It's not only the training that occurs during development or during their lifetime; there's also a lot of training which has been going on through evolution. It's a different kind of training, a different kind of algorithm, but when you compare what animals can do with what machines can do, you have to account for that — for everything that evolution has provided animals with in order to pursue their goals. And then, on the other side, there's a plus for artificial intelligence, in that the rate at which experience can be collected is much faster in a virtual environment, compared to the experiences we have to gather in order to do things. We have to spend time and energy; there's not much time we can allocate to doing tasks and repeating them. Even the most obsessed person, who plays Go all the time, will have to go and eat and pee, right? So we are biologically constrained; machines aren't. On that side they have an advantage; on the other, we have advantages from our millions of years of evolution that machines don't have. So rather than an opposition, I see it more as a continuum on a sort of very high-dimensional space of abilities. That's my view — everybody is entitled to their own. — My question is about what we said earlier, actually: can a machine have the concept of deceiving an opponent, or, in general, how can its tactics in a game be related to human ones? Because the notion of deception is a moral notion, right? In the sense that you are tricking her or him into believing something, ok?
So this is something which concerns ethics, and again the larger question is: can a machine have ethical behavior? "I could play this move, but this move is actually deceiving, so I won't play it, I'll refrain from doing this" — or, "I want to do it because I want to deceive." So the notion that you do something because you want to deceive might be available to machines, but we don't know. — I'm not thinking of it in a moral way, I'm thinking more in a tactical way. For example, when I say deception I mean setting traps in the game, making the opponent believe something. Because if you are a player of any game, and you know that your opponent is human, and thus fallible, you may trick him — trick him into thinking that... Deception is manipulation: making the opponent think what you want him to think, and then exploiting it — like setting a trap in chess and waiting for the opponent to fall into it. Whatever works. — I'm not sure. Since AlphaGo Zero is playing against itself, ok?
What it learns from the game is already shared with the opponent, so they are always on the same footing. I'm reluctant to think that it leans on the fact that the other player has some weak points, because the weak points of the other player are its own weak points. So this shared information between the two opponents makes it harder for me to believe that it's deliberately using this strategy to take advantage of some weak point of the opponent. But again, I don't know. — I don't know if this is a rather existential question, so maybe it's not the right time to ask it, but I would like to ask: we have seen that these machines are capable of outperforming us in tasks which require planning, or forecasting, or in general logical reasoning. I would like to know — do we know, and if we do, what has been done in this direction — whether these machines are actually capable of purely creative processes, like drawing a painting, composing a piece of music, or writing? Actually, I think with writing a poem I have seen it, but also painting, and also music. Ok, perfect, so the short answer to these questions is yes, yes, yes — they do all this. The more interesting question is, again, how you define "creative" — you have to define creativity somehow for us humans first. If creativity is essentially a combinatorial exercise, in which you see many different things and then say, ok, if I combine this with that, then something new emerges — and many scientists think this way, that most of their creativity comes from having vast knowledge and seeing connections between things, or seeking these connections — if you allow this definition of creativity, then machines can be creative. Again, this is my viewpoint. But if you talk with other people — I've been talking with very diverse audiences, and there were artists, and there were more senior people — they say no, no, no, creativity is something different, something that pops out of nowhere, right? And then...
you ask the question: is it popping out of nowhere because you just cannot follow the process by which it's been created, or is it really something new? And I don't know the answer. It might well be that we do something totally different — that there really is a substantial gap between doing simple combinatorics, exploring possibilities and combining things, and creating something really out of nowhere. I don't know how it works. My view is that creativity is one form of very smart exploration — an ability to connect things and hold large views, fostered by large knowledge — but I might be wrong. Thank you. — Is the machine aware of who it is playing with? Sorry, can you start over again? Is the machine aware of the opponent it is playing with? Will it know if the opponent has changed? Because perhaps it could find weak spots in a person: if it plays with me, it will know I'm a beginner and just try to win very fast. There are so many levels to this question. On one level, you might ask whether the machine is able to detect whether it's playing against a human or against another AlphaGo Zero. This borders on the Turing test and all related problems — whether you can decide if there is a machine or a human on the other end of the line. Of course very difficult; it opens up many questions. I'm not going to address them all, but there's a clear path of references you can look into. And there is another question: is the machine building up a model of what the opponent does, building a sort of theory of mind of the opponent? I don't know if these algorithms do that, but there is no obstacle to machines doing it. We will discuss how to build models of the environment and use them to improve learning and ability. So the answer is, again, "I don't know" — anywhere from a yes to a no, depending on what kind of generality you're seeking in your question. I'm going to take the last question, because we have several other things to go through.
Ok — about the way the machine learns: if I have two AlphaGo Zero programs, or machines, will they always play the same style when they learn by themselves how to play? Will they always play the same style, or can different styles emerge? There can be some convergence. You can try to avoid this kind of pitfall — convergence to some specialized behavior, which would then be predictable — by adding some exploration. So there are ways. It's good if you converge, because you want to converge to something that plays well, or best; but you also don't know whether your "best" is really best, so you want to allow for experimentation. It's this balancing: pouring all your knowledge into something you are able to play well, while also allowing room for other experiences — for changing, for being unpredictable, which is very important in learning, and which we will discuss. I hope this answers the question. A very, very last question — pass on the microphone, because we are being recorded. So, for two of the same programs, is the outcome the same, I mean, given an initial configuration? So yeah — may I — especially in the presence of incomplete information, which is the general setup in which you don't have perfect control of all the configurations of your game or task, it often turns out that the best policy is not deterministic. There will be inherent randomness in what you do, ok, simply because your knowledge is incomplete. So I think this answers the question. Ok, thank you very much — very, very interesting and helpful. If you have further questions and want to discuss, first among yourselves — that's the purpose of the whole thing — and then eventually with me, I'm of course very happy to follow up. Now we'll see another sequence of videos, from a different realm, and we'll go through the same exercise again, right? We watch the videos, we gather together, then a 5-minute discussion. Of course many of the themes have already been discussed, so we just have to discuss what is different from what we saw previously, ok? Together with the new
questions that emerge, and what kind of curiosities arise from seeing this. And this is the next one: another short clip, again from Boston Dynamics. So again, gather in groups as before — 5 minutes for discussion, coming up with questions. [clip plays; groups discuss] Okay, so yeah, two words really came to mind when I saw that. Just a sec, just a sec, please. Those two words being: mass unemployment. Well, I don't think so, but that's a subject for the other meeting, right? Sure. Yeah, let's address it just for a second. I mean, mass unemployment: the combination of, say, the learning techniques that we saw before with this hardware — how possible could that be, and where could that take us? Yeah, so that's a concern, a real concern. You can keep your hands down, we'll move around in a second. So that's a real concern: how this will impact our societal structure, how work will be transformed or erased. Okay, there are many, many interesting issues here; nobody knows the answers yet, and there are many, many suggestions on how to proceed. Very interesting, very timely to discuss, especially for your generation, right? So again, if we want to meet some other time and discuss all these issues, I would be more than happy to learn how you feel about it, especially — and then I can tell you what your professor thinks about it, okay? Any other comments, specific to this? What's the difference between this and what we've seen before? So, I would say that what we saw just now is almost less surprising, right? I don't know, am I the only one who feels this way? Yeah? Okay, let's have a poll. Who's surprised by what they've seen? Surprised — okay, let's remove the emotional part for the moment; let's just think about surprise alone, like measuring it in bits, okay? Who was surprised by what they've seen?
So, you are surprised or unsurprised depending on whether they used deep learning or not? Just one at a time, sorry, I can't hear you. Can you pass on the microphone? I don't know anything about this, but it depends on the way the rules are given to the robots. But the question is another one: why are you surprised? Yeah, and that determines whether I'm surprised or not. If you tell them "you have to keep your center of mass in the middle" and everything we do to move — is that the way you walk? No, I don't know. Is that the way you walk on snow? "But, okay, I have to keep my body mass balanced" — that's exactly the kind of thing people have been doing for 60 years now, because walking robots have been around for 60 years, and for 59 of those years they failed miserably, because they tried to control every step. But people finally realized that when you walk, when you walk down a slope, when you walk down the stairs, what you're actually doing is falling: you're falling in a minimally controlled way. Okay? So people are now putting a lot of research into minimal-control algorithms, in order to achieve tasks which are so natural to us, such as walking down a slope, et cetera. Okay? So it is surprising or not surprising depending on whether you've seen a clip of this kind of robot walking on snow six months ago, basically tumbling over hundreds of times along the way. Okay? But I'm taking your answer for what it is: your degree of surprise depends on how the thing is done. That's what you're saying, okay? — Okay, I want to ask about the algorithm and how it is doing this, because first we saw AlphaGo learning how to play Go, but in this case these machines are learning how to move. Or is it only in the hardware — like, the hardware tries to be the most efficient in order to move? Okay. So this points to one big difference between what we saw before and what we have seen now: these are physical machines.
They have to interact with the real world. Okay? So they have actuators. There's a lot of engineering in how to build feet that avoid sliding, or a handle, right? There's a lot of technology, if you wish, in creating a physical interface between the machine and the world, right? Whereas the other case was totally abstract. Just an algorithm proposing moves. The only way it interacted with the environment was by sending out bits. Whereas here there is a physical interaction with the environment, and it's an environment which is wildly more variable than the environment of a Go opponent. Okay? A Go opponent will at least never stand up and start punching the machine in the face, okay? Whereas this can happen in the real world when you interact physically with an opponent, and in this case the environment is stochastic. Think about stepping on the snow. You don't know. There might be a hole just in front of you, and then you just fall over. Everybody, I would think... sorry, I'm talking as a Westerner, sorry. Most of you might have experienced what it is to walk in the snow down a slope, okay? If you haven't, you might have other experiences of how it is to walk in very unpredictable situations, right? Which might be different from snow. Sorry for this horrible whitewashing of the problem. Okay. So, yeah. There was a lot of training involved. It's not done by crafting examples, right? Because, as you've seen, they have been pushing the robot around in very different ways which are not intuitively predictable by the robot. It's just that it's been sent to the ground, abused, many, many times. And it takes a lot of time. Because, again, when you have a physical interface rather than an algorithmic interface, like we said before, like animals, you have to recapitulate, in a short lapse of time, which might be months or years, the ability of standing up from the ground that has been the result of a long evolution. Okay?
Now, I have a question related to this characteristic of being physical. In AlphaGo Zero, you can... Oh, I mean, this is reinforcement learning too. This is not supervised, right? No. For the robots, it's more complicated than this. There has been some initial supervision, because, again, it's not just that they put the robot in an arena and let it fall over itself until it's able to stand up and move. So, it's different, because you don't have the time allowance for this. The last part is, again, largely important. Yeah, because you cannot train them only digitally, simulating an environment, because you cannot reproduce the complexity. So, is it a fact that the time you need to train these actual physical machines is... Yeah, is it an issue? The fact that the physical machines need to actually move, in the real world... In one minute, you cannot do a million trials. Yeah, of course. Clearly, their ability to generalize... These machines are very good at generalizing, in general. Starting from a relatively small subset of experiences and extrapolating how to do it in new situations. That's the key secret, right? When you're able to do this, you do well, at least enough to survive. But there are limits to this, right? AlphaGo wouldn't be able to drive a robot, and the robot wouldn't be able to play Go at any level, right? It would probably have a lot of difficulty even moving the pieces on the board without fumbling everything around. So, still, we are talking about artificial intelligence systems which do very well on a relatively narrow task. No, I mean, the question was: even a narrow task, if it just needs to be reproduced digitally, you can... I mean, in three days AlphaGo Zero can reach the top level. I don't know the figures, but perhaps Matteo knows. People are trying to recreate digital environments in which they can let the robot learn, and then transfer what it learned in the computer, in the software, into the physical robot.
But this is very, very difficult and I think it's just... I'm not an expert, of course, but it's just at the beginning. Yeah, this is also one way: using virtual reality to boost the ability of learning. And another thing that is very important, I don't have an example here, but it's also very important in the development of AI, is video gaming, okay? There's a whole branch of research which extensively teaches, or allows, artificial intelligence systems to play video games, which offer a variety of experiences which starts to be comparable to human experience. Even though in that case there is no physical interface, so again that added layer of engineering, which is also very important, doesn't exist there. I think that in the case of the robots, what we saw was that nobody can stop them. Yes, they keep trying to reach the goal. Yes. In the first case, in the case of Go, the machine may lose, while the robots can always do better, yeah? Yeah. Yeah, so that's very important. If a machine learns well, well, determination is something it doesn't lack, right? So it's very important to shape the objectives of machines, especially when they interact with us. As long as it's a computer, you might always think about flipping the switch off, which is not easy in general, okay? Because they might learn when you are about to do this and prevent it, okay? So this is one of the open questions in AI: how to make interruptible machines, that is, ways of designing algorithms which are always interruptible. No matter what they learn about you, about how you act, you can always have a handle, a back door, to shut them down, okay? And this was one side of the thing, and then the other side is exactly the problem of correct alignment of goals, of objectives: you have to specify your objectives clearly.
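The video-game setting mentioned above, and the Go setting before it, both reduce to the same abstract loop: the agent observes, acts by "sending out bits", and receives a reward. Here is a minimal, self-contained sketch of that loop; the environment and every name in it are invented purely for illustration, this is not any real library's API:

```python
import random

class ToyEnv:
    """A made-up 1-D 'reach the goal' environment, standing in for a game."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos                    # the observation is just the position

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent only ever sends bits
        self.pos = max(0, min(self.size, self.pos + action))
        done = (self.pos == self.size)     # episode ends at the goal
        reward = 1.0 if done else 0.0      # reward only on success
        return self.pos, reward, done

env = ToyEnv()
obs = env.reset()
episodes_finished = 0
for _ in range(1000):                      # interaction loop: observe, act, get reward
    action = random.choice([-1, 1])        # a random policy; learning would go here
    obs, reward, done = env.step(action)
    if done:
        episodes_finished += 1
        obs = env.reset()
```

Everything a reinforcement-learning algorithm does happens inside that loop, in the line where the random policy sits; the environment itself, game or robot, is just `reset` and `step`.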
You have to think about artificial intelligence machines, robots, of the next generation, not the ones that we have now, like the genie of the tales, right? You have three wishes. Then you know you have to be very careful in asking, because you will get exactly what you ask for. You know you will not get what you thought you'd be getting, okay? So in artificial intelligence it's very important to shape the right rewards, correct rewards which reflect what you actually want. There are many interesting discussions in philosophy about this. So for instance, say I want to build a machine that, this is a classic example now in artificial intelligence, builds paper clips, okay? The objects that you use to clip paper together. And you just tell this machine: be the best producer of paper clips. And then you can imagine a runaway scenario in which this machine just starts confiscating all the metal in the world and building paper clips. And everybody will be submerged in paper clips, okay? There will be nobody left able to build a tractor to work the fields, or cell phones, et cetera, because the machine is taking over. What's lacking in this machine? Common sense. And you say: okay, listen, I said 'be good, very good, excellent', so that I could get a nice house and a big building and become rich out of paper clips, but I didn't mean that. But that's what you asked for, okay? So this is very important: shaping objectives. Just a remark: the most sophisticated robots are performing their tasks according to coordinates. But humans evolved another way to perform their tasks. I don't know the specific way, but it doesn't rely on coordinates. So what I wanted to say is: despite performing the same tasks, they are seeing the world differently. Yeah, that's very good. So this is another aspect, okay, which is very important, and we will discuss it starting from tomorrow.
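The paper-clip story can be made concrete with a toy simulation. Everything here, the actions, the metal costs, the reward functions, is invented just to illustrate reward misspecification: a greedy agent does exactly what the reward says, not what we meant.

```python
def run_agent(metal_stock, reward):
    """Greedy agent: always takes the currently feasible action with the
    highest immediate reward, until the shared metal runs out."""
    clips, tractors = 0, 0
    while metal_stock > 0:
        # candidate actions: make a clip (1 unit of metal) or a tractor (5 units)
        actions = []
        if metal_stock >= 1:
            actions.append(("clip", 1))
        if metal_stock >= 5:
            actions.append(("tractor", 5))
        # pick whatever the reward function likes best
        best_action, cost = max(actions, key=lambda a: reward(a[0]))
        if best_action == "clip":
            clips += 1
        else:
            tractors += 1
        metal_stock -= cost
    return clips, tractors

# Misspecified objective: only clips count, so all the metal becomes clips.
clips, tractors = run_agent(20, reward=lambda what: 1 if what == "clip" else 0)
```

With the clip-only reward the agent converts all 20 units of metal into clips and builds zero tractors; the world gets exactly, and only, what was asked for. Fixing the reward (say, valuing a tractor more than a clip) changes the behavior completely, which is the whole point about shaping objectives.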
So any agent which acts in the physical world, okay, and it's also true for algorithms in general, but it's less obvious there, for robots it's clearer, has two interfaces with the world. One is the actuator part, okay, how the agent does things in the world. The way the legs are built, whether it's four legs or two legs, whether it has some way to handle things, a hand or something different, this is the actuator part. But there's another part which is equally important, the sensor part. All these robots have cameras; self-driving cars have lasers. So this is the sensor part. This is an interface which receives information from the environment. And the actuator part, after processing of this information, whatever sits in between, brain, algorithm, whatever, produces outcomes, which are actions which impact on the world, do things. So it's equally important to have very good sensing, in a way that might be peculiar to the machine, different from humans, like you said, and to have good actuating, which might be, again, similar to what humans do, or not. So humans and animals might be a source of inspiration, but we don't have to just keep our eyes shut. You see, the machine that was opening the door, the robot opening the door, did not have a hand. It had something different, a clamp, which served the purpose very well. But perhaps if it had to hold two beers, that would be different, right? Okay, more difficult. The actual design of robots right now is subject to the human view of the world. We build a handle, we build things having in mind maybe the limitations of animals in nature, stuff like that, but what I'm thinking is: couldn't an algorithm for self-evolution be devised? What I mean is, for example, a machine can, in its lifespan, learn what its environment is, and then it could be allowed to better select the components for the next generation of machines. And so, in a certain sense, learn how to improve itself, like artificial selection of itself.
Again, there are so many layers in this question that it's difficult to unravel them all. So, this idea of self-constructing machines dates back to von Neumann, to the 1940s. This is something that's been around. More than self-constructing, self-designing. And then they also construct themselves, right? Or they send a project to a firm somewhere: let me have this, Amazon, please ship me this. So, all these ideas have been around. And, actually, you can think about exploring them. The problem with evolution is that it's slow. The big difference is that all the behavior that you've seen here, okay, could be achieved by a very different way of learning, which is evolution, genetic algorithms. These are an entirely different class. Because there you just do things and evaluate a posteriori: did it go well or badly? And then you cull your population and you replicate. Here, reinforcement learning algorithms have a totally different point of view. You experience, you improve, you have a goal, right? If you've taken a biology class, they always tell you: remember, evolution is not directed towards a goal. It's just variation and selection, variation and selection, okay? So, this is very different in spirit. So, in principle, it could work. On a computer, it could do something. Perhaps, depending on computing power, one way of learning might be better than the other, but we won't discuss this kind of selection-evolution scenario, okay? I hope this is clear. If not, I can expand further. Any other comment? Hello. So, when you started those clips, I thought the problem was about sensing, sensing the changes in the environment and re-adjusting your decisions accordingly, but you said in your first statement that it's not that. It's not adjusting your center of mass or something like that. It's more giving less control to your limbs. Can you explain that? Yeah, let me explain this. Let's not think about robots. Let's think about ourselves.
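The "variation and selection" loop just described can be sketched in a few lines. The fitness function, the optimum at 42, and all the parameters below are invented for illustration; the point is only that fitness is evaluated a posteriori, and no individual ever "aims" at the goal:

```python
import random

random.seed(0)                 # fixed seed so this toy run is repeatable

def fitness(x):
    """Toy fitness: how close the 'organism' x is to an arbitrary optimum, 42."""
    return -abs(x - 42)

# Blind variation + selection, repeated over generations.
pop = [random.uniform(0, 100) for _ in range(20)]
for generation in range(200):
    survivors = sorted(pop, key=fitness, reverse=True)[:5]               # selection
    pop = [p + random.gauss(0, 1) for p in survivors for _ in range(4)]  # variation

best = max(pop, key=fitness)   # the population has drifted toward 42
```

Contrast this with reinforcement learning: here nothing is learned within a "lifetime", improvement only happens between generations, through who gets to replicate, which is exactly why it is slow.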
So, who can ride a bike? So, when you learned how to ride a bike, did you exert some specific control? Did you obey a certain set of rules? Or at some point did you just find a sweet spot and you were able to ride? We try not to fall. You try not to fall, right? So, that's certainly a punishment that comes at the end. But when you've fallen down, do you really know? Can you trace back what kind of actions you did that were good or not? Did you say: okay, I should have adjusted my center of mass that way when I was in that curve? Is that the way you learn? Is that the way we keep ourselves balanced, by adjusting our hands? Okay, then we must go for a ride together and see how you perform. No, actually, there are a lot of shortcuts in that mess of a brain when you learn to ride a bike. You just feel your body doing these things and then you act reflexively. You don't have time to process information, et cetera. So, what I wanted to say was more in that sense: the way these kinds of robots, these new robots, sense the environment and process the information is very complex, and it doesn't rely at all on even simple calculations like Newton's laws, right? So, there's no physics knowledge there, just as there's no physics knowledge in a toddler who tries to walk for the first time, right? So, it's a very practical and non-intellectual way of... But then what does it try to do? I don't know. It works. Again, I would say, just like I don't know what AlphaGo actually thinks when it plans, where it wants to go with a move. I have no idea. It's not looking into a table and saying: okay, I'm playing the famous game between Master X and Master Y, played in 1967 in Shanghai, I don't know. It's not doing that. It's doing just... It's a combination of experiences, collected in different ways, merged together in a way that we don't know, that we don't understand. We don't have the same experience. And the same goes for...
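The "no physics knowledge" point can be illustrated with the simplest reinforcement-learning algorithm, tabular Q-learning. The agent below learns to walk to the end of a 1-D chain purely from reward; it never sees the dynamics, only states, actions and outcomes. The environment and all parameters are made up for illustration:

```python
import random

random.seed(0)
N = 6                              # states 0..5; state 5 is the goal
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N) for a in (1, -1)}

for episode in range(200):
    s = 0
    while s != N - 1:
        if random.random() < eps:                        # explore sometimes
            a = random.choice([1, -1])
        else:                                            # otherwise act greedily
            a = max((1, -1), key=lambda act: Q[(s, act)])
        s2 = max(0, min(N - 1, s + a))                   # dynamics, unknown to the agent
        r = 1.0 if s2 == N - 1 else 0.0                  # reward only at the goal
        best_next = max(Q[(s2, 1)], Q[(s2, -1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # Q-learning update
        s = s2

# after training, follow the greedy policy and count the steps to the goal
s, steps = 0, 0
while s != N - 1 and steps < 20:
    a = max((1, -1), key=lambda act: Q[(s, act)])
    s = max(0, min(N - 1, s + a))
    steps += 1
```

The learned greedy policy walks straight to the goal, yet nowhere in the table is there anything like a "law of motion": just values of state-action pairs shaped by experienced reward, which is the sense in which the robot, like the bike rider, knows no Newton.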
These robots, you couldn't possibly decode the whole sequence of individual actions, because it's a whole. Just a second, so the microphone moves. But you know we already have something which is called consciousness and unconsciousness. So, when we first learn to drive a car, okay, if it is not automatic, when, for example, changing gear, it is first, then second, then third, for example. So, at first we pay attention to what we are doing, but after a while, once we have learned, we know what we are doing, so we do it unconsciously. I'm not sure if these two terms are still valid in the sense of machine learning or artificial intelligence, but I think his question can be somehow translated in this way: okay, maybe everything which happens in a machine is different from what happens in us. Maybe there is no difference between consciousness and unconsciousness. Maybe we should rephrase these things, and maybe we'll change our minds about how we humans learn. Yeah, so again, these are very deep questions. I think that the line is very blurred between conscious and unconscious, machine, human, animal, but again, that's my own opinion, which is worth just as much as yours, okay? But these are definitely, I think, very, very interesting questions in general, besides the fact that we have gathered here to learn how these algorithms work and what they have to do with statistical physics, et cetera, okay? Any other comment or question? Okay, if not, we go to the final clip, which, again, is a bit longer. So, the goal of this clip is actually two-fold, so I'm anticipating a little bit. First, we move away from machines. We also move back in time, this was 70 years ago, because the aim is to show that even if these things are very recent, very new, et cetera, the ideas have been around for roughly a century, okay? And this is very useful and interesting because you have to have a historical perspective on things.
And it will also allow us to understand better, at least at a very qualitative level, at the level of conversation, what the relationship is between what we do in artificial intelligence now and what we have learned about animal behavior over the years; the two things talk to each other very much. So, this is the clip. Notice that it's the Institute of Human Relations and these are two humans. I cannot hear the... Sorry, I can't hear the audio. It's gonna be long without the audio. From the booth, do you have any audio coming? Hi, I can't hear any audio from the video anymore. I've restarted the video, and there's still no audio. Can you check? Ah, we're trying to fix this in a second. Ah, probably that's... I know that. What do you think is the reason for the difference in their behavior? The animal on the left is hungry. The one on the right, with the food in his cage, is satiated. This one is hungry, this one is satiated. How will the difference in drive affect their... the difference in learning? We're going to use an apparatus with two identical compartments. One for the animal that is very hungry and another for the animal that is not hungry. Very hungry on the left, not hungry, satiated, on the right. At the outside end of each compartment is a food delivery apparatus. This consists of a stirrup-shaped bar placed above a food dish. Pressing the bar delivers a small pellet of food into the dish immediately below. Ok? Press the bar and a small pellet of food falls into the small container. Then they take the mice and put them into their respective compartments. We will now put the hungry and the satiated animals into the apparatus and watch for differences in learning. After being put in an unfamiliar environment both animals are active. But the hungry animal is somewhat more active. They don't know the place, it's the first time they visit this cage. So they look around. The very hungry one is more active than the not hungry one, by some measure. Ok? If you're hungry, you become more active.
The hungry animal remains active. But as the satiated one becomes adapted to the new environment he settles down and becomes inactive. So: there's no problem around, my belly is full, take a nap. Watch the wide variety of responses which the hungry animal makes. There is a lot of smelling and sniffing; when rats go out like this, they smell a lot. Mice or rats, I don't actually know. So, all these behavioral experiments started in the 20s with the work of Skinner. Since each of these responses occurs without reward it is soon displaced by a new response. The behavior is variable. There's still a lot of exploration and sniffing and looking around. Apparently no food here. What responses do you think the other rat is making? The satiated rat is inactive. But even if he had hit the bar and got a pellet, food would not be a reward without the drive of hunger. The hungry rat is active. He stands up near the bar but just misses it. The correct response of pressing the bar cannot be rewarded and learned until it occurs. He approaches the food device, stands up near the bar, presses it, but since he does not see the pellet in the food cup he is not rewarded. Now he finds the pellet and is rewarded for approaching the food cup. From now on he confines more of his activity to the region of the food dish. The next interesting step, just a second to clarify this. So he went up and pressed on the bar, so the pellet fell, but he didn't see the pellet in the box, so he just moved away; there was no association between the action of pressing and the reward coming from the box. But later on he just got to the box and found: oh, there's food there. So in that case there was reinforcement for staying around that area, but not yet for performing the correct action. Ok? We've seen this. Is it? No, it's ok. At last he hits the bar and gets an immediate reward. Has he learned? Will it take him as long to press the bar the next time?
Although the animal still makes different responses, he presses the bar much sooner the second time. The occurrence of a single reward has strengthened the tendency to make the response of pressing the bar. Notice that the rat performs, the second time... So the first time that he does this, he already makes the association. Again, this is something which is peculiar to animals. A machine wouldn't learn so fast, on the first try. Ok? Animals, and babies, often display this single-shot learning, which is probably a combination of genetic, inherited abilities and learning, right? So this is something that doesn't happen with pure learning from scratch in machines. You have to repeat these things many, many times before establishing the association. Ok? Exactly the same response which was rewarded. Is there room for improvement? In the next trial you will see that, coming up to the bar from a slightly different angle, the rat makes a new response. He goes around the side of the stirrup instead of down through it. So by chance he discovers that going by the other side he can get the same thing. He still makes a few irrelevant responses. Watch him go around the side of the bar again. This response gradually wins out because it is rewarded the fastest and with the least effort. That's true in general: if there's habituation, things become less and less interesting, there's less motivation. There are two things, motivation and reward. On the next trial you'll see that the animal starts down to the food dish before he has pressed the bar hard enough to trip it. He makes several anticipatory errors which are not rewarded and regresses to his old response of diving through the stirrup. He'd been pressing a couple of times, but not quite hard enough, so he goes: oh, it doesn't work. These anticipatory errors occur for a number of trials and then are gradually eliminated. He eliminates this part because the reward was partially diluted by the fact that he had to squeeze himself through the ring.
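The dynamic the film shows, responses followed by reward getting "strengthened" until they dominate, can be mimicked by a tiny toy model. The action names and numbers below are invented; this is the law of effect in code, not the actual experiment (and, as noted above, unlike the rat this learner needs many repetitions, not one):

```python
import random

random.seed(1)

# Each response starts with the same tendency to occur.
actions = ["sniff", "rear_up", "press_bar", "groom"]
strength = {a: 1.0 for a in actions}

def reward(action):
    return 1.0 if action == "press_bar" else 0.0   # only the bar delivers food

for trial in range(500):
    # sample a response in proportion to its current strength
    total = sum(strength.values())
    r = random.uniform(0, total)
    chosen = actions[-1]
    for a in actions:
        r -= strength[a]
        if r <= 0:
            chosen = a
            break
    # reinforcement: a rewarded response is strengthened, the rest are unchanged
    strength[chosen] += reward(chosen)
```

After enough trials, `press_bar` dominates the behavior, the positive feedback between "more likely to occur" and "more often rewarded" is exactly what the film calls strengthening the tendency to respond.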
Ok, then the movie goes on and shows another experiment which by no means would be possible nowadays. So you might find this a little bit disturbing. This is an experiment with punishment. Essentially there's an electric grid and a voltage is applied. In this device we can put a mild electric shock on the grid on which the rat stands. The shock is adjusted to be annoying but not painful. Ok, so they put electric wiring at the bottom of the cage, and they say: we're going to modulate the voltage so that it's just annoying and not harmful. That's the experimenter saying that, of course. The rat doesn't seem to be simply annoyed. And the idea is that if it presses the bar now, this will stop the current from flowing into the wiring, so it will stop the bad sensation. It will interrupt the punishment. Shock is on, shock is off. Pressing the stirrup bar turns off the shock. You will see that it supplies enough drive to produce a radical change in the behavior of the satiated rat. He hits the bar, the shock goes off, and he is rewarded. He clings to that for dear life. He hits the bar and is rewarded again. Even a satiated rat learns if it is provided a stimulus which elicits motivation. After a few more trials he has learned to press the bar quickly, as soon as the shock goes on. Ok, fine. That's enough of our sadistic show. Last session: 3 minutes of discussion. What do you think about it? What's the connection with the other things? Does it have anything to do with them? Ok, I gave you some background already. 3 minutes, it's very short.