So who has the microphone? OK. Yeah. Any questions from that side? Any comments? Anyone?

Let's talk a little bit. Can we see the same reaction for a human? We can try, I don't know. How much do you act according to rewards? How much do you feel that your behavior is affected by rewards? It depends on the reward. OK, good. So why are you sitting here, and not, I don't know, visiting Venice this morning? I could have been a star. You did? OK, too bad, bad example. Any other place you would like to visit in Italy? Rome. Rome. Why aren't you in Rome now? So there is a punishment that awaits you if you don't attend. So you're acting on the basis of: I could gain something in the long term, I could get punished in the short term. So your behavior is somehow shaped by some high-level notion. Of course, most of human behavior is highly filtered in this way: you do things for some reward that might come many years ahead. And this requires something very, very different.

So why do you... I don't know, any runners in the room? People who go running? Running? You don't exercise? Any other sportsmen in general? Oh my god, OK. So those of you who do: why do you go running on a wet Sunday morning, when it's raining, cold and damp? This happens at these latitudes. Why? It makes you feel happy, something very ethereal; it makes you feel better. You know that it's better for you, even if you don't experience it immediately, right? At some point you get reward signals from your body: serotonin and dopamine jump in and give you reinforcement, and then you feel good, and then you want to run more next time. And then you break your knee, OK, fine. When you're running, for instance, you also do other things: you want to run a route of a certain length, and when you're running you feel tired at some point. There are points where you just have a crisis and you say, OK, let's run until the next hill, let's run until the next sign.
These, again, are small rewards that you shape for yourself in order to optimize your behavior. This is also a very interesting thing. So reward and punishment are present in our behavior in ways that are more subtle than just getting a pellet.

Anybody has a dog? When you train your dog, assuming you do, to do simple things, like, I don't know, not biting my son, you do this by a combination of very different things: punishment, and small rewards, which might be physical rewards, you can train a dog using small bits of food, but also emotional rewards. They matter a lot. So there's a whole spectrum of these things. Human behavior is often very much impacted by rewards in different ways, under many different definitions, which might not be just getting money out of something; it's more motivational, getting pleasure, et cetera. So it's a large spectrum, but rewards are there.

Yeah, that's a very interesting question. It's somewhat diverging from what we're going to tackle here, because it's also a problem of behavior with many agents. So I don't want to delve into this; it's a very interesting question we could discuss, but it takes us a little bit too far away. I would say that the best answer is: apparently no reward in the short term, but there might be reward in the long term for this behavior. And this might have a complicated way of being forecast, or it might be somehow hardwired in our behavior. So, again, it's a very layered question; we might get back to this in the lectures.

Sorry, I have a question. Here, I was a little bit skeptical: what is the reward here? Is getting the food the reward, or just making the annoying sound disappear? So in one experiment, it was getting the food: a positive reward.
In the other experiment, there was no food. There was just a negative reward, that is, a punishment applied at all times, and the action was there to interrupt the punishment, which might be considered a reward in that case: you are rewarded because your punishment is interrupted.

So do you think that rewards and punishments are just positive and negative things on one axis? Do you think that's correct? Let's take a poll: who thinks that punishments are just negative rewards? Who thinks that rewards and punishments are totally different things? You can think about it. The good news is that from tomorrow on, we will regard punishments simply as negative rewards. It's a simpler setup. But in psychology, it's not that easy. That's also a reason for which we don't talk much about learning in humans, and we are interested in machines and algorithms instead, because there are many subtleties in human responses, and perhaps sometimes even in animal responses, that are difficult to map onto a single real axis with rewards on one end and punishments on the other.

But this is not a symmetric example, because the reaction rate of this mouse was not the same as that of the other one: it reacted faster to stop the voltage. Fair enough; it was not intended to be just the opposite of the first experiment. Fair enough.

Another couple of questions, and then we sum up. Was the second experiment on a mouse that hadn't learned anything before, while the first one was on the hungry one? No, these were two entirely different rats. They all look the same; that's a problem with rats. This one is not hungry, and it hadn't learned anything before. They were two different individuals. They had gone through the same process, being hungry or being satiated, but they were two different individuals. They had no prior knowledge.
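Since from tomorrow on punishments will be treated simply as negative rewards, a minimal sketch can show what that convention buys: a single update rule handles both cases. The incremental-estimate rule, the learning rate, and the +1/−1 values below are illustrative assumptions, not taken from the experiments described here.

```python
# Sketch: punishments as negative rewards, so one learning rule covers both.
# The update rule and the numbers (+1 food, -1 shock, alpha=0.1) are
# illustrative assumptions, not taken from the experiments above.

def update_value(value, reward, alpha=0.1):
    """Move the current estimate a small step toward the observed reward."""
    return value + alpha * (reward - value)

food_value, shock_value = 0.0, 0.0
for _ in range(100):
    food_value = update_value(food_value, +1.0)    # lever press yields food
    shock_value = update_value(shock_value, -1.0)  # lever press yields a shock

# The same rule pushes the two estimates to opposite ends of one real axis.
print(food_value, shock_value)
```

With rewards and punishments on one signed axis, the learner needs no separate machinery for the two cases: the sign of the learned estimate already says which action to prefer.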
The environment was unfamiliar to them in the same way. It's not the same otherwise; that's a good methodological point, thank you.

Just another question over there. OK? So I perceived the behavior differently depending on the setting. The rewards were the same in the case of food, it was the same for both mice, but one actually wanted it and perceived it as a reward; the other one didn't need it for its end goal, which was survival, so it didn't search for it. How do we implement this in a machine? OK, that's a very good question. So there are two things, motivation and reward: how are these implemented in machine learning? That's a very good question. We will touch a little on this in the lectures; just keep the question alive for the moment. There are ways of doing this, of course. There are also ways of inducing attention, which is another important behavior, because if you have motivation but you lack attention, you will not be able to make the associations.

So the basic message here is that there is a continuous flow in and out between animal behavior and animal neuroscience on one side and machine learning and reinforcement learning on the other: operant conditioning and all these ideas have been permeating the research over the years. We have to keep in mind that these ideas have been around for a long time. The same ideas that we will use and exploit to construct algorithms, the same algorithms that AlphaGo eventually uses, were there in the knowledge of animal behavior from the 1920s to the 1940s.

Just a couple of quick questions; we have to wrap up for lunch, I need my reward. Sorry, without a pre-constructed notion of reward, how could the machine understand what is a reward and what is a punishment? So you have to build that. That's exactly the point: when you construct a machine which does this kind of job, you have to give it an objective.
And the machine will gauge its behavior, meaning label it as good or bad, depending on the objective, on how well it performs on that particular scale. If you don't do that, then there will be no tendency to improve.

We had a question over there. Once I worked with DFT, where we were trying to minimize the energy of a given system; the point is that different potentials can achieve this goal. So is there any way to measure reinforcement, or motivation rather than punishment, in a continuous way? Let's say that if we are above a certain threshold, there is going to be better learning. I don't know if that's clear. OK, so I hope I got the question correctly. The general question I can extract is this: if you modify the way that the rewards are distributed, you can still reach the same outcome, the same final policy or decision that you want to achieve, and learning can be faster or slower depending on how the rewards and punishments are distributed. That's certainly true, and it's very important in constructing machines that learn quickly. So that's for sure, and we will discuss this point. I hope this is the answer you were looking for.

Super, very last question. Yes? Can you get the microphone? Sorry, we can keep on discussing offline, but we have to wrap up. So there is always a measurement between the pressing and the reward, so it's like doing joint measurements, and we know that they are strongly correlated; classically correlated, though, not entangled. So suppose you changed the setup, put the button on the floor, and put them in a singlet state: in that case, does the mouse learn anything? I'm really surprised, because I never thought about this; it's the first time I think about it. I don't know.
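On the point just made, that redistributing rewards can leave the final policy unchanged while making learning faster or slower: one standard construction with exactly this property is potential-based reward shaping, which adds γΦ(s′) − Φ(s) to each reward and is known to preserve the optimal policy. The potential Φ below, a made-up negative distance to a goal state on a line, is an assumption for illustration; it plays the role of the runner's "let's run until the next hill" intermediate rewards.

```python
# Sketch of potential-based reward shaping on a line of states 0..GOAL.
# phi (negative distance to the goal) and the numbers are illustrative
# assumptions; the gamma*phi(s') - phi(s) form preserves the optimal policy.

GOAL = 5
GAMMA = 0.9

def phi(state):
    """Potential: higher (less negative) when closer to the goal."""
    return -abs(GOAL - state)

def shaped_reward(state, next_state, reward):
    """Original reward plus the potential-based shaping bonus."""
    return reward + GAMMA * phi(next_state) - phi(state)

# Even with zero environment reward, a step toward the goal now gets a
# positive signal and a step away gets a negative one, so the learner
# receives feedback long before the goal is ever reached.
print(shaped_reward(2, 3, 0.0))  # toward the goal: positive
print(shaped_reward(3, 2, 0.0))  # away from the goal: negative
```

The same final behavior is learned either way; the shaping only changes how quickly informative reward signals arrive.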
Most of what we'll be doing concerns classical ideas, where everything is observable or partially observable and correlations arise classically. I never thought about this; I don't know whether there is any work on it. I'll think about it and give you an answer. So sorry, I don't want to cut you short, but it's about time.

Let me finish by outlining very briefly the content of the future lectures. We will have four lectures this week and one on Monday, which will be about the theory of reinforcement learning. We will discuss simple, or moderately simple, examples of what reinforcement learning is, depending on how good the representation of the world you have is, how precise your way of actuating your decisions is, and how all this interacts. For all this, the basic mathematical tool will be Markov chains. If you feel that you are not familiar with this subject, there will be a tutorial this afternoon at 5. Mohamed, please stand up; you too. These two guys are kindly in charge of giving this tutorial. So please, please, please, if you feel that you are not comfortable with this, attend: it won't be very difficult, but I don't have time during my lectures to recapitulate everything. So please spend some 45 minutes to an hour this afternoon to keep up, and if you still have problems along the way, just let me know; we'll try to catch up. Thank you.

Then, after these five lectures, there will be two practical lectures in which you will see the same algorithms that work in AlphaGo do other, simpler tasks in reinforcement learning. You will have Python code examples, coupling deep learning with reinforcement learning to produce this kind of optimized behavior. The two persons in charge of this are Mateo, who just left the room, he was the guy in the corner who talked before, and Alberto, who never entered the room and will show up at some time.
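As a warm-up for the Markov-chain tutorial mentioned above, here is a minimal sketch of the one property everything else builds on: the next state depends only on the current state, and the long-run visit frequencies approach the chain's stationary distribution. The two-state chain and its transition probabilities are made up for illustration.

```python
import random

# A made-up two-state Markov chain: the next state depends only on the
# current one, through these transition probabilities.
P = {
    "A": [("A", 0.9), ("B", 0.1)],
    "B": [("A", 0.5), ("B", 0.5)],
}

def step(state, rng):
    """Sample the next state from the transition row of the current state."""
    r = rng.random()
    acc = 0.0
    for nxt, p in P[state]:
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point rounding

rng = random.Random(0)
state = "A"
counts = {"A": 0, "B": 0}
for _ in range(100_000):
    state = step(state, rng)
    counts[state] += 1

# The long-run fraction of time spent in A approaches the stationary value
# 5/6, obtained by solving pi = pi P with pi_A + pi_B = 1.
print(counts["A"] / 100_000)
```

This memoryless structure is exactly what the reinforcement-learning formalism in the coming lectures will rest on.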
So these are the two persons who will guide you in this. Then we will wrap up everything on Thursday, and on Friday you will have an exam, which will be a theoretical exam: there will be exercises taken from the examples we've been doing during the lectures, with slight modifications thereof. So if you pay attention to what we do during the lectures, you should be OK. Tomorrow, we start with the theory. See you tomorrow.