Okay, thank you. So, there will be one fact and one opinion in this talk. This is going to be a little bit different from the other talks, mostly because I am not a member of this community, or only marginally, on the side of bacterial physiology, metabolism, and ecology. Nevertheless, the purpose of this talk is to provide one fact and one opinion. The fact is that there exists a coherent, well-developed mathematical and algorithmic framework to address decision problems in complex and dynamically changing environments. I will not task myself with giving you all the details of this mathematical construction, but rather provide a map of the landscape: the kinds of questions people ask and find solutions for in different contexts. The opinion is that this framework for decision problems in dynamic environments could be a very useful language and toolbox for addressing several questions in microbial physiology and microbial ecology. So I will try to give you the basic concepts and ideas that pertain to this framework and, of course, let you do the work of forming your own opinion about whether it is potentially useful or not. So, let's start with what we would call the theory of decision processes. There will be very little mathematics, like I said, but a few ingredients are necessary, and they are simple and easy to understand. The main concepts that make up a decision process are three. One is called the environment. The other is called the agent. And what separates the environment from the agent is an interface. This interface is where all the exchanges between agent and environment take place, and these exchanges typically take two forms. One is a sensory exchange: information flowing from the environment to the agent. The second is a flow of actions, in which the agent executes actions that may have an impact on the environment. The environment is characterized by a quantity, a vector, an as-yet unspecified object, which characterizes the state of the environment; it might be a very high-dimensional object at this point. Another key thing is that everything takes place in time: in the most general case, we are interested in this process unfolding over time. The counterpart of the state of the environment, which is external to the agent, is an internal state of the agent, which we label with the letter M. The choice of M is not coincidental, because we also mean it to be a memory: it is everything inside the agent that plays the role of a memory. I will outline the formal setting very briefly and then we will try to understand what these things might mean in a biological context. Like I said, the exchanges across the interface between the environment and the agent occur through two objects. So, this is called the state, this is called the memory, these will be called observations, and these will be called actions. Given a certain state, an observation can be made.
So, this is the process we could call sensing, and given a memory and an observation, an action is taken; this is what we would call the decision, or the control. As a result of the decision made, the world can go on to a new state s prime at the next instant of time. So this might be time t, and this might happen at time t plus one. At the same time, the observation is incorporated into the new internal state: this information is processed somehow, and together with the action it is collected and manipulated. There is a computation taking place here that gives a new internal state of the system, and so on and so forth. This is a very abstract, high-level description, and that makes sense, because this kind of abstraction comes from merging ideas from animal behavior, human behavior, psychology, neuroscience, operations research, that is, the theory that tries to give a mathematical description of how factories work or how you do logistics, but also from control theory and engineering. So it is a way of encompassing all these approaches in a single mathematical language. Just to give you an idea, here are two examples that might resonate with your interests. One: if we think of the agent as a cell, then the environment might be the chemical environment outside the cell. The process by which the environment is sensed might happen through receptors, and these receptors might interact with the internal machinery of the cell, kinases, which respond to the binding of a ligand to the receptor by changing the internal state, changing the phosphorylation level of several downstream effectors. The actions that are taken, the decisions that the cell makes, can affect both the environment and the internal state, and so there is reason to keep track of those as well. The reverse situation is, for instance, when you try to manipulate a chemostat or a turbidostat or a bioreactor. In this case, the environment becomes the cell culture, and the experimenter or the engineer is the agent, who observes certain concentrations or certain levels of optical density, keeps records of the observations, and according to these might decide to manipulate things in order, for instance, to optimize the yield over time. So this is the crucial point: you have to ask yourself, can I frame my problem as a decision problem, yes or no? If it's no, you can go and take a coffee now. If it's yes, at least to some extent, then the second exercise is: how do I connect these abstract concepts with the things I'm interested in? And once you succeed, everything else is downhill, more or less, in the sense that, like I said, the framework, which I will try to give you just a glimpse of, can address questions like: how do I extract information from these objects? What are the constraints? What are the things I can and cannot do? And more importantly, if I couple this problem with the notion of some objective, how do I optimize my decision-making in view of these objectives? By optimize I mean: how do I find optimal solutions, what are the techniques, and what can I do under several different conditions? So, this is the outline. Mathematically, and these will really be the last few equations that I'm going to write, there are some objects that describe these arrows.
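To make the loop in the diagram concrete, here is a minimal sketch of the sensing, decision, and memory-update cycle. Everything in it is hypothetical scaffolding: the function names (`observe`, `policy`, `step_environment`, `update_memory`) and the toy dynamics simply stand in for the four arrows described above, not for any particular biological model.

```python
import random

# A minimal sketch of the agent-environment loop described above.
# All functions are placeholders for the arrows in the diagram; the specific
# formulas are assumptions for illustration only.

def observe(state):
    """Sensing: produce a (possibly noisy, partial) observation y from state s."""
    return state + random.gauss(0.0, 0.1)          # e.g. noisy readout of a concentration

def policy(memory, observation):
    """Decision/control: choose an action a from the current memory m and percept y."""
    return 1.0 if observation > memory else -1.0   # toy threshold rule

def step_environment(state, action):
    """Environment dynamics: the world moves to a new state s', given action a."""
    return state + 0.5 * action + random.gauss(0.0, 0.05)

def update_memory(memory, observation, action):
    """Internal computation: fold the new percept (and action) into the memory m'."""
    return 0.9 * memory + 0.1 * observation        # e.g. an exponentially fading average

state, memory = 1.0, 0.0
for t in range(10):                                 # discrete time steps t, t+1, ...
    y = observe(state)
    a = policy(memory, y)
    state = step_environment(state, a)
    memory = update_memory(memory, y, a)
```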
So, one is a transition probability from the previous state to the new state, given the action. All of this framework is naturally cast in terms of stochastic processes, so noise is already encompassed in the formulation. The second object is how observations are made: what is the probability that I make a certain observation y, in my space of observations, which I need to have identified a priori, and how does it depend probabilistically on the state of the external environment? So if this is a receptor and this is the outside concentration, this y might be the binding state of my receptors, which could be 0 or 1, or, if it is an allosteric receptor, 0, 1, 2 and so on. Then there is another object, which is the probability of taking a certain action a given the pair m and y. You can think of it as: I am making a decision based on the current percept, what I see now, a snapshot of the current state of the environment, together with some memory that I collected in the past. Another example, just a second, let me finish the sentence: in bacterial chemotaxis, you make decisions based on the current concentration and some integrated memory of what you have seen in the past, which is expressed in terms of the methylation level of the receptor, and based on that you decide whether to run or to tumble, if you are E. coli or whatever. Question? Yes, that's exactly one of the key points: where to place the interface. Here y does not depend on memory; it is something that arrives at the interface, and everything else is post-processing, if you wish. So if you desensitize your receptor, that is an effect of memory, but you do it downstream. This part is really placed at the outermost level of your information processing. You might make this object as complex as you want, in principle; of course, that will impact how well you are able to handle things, but there is no limitation here. This object could be a very long vector containing the occupancy levels of all receptors, if you wish. And then, how do I encode how many there are? That is your choice about how long this observation vector is. But I can change the state of the cell to have only one receptor, zero receptors, a hundred receptors. Okay, fine; in that case you would have to modify this slightly, in the sense that you would have to find a way to back-react on the kind of observations you make, that is, your regulation of the number of receptors. Then I don't know how to separate the y from the m. Yes, you could tweak the arrows here to include this as well; for simplicity, I'm drawing the simplest diagram that you can handle, but you could tweak it appropriately. Okay. So, the last thing is this process here by which your new memory is generated, possibly stochastically, from the previous memory and the new observation that has come into the game. In general, there is a distinction between this part, which is typically assumed to be a property of the external environment, so not something the agent has control over, and this part, which is where the agent is susceptible to optimization. So there is a way to change these probability distributions parametrically: how do I decide things, and how do I process the information I receive, in a way that optimizes a certain goal?
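Putting the four arrows into symbols, here is a reconstruction of the objects described verbally above; the notation (s, y, m, a) is mine and may differ from the speaker's board notation.

```latex
% The four probability kernels of the decision process, as described in the talk.
% Reconstructed notation: s = state, y = observation, m = memory, a = action.
\begin{align*}
  P(s' \mid s, a)    &\quad \text{environment dynamics (transition probability)} \\
  P(y \mid s)        &\quad \text{sensing (how observations arise from the state)} \\
  \pi(a \mid m, y)   &\quad \text{decision / control (the policy)} \\
  P(m' \mid m, y, a) &\quad \text{memory update (internal computation)}
\end{align*}
```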
So here is the first important fork, because there are two different approaches now. One I would call modeling. Modeling is the agnostic viewpoint, in which you say: okay, I don't know what my agent is actually interested in doing; I'm only interested in collecting data about my system. Now that I've identified what all these things mean, I can collect data and construct these probability distributions by modeling, starting from observations. This is the agnostic viewpoint, which was very well expressed by our previous speakers, who said: I have no idea whatsoever what cells want to optimize, and that's it. At the opposite end of the spectrum there is the point of view of optimization, which of course is anything but agnostic; it is a strong a priori bias, but it is principled. It says: I have an opinion about what the system is trying to achieve, and if I assume that, in a sort of top-down approach, and I will show you briefly how to do it mathematically, then this provides me with a principle; I can turn all the cranks, start all the machinery, come up with predictions of what the optimal behavior of the system would be, and then compare it to what I see. Here, you have to take a stand. What I will discuss in the following is this second part. Of course, the first part is also interesting, because you could ask yourself, given this framework, how do I extract information about these objects in this language through inference, through data analysis, and so on; but I will not discuss that question here. Question: when it comes to the agnostic approach, even then you are making some demands of the system, right? You are expecting the system to follow certain rules, for example. So don't you make a priori assumptions? Yes, of course you do. So how do you differentiate them from the assumptions made in an optimization scheme? So far, I'm seeing a certain behavior and I have made assumptions about how to describe this behavior in terms of these objects, and these are strong assumptions: say, my memory is the phosphorylation state, my observations are the binding states, my state is the concentration. These are clearly strong and limiting assumptions. But I am not making any assumption about whether the system is optimizing its behavior towards any objective; that is an additional step I have to take. Question: is there some way to mathematically, let's say, reverse engineer the optimization problem? Answer in 22 minutes. Okay. So, if we focus on the optimization principle, we have to add another piece to this picture, namely: what is the objective function that we want to optimize? The largest part, but not all, of this theory of decision processes focuses on maximizing an object based on yet another quantity, which is called a reward. At every step of this process there is a scalar quantity called the reward, which quantifies how good your decision was at this step; it is something that occurs on a very short time horizon, from one time step to the next. And the goal of the optimization process is to maximize the expected cumulative reward. So if you are at any given instant of time, given your decisions, there will be a trajectory in this abstract space of external and internal states, and you will be collecting a sum of rewards. Maybe the reward will come only at the very end of your process, many steps ahead.
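In symbols, a reconstruction of the standard risk-neutral objective being described; the finite-horizon sum and the discounted infinite-horizon form are both common conventions, and the choice between them is an assumption here.

```latex
% Risk-neutral objective: maximize the expected cumulative reward.
% r_t = reward at step t, \pi = the policy, \gamma = optional discount factor.
\[
  J(\pi) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T} r_t\Big]
  \qquad \text{or, discounted over an infinite horizon,} \qquad
  J(\pi) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big],
  \quad 0 < \gamma < 1 .
\]
```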
The fact that you can encode long-term rewards makes the problem extremely challenging, because with purely short-term rewards you would just behave greedily, whereas long-term rewards require planning, some ability to forecast what will happen in the future as a consequence of the actions we take now. And it is an expected reward because everything is stochastic, so you have to nail this object down to a real number, and you take expectation values. At the end I will also comment on how to go beyond this. Question: in principle, shouldn't I also expect to minimize the variance? This is something I will address at the end. This is the so-called risk-neutral approach, in which you care only about expectation values; I will comment at the end, if I get there, on how you deal with variance and uncertainty. Very good. So, a few facts about this scheme. Like you mentioned, the way these objects are constructed, the choice of your state space, the choice of your memory space, how these objects can be parameterized or written, represents constraints. So all of this belongs to the more general framework of constrained optimization, and of course the constraints will affect the kind of answers you get. Second remark: you can distinguish policies, that is, decision rules, into reactive policies, which depend only on the current observation, and memory-based policies, which are the more general ones. Reactive policies are in general suboptimal; they achieve much less than what a memory-based policy would do, which is sort of a general explanation of why you should have some computation and processing inside the agent. But there is one important exception: if the observations are the states themselves, so if you see the external environment in full, this is called perfect observability, then you can have an optimal strategy that depends only on your percepts; reactive strategies can be optimal, so you can really ignore the memory. Why is that? Because if you observe the state, then the top part of this diagram is a Markov process per se, so you don't need to keep memory, since your state already predicts everything about the future. But this is a very special case. In this particular case, your optimal policy is also pure, in the sense that in every state there is one specific optimal action that solves the problem, a single one. On the contrary, if the observation is only partial, the general solution of this problem depends on the memory, which now encodes information, and typically you can have random strategies that are optimal. So this is a very important general message: whenever the observation of the external environment is partial, and you cannot base your strategy on the full Markov state of the environment, random strategies can emerge as optimal. Question: I was thinking about the part about maximizing the expected cumulative reward. If you are accumulating the reward over a long time, don't you need to keep track of when you started doing that? There are different classes of problems. There are problems with an explicit time dependence, in which, of course, what you do will depend on the time at which you start your process; that would be a non-autonomous system. But there are situations in which the system is statistically stationary, in which the decision policy will not depend on the instant of time.
Both of these can be addressed by these techniques, with different methods. Okay, any other remarks before I move on? Yes. Okay. So, how does one find the optimal strategies? Once this problem is defined, there is a very well-defined mathematical question: find the best way of acting in this setting. You could do this keeping the memory fixed, or you can include the memory as part of your optimization process; but in any case there is a clear, well-defined question of optimizing over some set of parameters describing these transition probabilities in order to achieve the best outcome in terms of cumulative reward. And when you do this, there is actually a further classification to make, which I will try to keep here, if I manage. You can divide the very different situations along two axes of your problem, which roughly express your level of epistemic knowledge, a priori knowledge if you wish, and your level of observability. What is the epistemic knowledge? It is how much my agent knows about these objects. Does the agent have an accurate model of what happens outside, or does it not? In higher organisms, this model of the external environment might be something that comes from evolution, hardwired in the genetics, or might be something learned during development and then put to the test in a specific task. The point is how much of this a priori knowledge is available to the agent in the decision process, and this sets a very big difference in how you try to solve the problem and the techniques you use. When you know a lot about the model, you are in the direction of what is called the model-based setting, because the way you optimize the problem will rely on your knowledge of it. The impact of how accurate the model is, is a problem in itself, but let's discuss the situation in which the agent has perfect knowledge of how the world works. Exactly, and this would put us on the far right of this diagram. Otherwise, on the other side, there is what is called the model-free situation, in which, at the extreme, you have no idea whatsoever of how the world works. Of course, in order to optimize in this context, you cannot rely on a priori knowledge, which you will have to replace with empirical knowledge. So on the model-based side, it is mostly a problem of computation: given these inputs, the problem of finding optimal policies is the problem of planning; these are the inputs, the outputs are the optimal policies, it is just a problem of computation. On the other end, it is more a problem of learning: data come in, and starting only from data, from experience, you have to find the best behavior. Typically this is done by algorithms that reflect the common idea of trial and error: you try something, you get some feedback, you update your control, and you improve. In doing so, you construct projections and estimates of your future outcomes, and depending on whether they turn out satisfactory, or your projections optimistic or pessimistic, you change accordingly. These algorithms, which are based on this principle, provide a way of optimizing without knowing how the world works, just by interacting with it. And of course there is a whole lot of stuff that happens in between, when you have partial knowledge about some aspects of the model: maybe you know how your receptor works, but you don't know how the world changes.
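As an illustration of the model-free, trial-and-error idea, here is a minimal sketch of tabular Q-learning, one standard algorithm of this family. The two-state, two-action environment defined by `step` is invented for illustration and is never shown to the agent; only the update rule itself is the standard technique.

```python
import random

# Minimal tabular Q-learning sketch (model-free trial and error).
# The toy environment below is a made-up two-state, two-action world;
# the agent never sees its transition rules, only sampled rewards.

N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: returns (next_state, reward). Unknown to the agent."""
    next_state = random.choice(range(N_STATES))
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    return next_state, reward

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # empirical value estimates
state = 0
for t in range(10_000):
    # epsilon-greedy: mostly exploit current estimates, sometimes explore
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # trial-and-error update: move the estimate towards reward + discounted future value
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
    state = next_state

print(Q)   # after training, Q[1][0] should dominate in this toy example
```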
You deal with these hybrid situations by a mix of techniques that come from the two extremes. Along the other axis is the notion of observability. At the top, you are in the situation where you truly see the states; this is the perfect observability setting. You work directly with perfect knowledge of the environment: there is no filter through a sensory system that throws away information, or if it filters, it does so in a way that keeps all the relevant information. At the very bottom is the situation where the system is totally unobservable; you just move around in the dark. Maybe you can keep track of your own actions, in which case you can still do non-trivial stuff. An example from biology is what is called dead reckoning, or inertial navigation: even without knowing where it is in space, an insect can know how far it is from its initial location, as a vector, just by integrating velocities or accelerations along its trajectory. So you don't observe anything outside, you don't see where you are on a map, you only see your own actions, and then by integrating in time you can know where you are relative to your starting point. If you have access neither to contextual observations nor to your actions, then you're totally in the dark and there is little you can do; that is the bottom of it all. Okay. So there are basically three extreme cases on which everything in this game is built. One is the situation at the top right of this diagram, and this is where MDPs, Markov decision processes, live. In this upper corner you have a way to compute the optimal policy, given that you know how these two things work, and in particular, since the system is perfectly observable, you know that your observations are exactly the states. In this corner the key tool is an object called the Bellman optimality equation, which per se is a nonlinear vector equation that you can solve by a variety of techniques. Probably most interestingly for you, there is a way of rephrasing this Bellman equation as a linear program. So the top right case, in which you have perfect observability and perfect knowledge of your model, you can solve by linear programming, and this will give you the optimal action in every context you find yourself in. What happens if you move down along this axis? You still know the model, but now the observations are partial, as they almost always are. Then you enter the realm of what are called partially observable Markov decision processes, POMDPs. Here you can still solve for optimality, by techniques which are more complex. The Bellman equation, which was a nonlinear vector equation, becomes a nonlinear functional equation, much harder to solve, but still solvable. The key trick in solving these POMDPs is that you can use Bayesian inference to understand how the memory collects information from the observations. Here, you say that your memory is the belief, which is a probability distribution over states: rather than knowing what the state is, you keep a memory which is a probability distribution over what the hidden state could be. In this case, POMDPs are just controlled hidden Markov models: there is a hidden Markov model taking place, the state changes are hidden, but there are observations coming from these states, and you want to control this problem.
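Two minimal sketches to make this corner of the diagram concrete: solving the Bellman optimality equation of a tiny MDP by value iteration, and the Bayesian belief update that plays the role of memory in a POMDP. The transition, reward, and observation matrices below are invented toy numbers, not taken from the talk; only the backup and the Bayes filter are the standard techniques.

```python
import numpy as np

# --- (1) MDP: value iteration on the Bellman optimality equation -------------
# P[a][s][s'] = transition probability, R[a][s] = expected reward, gamma = discount.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # action 0
              [[0.5, 0.5], [0.6, 0.4]]])     # action 1
R = np.array([[1.0, 0.0],                    # rewards for action 0 in states 0, 1
              [0.0, 2.0]])                   # rewards for action 1 in states 0, 1
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: V(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum('ast,t->as', P, V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
policy = Q.argmax(axis=0)     # one pure optimal action per state, as noted in the talk

# --- (2) POMDP: belief update (memory = probability distribution over states) -
# O[s][y] = probability of observation y given hidden state s (also invented).
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(belief, action, observation):
    """Bayes filter: predict with the dynamics, then correct with the observation."""
    predicted = belief @ P[action]                  # prior over the next hidden state
    posterior = predicted * O[:, observation]       # weight by observation likelihood
    return posterior / posterior.sum()              # normalize to a distribution

b = np.array([0.5, 0.5])
b = belief_update(b, action=0, observation=1)
print(V, policy, b)
```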
In particular, if you make assumptions about the statistics of these objects, if everything is Gaussian, then this boils down to very old stuff in engineering about control of Gaussian processes; it is a sub-case. So in general, if these are your MDPs, stochastic optimal control is a subclass of this, and optimal control theory is a subclass of this. For those of you who have some background in control theory: Pontryagin's principle for optimal control of dynamical systems is just a special case of the Bellman equations, which come from this wider concept of MDPs. It is in this sense that this framework tries to gather several different approaches into a common language. And in the end, this is where the real things happen: you have limited knowledge about the laws that govern your world and limited observability. This is what is called the full reinforcement learning problem. Here, of course, the necessity becomes to express all these objects in a way that is sufficiently rich to encode the structure of your problem and sufficiently flexible to lend itself to optimization. I have zero time to go into this, but just to give you an idea, these are the kinds of algorithms that people use to train artificial intelligence to play video games, or board games, or the game of Go. These algorithms combine ideas from these different ingredients in ways that I am not able to cover here. But the main message is that there is a way to systematically overcome the limitations that come from partial observability and partial epistemic knowledge, by using data to drive the optimization. What I want to do in the last five or ten minutes is, first of all, to check that I didn't miss anything here, which doesn't seem to be the case, and then to list things that go beyond what I've just described. Most of them are already very solid, some are still under development, but I think they bear even more relevance to problems in bacterial physiology and ecology. So let me use, probably, this part of the slide again. Item number one: risk-sensitive. What I forgot to say is that this ensemble of techniques that generalizes decision processes to the situation in which you have only partial observation and partial knowledge goes under the name of reinforcement learning, mostly for historical reasons, and all the things I will discuss are extensions of this basic framework of reinforcement learning. Risk-sensitive reinforcement learning tries to address the point that the expected cumulative reward might not necessarily be the object you're interested in, for the reason that, like it was said before, you are not caring about variances. Suppose you're confronted with two options: in one you get nothing 90% of the time and one dollar 10% of the time; in the other you are given 10 cents every time. Whether you are a human, a monkey, or a rat, you would decidedly tend towards the risk-averse option, yet the two are indifferent from the viewpoint of maximizing the expected cumulative reward. Where does this ingredient come from? It is because in one case you have some variance, some dispersion in your outcomes, and in the other you have not. How do you account for that?
You account for that by extending this setting, and by extending I mean that the risk-neutral case will still be a subclass of the new one: the concept of risk-sensitive reinforcement learning. For instance, one possible mathematical translation of this idea is the following: you maximize an object which is just one over alpha times the logarithm of the expected value of the exponential of alpha times the sum of rewards. Why is this a measure of risk sensitivity? Because if your alpha is positive, you will be very much interested in situations where you get a lot of reward, and you won't care much when you get punished a lot; so alpha larger than zero favors risk-seeking behavior, while alpha less than zero favors risk-averse behavior. And if you take the limit of alpha tending to zero, you recover exactly the risk-neutral case. Now, this is interesting in several respects. One, formal, is that it bears a similarity to notions like free energy in physics. The second, and probably most important, is that this object is the exponential of something accumulated over time, so it bears a clear formal resemblance to notions like growth rate. Question: I don't see how it could be, because this includes a notion of the probability distribution. Yes, but these are random quantities, right? They vary depending on your policy. Yes, but couldn't I just say I call my new reward e to the alpha times the sum of rewards? But that object becomes a product of things over time, whereas before it was a sum; there is a multiplicative nature to this object that is not present in the purely additive one. So why is that not allowed? No, I'm saying it is allowed; it's a different problem, and it has different solutions. If you use this parameter alpha to tune your decision-making problem, in which you weigh variable outcomes against stable outcomes, you will come to different conclusions depending on alpha, favoring risk-averse or risk-seeking behavior, all else being unchanged. The distribution of rewards is always the same; your objective is changing, but you cannot re-parameterize one problem into the other: the solutions are radically different. Okay, so, second extension: adversarial.
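A small worked example of this objective, applied to the two lotteries from a moment ago. The lottery numbers are the ones quoted in the talk; the specific alpha values are arbitrary choices for illustration.

```python
import math

# Risk-sensitive objective from the talk:  J_alpha = (1/alpha) * log E[exp(alpha * R)].

def risk_sensitive_value(outcomes, alpha):
    """outcomes: list of (probability, total reward R). alpha -> 0 gives the mean."""
    if abs(alpha) < 1e-12:
        return sum(p * r for p, r in outcomes)                  # risk-neutral limit
    return math.log(sum(p * math.exp(alpha * r) for p, r in outcomes)) / alpha

lottery_a = [(0.9, 0.0), (0.1, 1.0)]    # nothing 90% of the time, $1 10% of the time
lottery_b = [(1.0, 0.10)]               # 10 cents every time

for alpha in (-5.0, 0.0, 5.0):
    va = risk_sensitive_value(lottery_a, alpha)
    vb = risk_sensitive_value(lottery_b, alpha)
    print(f"alpha={alpha:+.1f}  A={va:.4f}  B={vb:.4f}")

# alpha < 0 (risk-averse): A scores below B; alpha > 0 (risk-seeking): A scores above B;
# alpha = 0 (risk-neutral): both equal the expected value of 10 cents.
```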
So it seems, at least superficially, that when you deal with an unknown environment and have to choose a strategy that does its best against this class of environments, in a subclass of problems you find that this can be mapped formally onto dynamics which we know make sense from an evolutionary and ecological standpoint, which is a tantalizing remark. Other extensions: multi-agent reinforcement learning. This is again an extension, because so far this was a game against an environment; here it is a game against, or with, other agents, which may be cooperative or non-cooperative. In this extension there are several agents, and each agent has its own reward function, which depends on the environment and on what the other agents do. In this sense, multi-agent reinforcement learning is an extension of game theory: you can provably show that there is a subclass of these multi-agent problems which gives you the setting of game theory, matrix games, state-dependent games, repeated games if you incorporate memory, and so on. Question? Yes. Maybe I missed this, but what I understood is that you go from state s to s prime because of the action a. It might depend on the action, but the environment could also change independently of your actions; that is a possibility. The formulation includes the possibility that you are making changes to the environment, but it is perfectly fine to consider the subclass of problems in which the environment evolves by itself, or is even stationary; all of these are subclasses. And specifically, when you use techniques from the single-agent case and let the agents play against each other, these are exactly the self-play techniques used in the applications I was mentioning before. Another item: multi-objective. Since the reward is a scalar function, here you have a single objective, but in many situations you would like to know how the system behaves with respect to vectorial objectives, which in general do not allow you to find an optimal solution unless you are able to order these vectors in some way. Otherwise, if you cannot order them, you have the whole world of multi-objective optimization, with Pareto frontiers and so on, and all of this has been extended to reinforcement learning as well. One common technique is the most obvious one: scalarize your vector of rewards. You combine the components with coefficients, and out of a vector you make a scalar; the coefficients appearing in this scalar express the trade-offs between the various things that you want to optimize. Once you scalarize a multi-objective problem, you can use the same techniques as here, and then study how the solution behaves as a function of the trade-off parameters. These also might have applications. The key point is that, depending on your system, you can do this analytically or in a data-based way, which is the upshot of the reinforcement learning part of the decision-making framework. Last but not least, there is a thing called inverse reinforcement learning, which asks precisely the question you would expect: if I observe a certain behavior, can I infer from my observations what the reward was, what the objective was? This is a very ambitious question, as you can imagine.
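A minimal illustration of scalarization; the component names and the weights below are arbitrary assumptions for the example, echoing the kind of list (growth, metabolic cost, time to completion) mentioned next.

```python
import numpy as np

# Scalarizing a vector of rewards, as described above. The weights are the
# trade-off coefficients between objectives; both vectors are made-up numbers.
reward_vector = np.array([0.8, -0.3, -0.1])   # e.g. [growth, metabolic cost, time]
weights       = np.array([1.0,  0.5,  0.2])   # assumed trade-off coefficients

scalar_reward = float(weights @ reward_vector)  # single scalar usable by any RL method
print(scalar_reward)

# Sweeping the weights and re-solving traces out how the optimal behavior shifts
# along the trade-offs (one way of exploring the Pareto frontier).
```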
It's the sort of question we naturally ask when we see a behavior: okay, but what was the purpose of that? It is an ill-posed question, in the sense that for any given behavior, optimal or not, and even if it is optimal, there might be different reward structures that give the same result. There are several examples you can think of, but it is clear that there are infinitely many ways of choosing the rewards that collapse onto the same kind of optimal behavior. So how do you get out of this ill-posedness? Well, you have recourse to concepts like Occam's razor: you ask yourself, what is the simplest possible structure of the reward that would explain this behavior? If you combine this idea with a maximum entropy approach, then you have effective algorithms that are able to disentangle the rewards. Typically what you do is start with a vector of possible rewards: my system cares about growth rate, but it also cares about metabolic cost, it also cares about time to completion, and so on, and I compile a list of them. Then I scalarize them, just the way I do in multi-objective optimization, and then I run my maximum entropy inverse reinforcement learning approach to sort out what would be the best explanation in terms of these trade-off coefficients. So, starting from data, there is a systematic way of answering these kinds of questions. And with that, I think I'm done. Happy to take questions. There is time for a couple of fast questions. Thanks, this was very nice, getting a refresher on control again. I was wondering: is it feasible to think of a cell, in the different states it can have, as a sort of MPC on top of reinforcement learning, where the objectives, depending on what I have in my memory, actually update continuously over time? Okay, yes. This is yet another chapter, which pertains to something broadly called reward shaping. In several applications it has been shown that you can improve learning and performance a lot by adding additional terms to the reward, which might favor, for instance, exploratory or curiosity-driven behavior; it comes under different names. So the short answer is yes. It is an art rather than a science to introduce the kind of reward shaping that favors your learning and optimization process without destroying the structure of the problem. I was just wondering, what was the opinion part that you mentioned in the beginning? That you should care about this. Okay, then you make your own. All right, if there are no more questions, let's thank Antoni again. All right, good. So, we are one hour late.