So as an organizer, probably this is not for me to say, but I just want to express my gratitude to the other speakers, to Changbong and to Zhongyou. They did an excellent job; they did the real job, okay? I was just sort of overseeing things from the ICTP side, but they did all the real work in organizing this school and conference, and I think it was a great success. I hope you students really had the opportunity to see so many different aspects of statistical physics applied to life sciences and beyond. So, thank you. Okay, so, olfactory search is a century-old problem. The first naturalistic observations actually date back to the beginning of the 20th century, to Jean-Henri Fabre, a French naturalist who was an extremely curious person and made a series of, I wouldn't call them quantitative, but systematic observations about the behavior of moths, okay? He discovered somewhat accidentally that the female moth can attract male moths from very long distances. Overnight he found that when the female got out of its cocoon, all of a sudden hundreds of male moths came to the room where this female had just reached its adult stage. He first made the hypothesis that there was some chemical signaling involved. But of course, this was extremely contentious, for obvious reasons. How could it possibly be that the scent emitted by a very tiny animal (we know now that they emit these odors, which are called pheromones, at a rate of a few nanograms per hour) could convey a message over such long distances, given the complexity of the atmospheric environment? But since then, we have collected lots of evidence that this is actually the case. There are now experiments which show that moths released hundreds of meters downwind of the female can actually reach the female in a substantial proportion, 80 to 90%, from hundreds of meters away.
So this is an extremely interesting problem from the viewpoint of biology, of neuroscience, but also of physics. In a nutshell, this is a very simple schematic of what happens in practice. The female, depicted here, emits this pheromone signal. These are very light, short molecules, which are very volatile; they can stay up in the air for a long time. The message is very specific, since these are sex pheromones and they are meant to attract males. The typical life cycle of these moths is that they live most of their life as caterpillars. They undergo metamorphosis. And then when they turn into moths, flying insects, they actually live just a few days. They don't even have a functioning mouth. They don't have a digestive apparatus. They just live on the lipids and the sugars that they accumulated during their life as caterpillars. Their only purpose is to find a mate. So females typically stay on the branches of a tree and emit these pheromone signals, males scramble to reach the female, and then a complicated courtship takes place. But one of the limiting steps is to approach the female from large distances. So what is the difficulty? Well, the difficulty is that the medium which conveys this signal is the turbulent atmosphere. And as you know, transport in the atmosphere, which is far from being diffusive, mangles and destroys almost all the structure of the signal. So what arrives at the end, at the male moth, hundreds of meters downwind, is a broken message. But nevertheless, the moth evidently is able to reconstruct the information and to perform a search process which leads it close to the female, with a high probability of success. So in order to grasp more quantitatively how broken the message is at the receiver's end, this is a typical time trace of a tracer (in this case it's not odor, but it's a proxy for it), hundreds of meters away from the emission.
So these are experiments conducted in the atmosphere. There is a detector placed hundreds of meters downwind of the source, and this is the concentration level; these are time series. You see that the signal is extremely intermittent. There are strong bursts, whiffs in which the tracer is concentrated, and these are interspersed with long phases in which there is no signal at all. Remember that at the source, at the female moth's location, the signal is emitted continuously in time. So all this intermittency is just due to the fact that turbulence is mixing clean air together with air laden with odor. In addition to this, of course, there are other odors that might be confusing, et cetera, but I won't discuss them. So the dynamic range of this signal is enormous. Between two single bursts there can pass a few milliseconds, and we know from other experiments that moths and insects are sensitive to timescales this short: they can detect signals which are separated a few milliseconds apart in time. And the longest periods in which there is no detection at all can be minutes. So the dynamic range of this signal spans from milliseconds to minutes. And this corresponds, if you wish, to a visual picture. This is a snapshot, but you should imagine something that is changing dynamically in time; this is just a snapshot of a tracer. The long periods of absence of signal correspond to large regions of space in which the tracer has basically been swept away by the flow. And the finer scales of this spatial structure are the ones associated with the bursts. So you see there's a structure in clumps, separated by blanks where there is no signal. So some time ago, with my colleagues Massimo Vergassola and Emmanuel Villermaux, we set out to build a theory of what kind of probability distributions this signal might have.
And what turns out (this is again our experimental data, taken from atmospheric experiments) is that there is actually a power-law distribution, very close to the theoretically predicted exponent of -3/2, over a very large range of timescales. This is the duration of a whiff, that is, the time that the signal spends above a certain detection threshold. And this is the up-crossing time, which is basically the time it takes for the signal to become visible again after a blank period. So you see clearly that this is not at all a Poisson point process: there are very long time correlations. And the origin of these very long time correlations is turbulence itself, because it's a process which has all scales involved, and all scales talk to each other, which of course makes it a very structured signal. And that, if you wish, is in a nutshell the secret of being able to detect a very distant source: the fact that all scales talk to each other means that even in such a broken signal there is information about the location of very distant sources. I don't have time to go through the theory of this, because today we will be mostly interested in discussing the process by which the moth searches for the female, navigates in space, and locates the female. So this is one of the few experiments in the field, because as you can imagine, it's impossible to track a moth over a distance of several hundred meters. You cannot put a backpack on them like they do with birds, with GPS or something like that, because they are too small; they don't accept it very easily, and their behavior is no longer the one they have in nature. So in this experiment, which is from the 90s, they went up a 25-meter-high tower, painted the backs of the moths with some bright pink color, used an old VCR camera to record the trajectories, and then manually reconstructed them.
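The whiff and blank durations discussed above are straightforward to extract from a thresholded concentration time series. A minimal sketch in Python (the function name and the toy threshold are my own illustrative choices, not from the original analysis):

```python
def whiff_blank_durations(signal, threshold):
    """Split a concentration time series into whiff (above-threshold)
    and blank (below-threshold) episodes and return their durations."""
    whiffs, blanks = [], []
    run_len = 0
    above = None
    for c in signal:
        state = c >= threshold
        if above is None or state == above:
            run_len += 1
        else:
            # the previous run just ended; record its duration
            (whiffs if above else blanks).append(run_len)
            run_len = 1
        above = state
    if above is not None:
        (whiffs if above else blanks).append(run_len)
    return whiffs, blanks
```

Collecting these durations over a long record and histogramming them is what reveals the power-law behavior mentioned in the talk.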
So this was a very painful process, by which they collected something like 20 or 30 trajectories. At the same time, it's impossible, at least it has been impossible until very recently, to track the odor simultaneously. These odor molecules are very small; you cannot put any tag on them to visualize them. The concentrations are very low; you cannot put fluorescent tags on them. You cannot use lasers, because otherwise you fry the insects. So there are a lot of problems in visualizing the odor and the behavior at the same time. In this case, what they did was use soap bubbles to visualize the flow qualitatively. So this was really a pioneering experiment, which actually has had no successor because of the difficulty of the experiment itself. But nevertheless, what emerged is that you can track this motion, and you can see that the moths have a very distinctive behavior, which is an alternation of crosswind and upwind trajectories. The wind is blowing from this direction. This is the male. The male surges, that is, it goes against the wind; this is called the surge. But then there is this crosswind excursion, which is called the cast. Then there is another surge. These arrows are proxies for the direction of the wind, as measured by the soap bubbles. So it's going against the wind, then casting, then against the wind again. This alternation of crosswind and upwind movement has been dubbed by the entomologists the surge-cast model. So this is an idealization of the surge-cast strategy, which would work as follows. As long as the moth is in contact with the odor, that is, it receives detections, the odor is above threshold, it moves upwind. When it loses contact with the odor, it starts the casting program, which alternates sideways motions that are meant to locate the plume again (the plume being this cone of odor that you see), and then again a surge, et cetera. So this is the abstraction that entomologists came to when observing the data.
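The idealized surge-cast strategy just described can be sketched as a tiny state machine. Everything here, the coordinate convention with upwind as +y and the rule that casts widen over time, is an illustrative assumption, not a faithful model of moth behavior:

```python
UPWIND = (0, 1)  # assumed convention: the source lies in the +y direction

def cast_and_surge_step(detected, state):
    """One decision step: surge upwind while in contact with the odor,
    cast crosswind (with widening sweeps) after losing it."""
    if detected:
        state["mode"] = "surge"
        state["timer"] = 0
        return UPWIND
    # no detection: cast crosswind, widening the sweep at each reversal
    state["mode"] = "cast"
    state["timer"] += 1
    if state["timer"] > state["half_width"]:
        state["cast_dir"] *= -1      # reverse the sweep direction
        state["half_width"] += 1     # widen the next sweep
        state["timer"] = 0
    return (state["cast_dir"], 0)
```

Repeated detections keep the agent surging upwind; a long gap produces ever wider zigzags, which is the casting pattern seen in the tower experiment.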
But of course, one wonders: where does this strategy come from? How can we explain it from first principles? Does it emerge from some optimal search process? These are the kind of questions that we are asking. I should add, before we move into the theory part, that very recently there has been a very nice experiment by a group at Yale. They discovered quite serendipitously that walking flies can actually respond to smoke, humble smoke. And they actually like smoke; they are attracted to it. So this solves, quite accidentally, the problem of visualizing both the attractant odor and the behavior of the flies. These are walking flies, confined in an almost two-dimensional environment by the two glass plates that keep them in. So I can show you the movie of what happens in this case. You can do a lot of beautiful data analysis. This is the evolving odor; these are several flies which are walking. At every time you can locate the fly, its head, its heading direction. And you can see when the odor reaches the flies and when the plume passes by. So you can measure the speed, the orientation with respect to the wind, and the signal, all at the same time. This is a treasure trove for modellers and theorists trying to understand what kind of decisions are made. The behaviors observed here are different from those of the moths. But the question stands: can we rationalize these observations within some theoretical description? Which brings me to the second part of the talk, the more conceptual one. So how do you approach a problem like this? You have several levels of description that you can decide to look at. You could try to model it: you cook up a model with certain rules, then you see whether it fits the data, and then you can compare models. Or you can take a higher-level viewpoint, if you wish.
An algorithmic viewpoint: I want to formulate this problem as a problem of optimal decision making. And that's what I'm going to do in the following. So this slide is just a very short overview of the key ingredients when you describe any process as a decision-making process. Incidentally, the techniques I describe here belong to a branch of machine learning called reinforcement learning, which was mentioned by Pankaj Mehta in his lectures. It's basically the third branch of machine learning, the one concerned with prediction and control of dynamical systems. And of course, a navigation problem is a decision problem; it's a control problem, in essence. So what is the abstract description of any decision-making process? Well, there is the environment, which is specified by some state. And there is an internal state of the agent, which is specified by some memory. I call it memory; you can think of it as all the internal degrees of freedom that the agent has. It could be the brain; it could be any biochemical process inside a cell. So this is a very high-level description that includes a lot; I wouldn't know what kind of process it does not include, actually, as a matter of fact. And between the environment and the agent there is the sensory-motor interface, which is the place where information is received and where actions are made. That's why it's called sensory-motor. This is something that comes also from the engineering community: you could describe the behavior of a robot in the same way. So the state emits observations. The state could be a very high-dimensional vector, which includes all the possible configurations of the world, but you observe only a very reduced part of it. These are your observations.
Then the internal state: the observation that is received and the internal state, the memory of past observations, the prior information, combine to produce an action. This is the decision that is made. As a result of the action, the state of the environment changes: the action affects the environment. And then the previous memory, the current observation and the current action are mapped into a new internal state. So the agent has received an observation, has made a decision, and then commits to memory everything that has passed. And then the process repeats itself. In this global view, this is a Markov process, mediated by these observations and actions. So this is a very general framework. If you wish, you could rephrase Maxwell's demon in the same language: these are the signals you get from the environment, this is the controller. They are the very same ideas at work, only that we do not care about thermodynamics here; we care about a specific goal. The objective is not to minimize heat or maximize work; the objective, for our search problem, is to get to the goal in the shortest possible time. So in general, the objective is to minimize the sum of the costs that are incurred at each time step. Every time a time step passes, you pay a price, and you want to minimize the sum of costs. And minimize over what? Well, you can minimize, typically, over the decisions you make. This is one possibility. Or, as we will see in the following, you can minimize also over the way you update your memory. But essentially, this boils down to an optimization problem, which can be solved by different techniques. OK. So what are these things for our olfactory search problem? These very abstract things, what are they? The state, for a search process, includes the position of the source, which is typically unknown. We don't know where the signal comes from as we search for it.
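The loop described here (observe, decide, act, update memory, pay a unit cost per step) can be written down generically. This is just a schematic skeleton, with all the problem-specific pieces passed in as functions of my own naming:

```python
def run_episode(env_step, observe, policy, memory_update, s0, m0, max_steps=1000):
    """Generic perception-action loop. The environment dynamics (env_step),
    the sensor (observe), the decision rule (policy) and the memory update
    are all supplied by the modeller; the loop only accumulates unit costs."""
    s, m, cost = s0, m0, 0
    for _ in range(max_steps):
        o = observe(s)                 # sensory interface: state -> observation
        a = policy(m, o)               # decision based on memory and observation
        s, done = env_step(s, a)       # the action affects the environment
        m = memory_update(m, o, a)     # new internal state of the agent
        cost += 1                      # one unit of cost per time step
        if done:
            break
    return cost
```

Minimizing the returned cost over the choice of `policy` (and, later in the talk, over `memory_update` as well) is exactly the optimization problem being set up.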
And the position of the searcher, which might be known or unknown. Typically, for simplicity, it's postulated that the agent knows exactly where it is in space, but does not know where the source is. But it's possible to use a broader formulation, in which even the position of the searcher is known only to a certain degree. The observations could be odor detections, in our case: did I encounter the odor? Where does the wind come from? There could also be visual cues, references that the insect has: where is the ground, how am I moving with respect to the ground, trees, et cetera. All these things could be included as observations. The actions are: move upwind, move downwind, move sideways, change your speed, stop, proceed; these kinds of things are all actions. The objective, as we said, is to minimize the total time to reach the source. And the cost, in this case, is very intuitive: every time one time step passes, I pay a unit cost. Minimizing the total cost means minimizing the time to reach the target. But what is the memory? Okay, here we have the freedom to choose different kinds of memory. In fact, different known algorithms that have been used in the past can be categorized according to their use of memory. And that's what we're going to do now. So for instance, this is one famous algorithm that was introduced some 20 years ago. It's a very simple algorithm in which, every time the agent encounters the odor (you can think that the source is here and is emitting particles, which you can use as proxies for odor concentration), it starts a program. And this program is: go upwind, then turn left; go upwind, then turn right, for a longer stretch; go upwind, then turn left, for a longer stretch still. Okay, you see, the zigzagging is a program which goes on until a new detection is made.
At that point, there is a reset and the process starts over again. So what does this have to do with our scheme of the decision process? Well, in this case, very simply, the memory is just a clock. You can think that this algorithm is doing the following: it starts from here; if there are no detections, you move on and go left, you move on and go up, you move on and go right. If you make a detection, the clock is reset and the program starts over again. So this is a diagrammatic description of what the algorithm does. This algorithm has certain features. First of all, it's biomimetic: we're just copying nature and trying to distill the recipe of what the entomologists said. It's model-free, in the sense that we are not making any kind of assumption about how the odor distribution is generated; this works potentially for any distribution. How effective it is will depend on the properties of the signal. There is basically no systematic attempt at optimization. You could decide to change these rules: maybe instead of this sequence of turns I choose another one; rather than zigzagging linearly, I can zigzag parabolically; I could adjust the duration of these intervals. But this is not systematic. You would make different trials and see which one is better, depending on the environment. So this is not the kind of algorithmic approach that is built for optimization. An entirely different approach is to go fully Bayesian, okay? That is, to say that the position of the source is a random variable that I don't know, and that I have a model for the kind of detections that I receive. I can use this information to update my prior into a posterior, and then repeatedly, continuously do the Bayesian update. So what is the probability distribution of the parameter? It's a map.
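In this reading, the whole algorithm is a fixed program indexed by a clock that resets on detection. A schematic version (the particular action sequence used below is arbitrary, just to show the mechanism):

```python
def clock_policy(clock, detected, program):
    """Memory is just a counter: the time elapsed since the last detection.
    A detection resets the clock; otherwise the pre-specified program
    (a list of actions) is read out at the position given by the clock."""
    if detected:
        clock = 0
    # past the end of the program, keep repeating its last action
    action = program[min(clock, len(program) - 1)]
    return action, clock + 1
```

The only internal degree of freedom is the integer `clock`; the behavioral repertoire is entirely hardwired in `program`.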
It's a map of space in which every location has its own probability of being the location of the source. As the agent moves, this belief is updated, and you will see that eventually it localizes closer and closer to the source, and therefore this Bayesian algorithm can locate the source. Now, this approach allows one to look for the optimal strategy. There are techniques from dynamic programming that allow you to solve this problem, which is essentially a problem of planning under uncertainty. You know the model; you have to do some very complicated calculation to conceive, in a sense, all possible futures, and then you can act optimally in the face of this uncertainty. I don't have time to introduce them here. There are techniques to solve approximately the optimality equations, the Bellman equations, for partially observable Markov decision processes; it's very painful and hard. There are also very good heuristic algorithms available, and notably one of them is called infotaxis. The idea is that you try to maximize the amount of information you gain about the source, and this provides a heuristic which is very effective in search problems. The disadvantages? Well, it requires a cognitive map. This probability map must be in place somehow in the brain of the fly, because the fly has to use it. It could be approximate, it could be coarse-grained, but it has to be there as a structure. We know that most mammals do have cognitive maps; about insects, we don't know very much. It is also a model-based approach: if you want to perform Bayesian updating, you need a model, and the model must be good. There are also technical assumptions about how the system produces detections. And of course it requires a very large memory space, because this is a space of probability distributions over space; it's a huge object to manage.
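The core of the Bayesian approach is a single update rule applied to the belief map after every observation. A minimal sketch, where the discrete cell map and the likelihood function are placeholders for whatever detection model one assumes:

```python
def bayes_update(belief, likelihood, obs):
    """One Bayesian step: reweight each candidate source cell by the
    probability of the current observation given the source is there,
    then renormalize to obtain the posterior belief map."""
    post = {cell: p * likelihood(cell, obs) for cell, p in belief.items()}
    z = sum(post.values())              # normalization (total evidence)
    return {cell: p / z for cell, p in post.items()}
```

Iterating this after every detection or non-detection is what makes the belief concentrate around the true source location, and it is also why the memory requirement is so large: the whole map is the memory.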
So in this case the memory has become a map, a spatial map. The third possibility comes from generalizing our problem: in the previous cases, the memory was decided a priori; it was either a clock or a Bayesian map. But let's assume instead that we fix a certain space for the memory, and we leave ourselves the possibility of also optimizing over the memory updates. So the memory update becomes something that we can optimize for; it's marked in orange here. One example is to use recurrent neural networks as the basis for the memory. These are neural networks which have a rather complicated structure. This particular gated recurrent neural network is called an LSTM, but basically it does the same thing: it's just a memory, a complicated nonlinear structure for the memory, parametrized by all the weights of the recurrent neural network. And then you optimize at the same time over the decisions you make and over the way you update your memory, as in the graph I showed you before. If you run this algorithm, which is extremely hard to do, you can find the kind of trajectories produced by this artificial agent. This is a model-free approach: you just let it interact with the data; you don't have to provide any a priori information about how the detections are generated. Like I said, both decisions and memory updates are learned from the data themselves. However, it's computationally very heavy, it has a large memory space, it requires a lot of data to be trained, and it may be very difficult to interpret. In fact, at the end, when you have run all these things, you resort to doing a very simple PCA in order to understand what is happening inside this very high-dimensional space of weights. So, now I'm coming to the part which is specific to my talk.
Our approach is to bypass these very high-dimensional memory spaces entirely, and to restrict ourselves to memory spaces which are countable and small. So the idea is: here is our search agent, which is in a memory state that I call red, whatever that is. It makes a decision and stays in red, and so on and on, and then maybe it encounters the odor and turns to yellow. So in this case the memory is described by just three states, three colors. There are observations, which are the presence or absence of odor. And this is mapped into a decision about the move to make in space and the new memory state into which to transition. For this example the policy has three times two, six, memory-observation pairs, and for each of them a normalized probability distribution over actions and next memory states, so a few dozen parameters in total, if I'm not wrong. But anyway, you have dozens of parameters to deal with, not a huge number. So I'm going to show you very quickly a couple of results. Now you optimize over the parameters by some technique (actually you can use gradient descent), and then you see how the system behaves. These memory states are abstract, so you have to look at the behavior in order to understand what they mean. In this case, for instance, this is a trajectory of a searcher, and you will see that it goes through a specific sequence of memory states, depicted in colors that correspond to this diagram. So what does that mean? It means basically that if there is no detection it stays in the red state, and after a while it transitions to the yellow state, and both of these correspond to going upwind. Then it transitions to the green state, which is actually a biased random walk.
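The three-state controller just described can be written as a simple lookup table. The transitions below are a deterministic caricature of my own, purely for illustration: in the optimized controller they are probabilistic, which is how the "after a while" timing between states arises.

```python
# Memory states: "red", "yellow", "green"; observation: odor detected or not.
# (memory, detected) -> (action, next memory); entries are illustrative.
FSC = {
    ("red",    False): ("upwind", "yellow"),
    ("red",    True):  ("upwind", "red"),     # any detection resets to red
    ("yellow", False): ("upwind", "green"),
    ("yellow", True):  ("upwind", "red"),
    ("green",  False): ("biased_random_walk", "green"),
    ("green",  True):  ("upwind", "red"),
}

def fsc_step(memory, detected):
    """One step of the finite state controller: look up the action to take
    and the next memory state."""
    return FSC[(memory, detected)]
```

With only a handful of table entries, the full behavioral repertoire (surge, then a gradual switch to searching) is laid out explicitly, which is what makes these controllers so easy to interpret.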
So it performs a random walk which goes sideways and backwards, because it has lost track of the presence of the odor and searches backwards, as if saying: I must have missed something. That's what this algorithm does. And you see that in this case the memory is actually behaving as a clock, because at the beginning it's in the red state, then it transitions to the yellow state, and so on. So this diagram is very similar to that of the surge-cast model, only that now it has simply emerged from the optimization of a very simple finite state controller. What happens if you enlarge the memory a little, to four states? Then the situation becomes a bit richer. In that case there's a sequence: you see the diagram is still made of something which resets upon a detection, but different things happen in different memory states. In this case, for instance, there is a surge, then a biased random walk, and then a cast to the left and a cast to the right. So a richer repertoire of behaviors emerges. Maybe this is best seen in the movie. Okay, you see it goes up, then it loses track, a little bit of random walk, then casting, sideways, then contact, up, random walk, random walk, up, up, source, okay. So, I insist: we just optimize over the parameters, and then we interpret what happens from the data. Nothing is put in by hand. It's a system which finds by itself its best strategy in terms of how to change the memory, when to do it and how to do it. What is interesting in this case is that if you plot the probability of being in a certain memory state given the position in space, you will see that different memories correspond to different regions of space. For instance, if you are in the surge memory (memory one, or red, whatever), you are in the wake of the source.
If you are in the biased-random-walk memory, you are actually on the boundaries of this wake, or upwind of it. And if you are in the left memory, so to speak, you are most likely on the left side. So in this case the memory represents a sort of very coarse-grained map of the environment. Then you can check the performance, and you will see that it actually performs quite well, much better than the hardwired clock, which fails more than 30% of the time; the optimized controller essentially always reaches the source. Of course it is not as good as the Bayesian algorithm; you would expect that, because the memory is compressed, et cetera. But still, the peak performance is very good. Did I have any other slide? No, I think that's it. Okay, conclusions. Finite memory controllers are a good alternative to other memory settings that are very high-dimensional and very difficult to manage. They are easier to optimize because of their small number of parameters. And, this is important, they are expressive enough to encode very rich behaviors, like a clock or a map. For this algorithm you don't need a brain, you don't need a cognitive map, you don't need very complex structures in your brain to perform such a task. And they are immediately interpretable: just look at them and you realize what is happening. As a final statement, I find this kind of approach very interesting because it expresses the viewpoint of looking at behavior through the lens of optimization. This is not always a good thing to do in biology, but for certain specific behaviors it is, and it allows one to shed light on very complex phenomena that would otherwise be difficult to interpret. Okay, sorry for running over time. Thank you. Questions from the audience? Yes. So, this is awesome.
I was wondering if you have thought of applying this to something like the worm, where there are very few neurons that are very well known and it has some very typical behaviors. What would be really great would be to apply this to the worm's known circuit, right? So that you know exactly where, biologically, the memory is, and where the connections between neurons and motor behaviors are. I think that would be just awesome. I don't have anything to add to this. I absolutely agree that C. elegans is the best model system for this kind of thing. Thank you very much. So my question is about the training of the finite memory controller. What kind of reward or cost did you assume? That is, I think, a very difficult problem: how should we design it to obtain this kind of nice behavior? Okay, so this question has two answers, I think. From the technical viewpoint, the fact that you have this Markov process underlying the problem allows you to compute exactly the gradient of your cost function with respect to the parameters of the policy. You can write a closed expression for this if you know the model. So you can basically do ordinary gradient descent, or quasi-Newton methods if you are able to estimate second-order information; you do whatever you want there. We tried several things, and then you find your optima. There are actually several of them, so you have to deal with the usual restarts, et cetera. So this is one thing. If you don't have a model, the good thing is that this kind of system also allows you to write down an estimator. These things are known in reinforcement learning as policy gradient and stochastic policy gradient approaches. So you can learn what your optima are with or without a model. Concerning the second part of the question, if I get it correctly, it is: how do flies actually perform this optimization? No, not really.
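The model-free estimator mentioned in this answer is the standard likelihood-ratio (policy gradient) trick: average the cost times the score of the chosen action. A toy illustration of my own, for a two-action policy where the exact gradient is known and can be checked against the estimate:

```python
import random

def score_gradient(p, cost, n_samples=200_000, seed=0):
    """REINFORCE-style estimate of dJ/dp for a policy that picks action 1
    with probability p, where J(p) = (1-p)*cost[0] + p*cost[1].
    Each sample contributes cost(a) * d/dp log pi(a)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = 1 if rng.random() < p else 0
        score = 1.0 / p if a == 1 else -1.0 / (1.0 - p)   # d/dp log pi(a)
        total += cost[a] * score
    return total / n_samples
```

For cost = [1, 0] the exact gradient is dJ/dp = -1 for any p, so the estimator can be validated; the point is that it uses only sampled actions and their costs, never a model of the environment, which is the sense in which the optimum can be learned with or without a model.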
So my question is whether the reward includes the long term: when the controller is trained, is the reward received only when it reaches the goal? So, the reward, or the cost if you wish, is one unit of cost at every step, everywhere except at the target. This is what sets the objective: once you reach the target, either the process stops, or it incurs zero cost forever and stays in place, and this is what pushes it to minimize the time. Okay, so in that case, does it have any other reward from sensing, or...? No, no, this is pure; the only cost is just time. There are no additional terms which shape the reward. There is no reward shaping; it's the most trivial reward you can think of. In the approach with the recurrent neural network, they had to use reward shaping to be able to make the algorithm converge; we don't need it. Oh, thank you very much. It's amazing. Thank you for the beautiful talk. I have a biological question. You said moths exist for most of their time as caterpillars. I think they don't have much chance to train this decision-making process... They don't need to: evolution does the job. Yeah, so, as always, there are very different timescales for learning, right? There is evolutionary learning, and you're born equipped with that. Then there is developmental learning, which takes place during the development of the organism. And then there is learning on a specific task, which you do in a specific range of times in your life, and that's still another thing, okay? So in this case the optimization is thought to be performed by evolution, and then the resulting algorithm is simply applied by the individual. And then there might be some fine-tuning depending on the environment, right?
So one possibility is that evolution provides you with the structure of the graph, but then you're still able to adapt the rates at which you transition from one memory state to another, depending on the properties of the environment. So there might be different timescales for this, and different levels. Thank you. I think there are more questions, but for the sake of time let's stop here, and let's thank Antonio again.