Hello and welcome everyone to the second Applied Active Inference Symposium, hosted by the Active Inference Institute. It is July 31st, 2022, and this is the first session of the symposium. The focus of the symposium will be robotics, and the presentations will be centered around that theme. If you have ideas for future symposium topics and want to participate in organizing, please reach out to us. For those of you watching live, please post questions in the chat and we will ask the presenters during the roundtable discussion. This symposium will be recorded, transcribed, and archived for lasting access. We will make the playlist available for asynchronous participation. If you would like to participate in the transcription of the video, please reach out to us at ActiveInference@gmail.com. We will have five presenters followed by a roundtable discussion. The first presenter is going to be Tim Schneider, presenting Active Inference for Robotic Manipulation; I don't have any co-authors listed here. Next is Tim Verbelen, presenting Robotics: Modeling the World from Pixels Using Deep Active Inference, also with no co-authors listed. Then Ben White with Artificial Empathy, Active Inference, and Collective Intelligence, with co-authors Mark Miller and Daphne Demekas, sorry if I got that wrong. After that, we will have a talk by Noor Sajid, Learning Agent Preferences; she doesn't have any co-authors listed either. And finally, we have Wen-Hua Chen with a talk called Dual Control for Exploitation and Exploration and its Applications in Robotic Autonomous Search. And that's it. So with pleasure, I introduce Tim Schneider. Please take it away.

Yeah, thanks a lot for the introduction. I'm just quickly going to share my screen; I hope you can see that all right. So my name is Tim Schneider.
And today I want to talk about our work on Active Inference for Robotic Manipulation. I think we can all agree that manipulation is one of the central abilities that we need in our everyday lives, be it cooking, writing, or using tools, and you can obviously think of a variety of other tasks that also require dexterous manipulation. However, despite the significance of manipulation in our everyday lives, robotic manipulation is still a largely unsolved problem. I think one of the main reasons for this is that in classic robotics, as it has been done for years, we always assumed that everything is known: where everything is, how everything behaves. But in unstructured environments, this is usually not the case. So what we need are very adaptive policies that are able to react to changes in the environment and are robust to all kinds of perturbations. What my lab focuses on, at least in part, is applying reinforcement learning to robotic manipulation, so we learn these skills instead of programming them by hand. However, this is not straightforward in manipulation either, and one of the central challenges here is exploration. In reinforcement learning, we always have to explore a task before we can complete it. For example, in this task, the robot has to move this little ball into a target zone on a tilted table. What is usually done, and what is done here, is that we just apply some random actions in the beginning, and also throughout the entire optimization procedure, and hope that this will give us some useful insight into how the world actually behaves and how we can achieve high reward. But in manipulation, this usually does not work, because if we just apply random noise, we end up dropping the objects we are trying to manipulate, and we may even end up destroying parts of the environment.
So I think this approach is simply not feasible for interacting with the real world in manipulation. If we look at how humans explore the world, we see a very different picture. This is a toddler that obviously has the task of building the highest possible tower out of these blocks. What we can see is that this toddler is not simply applying random noisy actions. This kid actually has a fairly directed way of exploring: it has an idea of what is useful to learn and what might bring it forward, or maybe it is not so much planned as intuition. But certainly this exploration is much smarter than what we are currently doing in reinforcement learning. So to summarize: exploration is a very challenging task in reinforcement learning for manipulation, and humans, if we take them as an example, explore very actively and in a very directed fashion. The question we want to ask in this work is: can we do this with robots too? This is where we started looking into cognitive science and came across active inference, mainly because it proposes a theory explaining curiosity in intelligent beings, and the question was whether we can transfer this onto robots. For completeness, I want to quickly go over the basics. In active inference, we assume that every agent maintains some kind of model of the world, which consists of observations o, hidden states x that the agent cannot observe, and some actions or policy pi. The objective is to minimize surprise, which is the negative log-likelihood of the marginal observation probability under the model of the agent. The agent has two ways of doing this. The first one is inference: understanding the causes of the observations it is making. This is done by applying variational inference to compute an upper bound on this objective, which is called the variational free energy.
We then minimize this bound, which corresponds to finding a variational posterior over hidden states given the observations we are currently making. The other avenue, which is what got us interested in the topic, is the selection of actions in order to minimize this free energy objective. If we want to do this, we have to plan ahead: we have to take an expectation over future states we might encounter, given that we choose some action, and take this expectation of the free energy term. What we get out is fairly interesting, namely a formulation that decomposes into an expected information gain term and an extrinsic term. The expected information gain term encourages informative actions, because it becomes maximal when we take an action that causes us to learn as much as possible about the current latent state of the world. For example, if you are in a very dark room, a very informative action would be to turn the light on. It is not immediately goal-directed, but it will at least tell you where you are, what your surroundings look like, and so on. The second term is the extrinsic term. Here the agent maintains a preference distribution over observations. This is a bit of an unorthodox thing for me as a reinforcement learning person; we usually simply define a reward. But here the extrinsic preference is defined in terms of a target observation distribution. This is also exactly the point where we can inject a target behavior into the agent, by basically saying that this agent should prefer to observe some specific observation. Now, taking a step back and going to the reinforcement learning part of this talk: the problem statement we are looking at is a finite-horizon MDP, which means that there are states, actions, and rewards.
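In symbols, the quantities just described can be written as follows. This is a standard formulation of the expected free energy; the exact notation (G for the expected free energy of a policy pi, x for hidden states, o for observations, p-tilde for the preference distribution) is my reconstruction, not the speaker's slides:

```latex
% Surprise: negative log marginal likelihood of observations
\mathcal{S}(o) = -\log p(o)

% Expected free energy of a policy \pi, to be minimized, decomposed into
% expected information gain (epistemic) and extrinsic (preference) terms:
G(\pi) =
  -\underbrace{\mathbb{E}_{q(o, x \mid \pi)}
    \big[\log q(x \mid o, \pi) - \log q(x \mid \pi)\big]}_{\text{expected information gain}}
  \;-\;
  \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\log \tilde{p}(o)\big]}_{\text{extrinsic value}}
```

Minimizing G therefore simultaneously maximizes the information gained about hidden states (the dark-room example) and the match to preferred observations.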
We assume that these states fully encode the entire state of the environment, so there are no hidden states in the environment, and the objective of the agent is to maximize its expected return for one episode. The challenge here is that the dynamics and the reward distribution, these two distributions here, are completely unknown. So the only way the agent can figure those out is to probe the environment with different actions, observe the outcomes, and then learn some kind of model from this. Obviously, there are also other ways of doing this; the model does not have to be learned, but in our case it will be. Now, we can see reinforcement learning through the lens of active inference. I just wrote down this MDP again from the previous slide. The first thing we need is to define the internal model of the agent. We do this by saying that the agent maintains two models: one a model of the dynamics, and one a model of the reward distribution. Both of them are modeled as Gaussian distributions parameterized by neural networks, with theta being the neural network parameters. This introduces a latent state into the environment, namely the neural network parameters. So what was x before in active inference is now theta: the only thing the agent cannot directly observe, namely the optimal parameters that describe what is currently going on in the environment, what the agent is currently observing. The second thing we have to do is to make the agent desire high reward. We do this by making the reward part of the observation, so now it is observing the environment state and the reward, and then we set this desire distribution so that it prefers high rewards, namely by setting it to the exponential of a scaling factor beta times the reward at time step t.
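Written out (again in my own reconstructed notation), the preference distribution over the reward component of the observation is a Boltzmann-style distribution:

```latex
% Preference distribution over observed rewards, with scaling factor \beta:
\tilde{p}(r_t) \propto \exp(\beta\, r_t)

% so that \log \tilde{p}(r_t) = \beta r_t + \text{const}, and the extrinsic
% term of the expected free energy becomes the expected (scaled) return.
```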
If we do this, we end up with the following planning objective. This is the expected free energy, with everything put into its place. The planning objective of the agent is now, according to the expected free energy, to maximize this objective, which consists of, for one, the extrinsic reward, which is just the sum of rewards the agent expects to encounter by following some policy pi. Then there is also the intrinsic term again, which in this case becomes the mutual information between the neural network parameters theta and the observations the agent is making. So ideally the agent wants to make observations that are both good in terms of reward and also carry a lot of information about what the ideal model parameters are going to be. Now we can use this to forge an algorithm, and it works like this. In every episode, we start by resetting the agent to some initial state. Then in every step, we have to solve the planning objective of selecting a sequence of actions that optimize the objective function I showed on the previous slide. We execute this action, obtain a new state and reward, and store all of that in a replay buffer. After the entire episode is done, we use the data in the replay buffer to perform the inference step of learning our models and adapting them to the data we saw. Now, the challenging part is to perform this optimization. The first challenge is to compute this term even for any given action. Computing the expected reward for a given action has been done before in many works, and usually Monte Carlo methods are used, so the expectation is just estimated via Monte Carlo. But the intrinsic term, the mutual information, is known to be fairly hard to compute, and a variety of methods have been proposed to approximate mutual information.
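The episode loop just described can be sketched as follows. This is a minimal sketch under my own assumptions, not the paper's code: a toy one-dimensional environment, a random-shooting stand-in for the real expected-free-energy planner, and a comment where the model-fitting (inference) step would go. `ToyEnv` and `plan` are hypothetical names of mine.

```python
import numpy as np

class ToyEnv:
    """Toy 1D point mass; reward is higher the closer the state is to the goal."""
    def __init__(self, goal=2.0):
        self.goal, self.state = goal, 0.0
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += float(np.clip(action, -1, 1))
        return self.state, -abs(self.state - self.goal)  # next state, reward

def plan(state, rng, n_candidates=64):
    """Random-shooting stand-in for the expected-free-energy planner:
    sample candidate actions, score them, return the best."""
    candidates = rng.uniform(-1, 1, n_candidates)
    predicted = state + candidates            # one-step model prediction
    scores = -np.abs(predicted - 2.0)         # proxy for the planning objective
    return candidates[np.argmax(scores)]

rng = np.random.default_rng(0)
env, replay = ToyEnv(), []
for episode in range(3):
    state = env.reset()                       # reset agent to initial state
    for t in range(5):
        action = plan(state, rng)             # solve the planning objective
        next_state, reward = env.step(action) # execute, observe outcome
        replay.append((state, action, reward, next_state))
        state = next_state
    # here the inference step would refit the model ensemble on `replay`
```

In the real algorithm the planner optimizes expected reward plus the mutual-information bonus over a full action sequence, and the models are neural-network ensembles rather than the known toy dynamics used here.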
Many of them rely on variational inference or amortization. But the issue is that we would have to do variational inference over the model parameters theta, which is a fairly high-dimensional vector, and we would potentially have to do this thousands of times, in real time, while optimizing this function. That is simply too expensive for anything fancy like that. So what we are left with is a nested Monte Carlo approximation: nested because the mutual information is an expectation of a KL divergence, meaning we have an outer expectation and an inner expectation, and we apply Monte Carlo to both of them. The good thing about this is, first of all, that it is very fast, and it allows us to represent p(theta) by a set of particles, because we now only need samples. We can also just keep these samples and treat the entire model as an ensemble. The disadvantage is that we require quite a lot of samples of theta to get anything done, because of this outer and inner loop: for every sample we draw for the outer estimator, we need n samples of theta for the inner estimator. You can quickly calculate that if we take five samples for the outer estimator and five for the inner estimator, we already end up with 30 samples of theta, so we have a quadratic growth in the number of samples needed to compute this mutual information approximation. What is also important to note is that each of these samples is a full neural network that needs to be trained and maintained, so this is going to get very expensive very quickly. What we propose to do instead is something that is a bit illegal in terms of math, or at least it loses us a lot of formal guarantees: to use the same samples in the inner estimator as in the outer estimator, and vice versa.
That means we draw a bunch of samples once, and then, for each sample used in the outer estimator, we use all of the other samples to perform the inner estimation. As I said, we lose some formal guarantees, because what is usually assumed is that these samples are i.i.d., and that is no longer the case. But we found empirically that this actually improves sample efficiency a lot. Here you see a little comparison we did on a randomly generated discrete probability distribution, where we could compute the mutual information exactly. The plot shows the error of the sample-reusing estimator compared to the vanilla estimator that does not reuse samples, as a function of the number of samples. The sample efficiency is actually much higher, even though we are losing the formal guarantees. This means we now know how to get at least some approximation of the objective, but what is still unclear is how we actually maximize it, that is, what kind of optimization algorithm we are using. The first thing we tried was to simply use a vanilla cross-entropy method. This is a fairly common choice in robotics, because it is very robust, does not require gradients, and is also considerably fast. It works as follows: you maintain a Gaussian distribution over your current action trajectory, initialized to mean 0 and variance 1. Then you sample a bunch of action trajectories from this distribution, evaluate the reward for each of them, and collect the n samples that have the highest reward. To those, you fit the parameters of the Gaussian distribution over actions, in order to gradually shift the distribution towards areas of the objective that have a high reward.
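The sample-reusing (leave-one-out) nested Monte Carlo estimator can be sketched like this. The function names and the toy Gaussian model below are mine, not the paper's; the sketch assumes a model with a tractable likelihood so the reuse trick is easy to see.

```python
import numpy as np

def mi_nested_mc_reuse(thetas, sample_obs, log_lik, n_obs=16, seed=0):
    """Leave-one-out nested Monte Carlo estimate of I(theta; o).

    thetas:     parameter samples drawn once from p(theta) (the "particles")
    sample_obs: sample_obs(theta, rng) -> one observation o ~ p(o | theta)
    log_lik:    log_lik(o, theta) -> log p(o | theta)
    """
    rng = np.random.default_rng(seed)
    n = len(thetas)
    total, count = 0.0, 0
    for i, th in enumerate(thetas):          # outer Monte Carlo loop
        for _ in range(n_obs):
            o = sample_obs(th, rng)
            # Inner estimator of log p(o): reuse the *other* outer samples,
            # violating the usual i.i.d. assumption but avoiding n^2 models.
            others = [log_lik(o, thetas[j]) for j in range(n) if j != i]
            log_marginal = np.log(np.mean(np.exp(others)))
            total += log_lik(o, th) - log_marginal
            count += 1
    return total / count

# Toy model: o ~ N(theta, 1); widely spread thetas carry high information.
log_lik = lambda o, th: -0.5 * (o - th) ** 2 - 0.5 * np.log(2 * np.pi)
sample_obs = lambda th, rng: rng.normal(th, 1.0)
```

With identical particles the estimate is exactly zero (observations carry no information about theta), while well-separated particles yield a large positive estimate, matching the intuition that informative observations discriminate between candidate models.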
If you iterate this for a while, you usually end up with a fairly good plan. Now, the issue is that applying this delivers fairly poor results, and the reason, we quickly found, is something that has been called detachment in prior work; I don't think it is an established term yet. The problem works like this. Consider a completely reward-free environment, where the only thing there is is intrinsic reward, which leads you to explore the environment. You start in the middle of this maze and can go either left or right; in green is everything that carries intrinsic reward, so everything that is not yet explored has a lot of intrinsic reward left. Your agent decides, maybe, to go for the left side first. It explores it for a while, but at some point decides that the more immediate intrinsic reward is on the right side, because that part is still unexplored, so it is easier to get intrinsic reward there, and it switches over to exploring the right side. Remember, this is episodic, so after a couple of steps the agent always gets reset to the middle and has to start over. At some point, the right side is completely explored, and now we are in a tricky situation: we are again starting in the middle, but the entire right side is explored, and the left side is explored just far enough that in the immediate vicinity of the start state there is no intrinsic reward left. Because many planners that work well on high-dimensional problems rely on very local optimization, we found that these planners are unable to find a trajectory that leads all the way into the zone with high intrinsic reward on the left.
The reason is simply that in the cross-entropy method, in order to reach this zone, you would have to randomly sample a complete trajectory that goes all the way around the maze and ends up somewhere close to the border of the remaining intrinsic reward, and this usually does not happen. The solution we found starts from the realization that the only reason there is no intrinsic reward left in this area is that we already explored it. So at some point in the past, we must have taken a trajectory that led us all the way around towards this border and stopped somewhere there. What we can do, then, is simply remember all the past trajectories we took and use them as initializations for the cross-entropy method to start the planning from. The way this looks is that in the end, instead of simply returning the plan, we store the entire plan together with the current state in a memory buffer. The next time we start planning, we not only sample from the initial distribution, we also sample from this memory buffer, taking into account all the previous plans, and start optimizing those as well. We found that this usually leads to behavior that escapes this area of low intrinsic reward. That brings me to the experiments we did. This is a task we designed to be specifically hard to explore. The task is, as I said before, to move this ball into the target zone. The tricky thing is that, first of all, the table is tilted: if the robot loses the ball, it ends up somewhere at the bottom and cannot be recovered anymore, so the robot has to wait for the end of the episode to continue exploring. Also, the reward is completely sparse, meaning the reward is zero everywhere except when the ball is in the target zone.
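The planner just described can be sketched as follows: a minimal cross-entropy method with optional warm starts from a memory buffer of past plans. The function and argument names are my own, not the paper's.

```python
import numpy as np

def cem_plan(score, horizon, iters=30, pop=64, n_elites=8, memory=(), seed=0):
    """Cross-entropy method planner with memory-buffer warm starts.

    score:  callable mapping an action trajectory (horizon,) to a scalar
    memory: previously executed plans, mixed into every candidate pool so
            the planner can escape locally reward-free regions (detachment)
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)  # N(0, 1) initialization
    for _ in range(iters):
        candidates = rng.normal(mu, sigma, size=(pop, horizon))
        if len(memory):
            candidates = np.vstack([candidates, np.asarray(memory)])
        scores = np.array([score(c) for c in candidates])
        elites = candidates[np.argsort(scores)[-n_elites:]]
        # refit the Gaussian to the elites, shifting it toward high reward
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu
```

For example, maximizing `lambda a: -np.sum((a - 1.0) ** 2)` over a horizon of 3 drives the returned plan towards the all-ones trajectory; seeding `memory` with an old near-optimal plan lets an elite appear in the pool even when random sampling would essentially never find it, which is the role of the memory buffer in the talk.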
This means that the robot has to explore the entire environment purely based on intrinsic reward, without any extrinsic signal, until it discovers this reward for the first time. If we apply, as I told you before, the classical technique in reinforcement learning of just applying some random noise, we can see that the ball gets dropped immediately every time, and even after 5,000 episodes we are not able to reach even one third of the table. Here on the right is a histogram of positions the ball has visited so far, and we are completely unable to leave the start of the table. However, if we use our method, we can see that we actually explore this environment in a very systematic manner and achieve a very good state coverage, or at least ball-position coverage. Our agent is quickly able to find the reward and find a strategy that consistently pushes the ball into the right location. This is also reflected in the learning curve. Here is a comparison with a bunch of baseline methods we ran: one is PETS, which is a model-based reinforcement learning algorithm, and there are also SAC and MBPO, and none of these methods managed to find the reward in the given time. So we ramped it up a little bit and made the task a bit harder. What you see here are holes in the table, so now this becomes a maze. There is still no extrinsic reward in the lower area, and also up until here, so again, based purely on intrinsic reward, the agent has to learn to maneuver the ball. It is a fairly tricky task to maneuver the ball around the corners of this maze into the target location, and it obviously takes a bit longer to train here, because this task is much, much harder. It is also important to note that once the ball is inside one of these holes, it is lost and cannot be recovered.
So although this is much harder and takes a bit longer, you can see that our method is able to solve this problem and push the ball into the target location in the end. This is a slightly slowed-down version of the policy, and you can see that it is actually fairly tricky to maneuver around these corners. There is a learning curve here as well; unsurprisingly, the baselines did not manage to solve this task, but that is expected, since it is harder than the previous one. Finally, we also evaluated our algorithm on the real system. We actually built this in reality and trained it from scratch, so there is no transfer going on, no pre-training in simulation or anything. You can see that the behavior is similar: the agent starts exploring the environment more and more, pushing the ball around systematically, ultimately discovering the reward at the top of the table and then finding a consistent strategy of moving it up. There is a learning curve for this as well. This brings me to my conclusion. We presented an algorithm based on active inference that is able to solve very challenging sparse-reward manipulation tasks. At its core lies an augmented cost function derived from the expected free energy, which encourages the agent to perform very directed exploration while also maximizing reward, and we demonstrated that this method works not only in simulation but also on a real system. To give you a bit of an outlook, I think it is very important to note that, if we compare to human haptic exploration, we humans rely very heavily on tactile sensing for all of this exploration. We have developed a variety of strategies for active perception using tactile sensing, for example rubbing an object to figure out its texture, or, as you can see in this video, grabbing it with our full hands to figure out its shape; we can also measure temperature.
There are all kinds of things we can actively perceive from objects using tactile sensing. Tactile sensing is also being actively researched in the robotics community, and a variety of new tactile sensors have come out lately. One of them is the so-called DIGIT sensor, which you can see here. It works by means of a gel that deforms upon contact with something; there is a camera in the back and some LEDs, and if there is contact, you can see the deformation of the gel in the camera, which gives you a kind of local tactile feedback. I think it would be very interesting to see what we can achieve with active inference using tactile sensors, although there is a variety of challenges that need to be solved: this is a high-dimensional image, the contact dynamics are much more complicated, and there is a lot to consider here. But I think this would be a very interesting avenue for future research. And that brings me to the end of my talk. I want to thank you for your attention, and I want to thank my collaborators as well: Boris, Georgia, Hany, and Jan Peters. If you want, you can check out our paper under this QR code. There is also a project page, which will soon contain an extended version of this paper that has been accepted to IROS; I haven't uploaded it yet, but I will do so within the next one or two weeks. So if you are interested in more details, I recommend you check that out, but again, it's not up yet. And there are my contact details in case you have any more questions. So, if I got that correctly, we do the questions afterwards in the roundtable, is that correct?

Correct. Thank you very much, Tim. You may exit and re-enter the stream when we go to the roundtable. Okay, thank you very much. And there will be just a few seconds of a break.
And then we will be going to the next talk, which is Tim Verbelen, Robotics: Modeling the World from Pixels Using Deep Active Inference. So we'll be right back.

All right, here's Tim Verbelen's talk, Robotics: Modeling the World from Pixels Using Deep Active Inference.

Thank you for the introduction to the symposium. Unfortunately, I could not make it live, but I recorded this talk for you, and I hope you'll find it interesting. I'm going to highlight a bit of our research, sketching out the picture of what we work on. In short, what we want to do is have robots that can model the world they live in from pixels, but basically we can extend this to any sensing modality, using deep active inference. So why do we work with robots? Well, we figured out that if we want to build something intelligent that is actually doing something relevant in the real world, then you need action: you need to interact with the world, you need some kind of embodiment. That's why we work with all kinds of robots, both manipulators that can grasp and reorder objects, and navigating robots that can drive around. So you might be wondering: why are you looking at both navigation and manipulation, whereas all over the world there are labs that are dedicated to manipulation as a single domain and navigation as another domain, as two distinct problems? Well, the thing is, if you look at it from an active inference perspective, then it's basically the same thing; you're doing the same thing. And that's why we're trying to combine all kinds of robotic problems within this single scope, to see to what extent we can address and solve them, and to compare where we are relative to the state of the art in robotics.
That's our overall objective. So in this talk, I'll talk a bit about the general approach we're taking, and then some current results in both navigation and manipulation, and some of the work in progress that we're excited about. All right, so let's start off with what active inference is. I guess most of you will know already, but just to set the scene for at least the next 14 minutes, so we share the same basic information and what we mean by the terminology. Basically, active inference to us means that we have an agent, or brain, that builds a joint model of its environment: a joint probability over outcomes o, actions a, and some hidden states. So your agent is distinct from its environment; it can interact with the environment by doing actions; it gets some observations; and then it builds the model with hidden states: how are these observations generated, and can I model this through these hidden states? And we use the single principle of optimizing this model by minimizing the so-called free energy, which is an upper bound on the surprise, or our prediction error. Importantly, you not only use this to build a model of the world, you also use it to select the actions that you hope will minimize your expected free energy in the future. Looking at everything from this angle of minimizing your prediction error, not only for the past but also for the future, basically gives you a very powerful mechanism to start doing robotics. So here are some equations that relate to this free energy concept. Basically, minimizing the free energy entails this kind of complexity minimization: you want to have the simplest model that explains your world, but on the other hand, you also want to have an accurate description of how the world works, so you want to predict your outcomes from your state.
This basically means that all the information that's out there in the world, you can actually have in your state representation. But at the same time, you want to keep your explanation as simple as possible, according to your prior. This Q is basically the variational posterior, which is saying: I want to be able to invert the model. If I have some observations, I want to be able to infer what is the most likely state that could explain whatever happened until now. That's kind of the game. If we look at the future, then of course we don't know the outcomes that will come, so now the expectation also considers what might happen in the future, and what outcomes I might get. Then your expected free energy again becomes two terms. One is the instrumental value, which just says: how do I think the outcomes that I witness will actually realize my preferences? In a reinforcement learning context, you might see this as a kind of reward signal, but rather than having it come from the environment, it's more like something intrinsic that defines what you are as an organism. And then, importantly, you have the second term, which is the information gain: how do I think my beliefs will shift, from what I believe now will happen, versus what I think my state belief will be if I actually get these outcomes? You're basically searching for the observations that will give you more information about what you're doing. So automatically you get, on the one hand, goal-directed behavior, but on the other hand, your agent is also driven to find the relevant observations, to find out information about its environment. Of course, we have this deep active inference flavor, where we basically want to learn this model just by feeding it data.
Basically, we use the neural nets that we know from so-called artificial intelligence nowadays. A neural net is basically a function approximator: you give it some data and you can approximate almost any function with it. We use these to approximate the densities from the previous slide. So how does it go? We have our observations and our actions, and we basically pass them through a neural net, which then represents this approximate posterior. This outputs the means and standard deviations of Gaussian distributions, and these are then your posterior distribution, so it looks like a bunch of multivariate Gaussians. This is then your state representation: what's happening right now, given the observations I have had until now. Then we have an additional neural net that predicts the dynamics in the latent state space: if I do these actions, how will my state evolve in the future? This again outputs parameters of Gaussian distributions, for example, and this allows you to plan in the latent state space: what will happen if I do this or that? And in order to train the model, from each of the states we can have a decoder model that tries to predict the observations that you expect to see from that state. The training mechanism is then minimizing the free energy, which entails, on the one hand, accuracy maximization, so you minimize your reconstruction error, but at the same time you minimize the complexity, which basically falls out through the KL divergence in this setup. This term says: what I expect to happen without observations, I want to be as close as possible to what I expect has happened, given that I saw the observations. By minimizing this KL divergence, the model is basically forced to have a latent state encoding that allows me to imagine, in the state space, what might happen.
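As a concrete sketch of the per-step training loss described above: a reconstruction (accuracy) term plus a KL (complexity) term between the posterior and the transition prior. This is my own minimal numpy version, assuming diagonal Gaussians everywhere, not the actual network code.

```python
import numpy as np

def gaussian_kl(mu_q, sig_q, mu_p, sig_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return float(np.sum(np.log(sig_p / sig_q)
                        + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5))

def free_energy(obs, recon, mu_post, sig_post, mu_prior, sig_prior, obs_sig=1.0):
    """Variational free energy for one step.

    obs / recon:          observation and the decoder's reconstruction of it
    mu_post, sig_post:    posterior over the latent state (given the observation)
    mu_prior, sig_prior:  transition prior (prediction without the observation)
    """
    # accuracy term: Gaussian negative log-likelihood of the reconstruction
    nll = float(np.sum(0.5 * ((obs - recon) / obs_sig) ** 2
                       + np.log(obs_sig) + 0.5 * np.log(2 * np.pi)))
    # complexity term: keep the posterior close to the transition prior
    return nll + gaussian_kl(mu_post, sig_post, mu_prior, sig_prior)
```

Training the encoder, transition, and decoder networks by gradient descent on this quantity is what forces the latent code to both reconstruct observations and stay predictable from the previous state and action.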
And at the same time, you only extract the information coming from your outcomes that is necessary, and no more. So you don't necessarily want to model all the details in your pixels, for example, as long as you have enough information to predict sufficiently well what you want to do. So that's roughly the idea of the approach. Now we'll go through some instances in robotics use cases, and we'll start with navigation. In this case, we have robots — off-the-shelf ones, but with some additional sensors mounted on top, so we can get all kinds of inputs. We just drive them around this lab environment, where we have racks that simulate a kind of warehouse setup. And then we train these models that just have to predict, for example from camera images, what will I see if I do certain actions. After training such a neural net, this is the kind of thing we get. These are imagined views from a viewpoint inside this lab environment. And then you can ask: what do you think will happen if I do certain actions? This is the kind of thing it starts to imagine. And these are different samples from that distribution. You'll see that at the beginning they all imagine roughly the same thing, but the further you go into the future, the more they disperse — they're modeling all kinds of scenarios of what might happen. And you can also see that even though the model is trained on long sequences, it doesn't really capture long-term dependencies or very accurate location information. Here it thought that if I turn around, I'd be facing the wall, whereas here it thinks that if I turn around, I'll just see an open part of the area. So here it thinks I'm in the middle of an area; here, that I'm at the edges, let's say. The model has internalized that these are all scenarios that can happen.
But because we condition these models on the action, we can also generate action-conditioned rollouts. So we can ask: what will happen if I turn left, or turn right, or go forward — it actually encodes these dynamics inside the model. And this is trained just from data, from driving around and learning to predict what can happen. One of the things we found is that after training such a model, at runtime we can always compare your prediction — what you think will happen — to your posterior beliefs — what actually happened, given your observation. We then compute the KL divergence between the two, which gives a notion of the Bayesian surprise of the model. And if we then put down a new object, like this table here, and the robot drives over to it, this is a kind of dynamics the system has never seen before, which results in a huge spike — the robot is like, yeah, what's going on here? And to show that this is not just measuring a difference in pixels, we also have a scenario where people walk by the robot, which also happened while recording the training data. So this is a normal scenario, and the robot is like, oh yeah, this is something that can happen — I've seen something like this, nothing to be surprised about. Here you can see that it's learning more than just pixel dynamics; it has a sense of "this is a normal state" versus "this is a weird kind of dynamics". And one of the nice things about using these learned models is that you can actually plug in any sensor. We focus a lot on pixels because those are nice to visualize and easy to relate to. But we can also give it, for example, a lidar — you can see the lidar scan right here, with its range measurements. And here we have a radar, which is a bit more intricate to interpret: this is the range axis.
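The anomaly-detection idea — compare the model's prior prediction against the posterior after seeing the observation, and flag a spike in the divergence — can be sketched with 1-D Gaussian beliefs. The thresholding rule here (mean plus k standard deviations of recent surprise values) is my own illustrative choice, not necessarily the one used in the work described:

```python
import math

def kl_1d(mu_q, sig_q, mu_p, sig_p):
    """KL(q || p) between two 1-D Gaussian beliefs: the step-wise
    'Bayesian surprise' between posterior (q) and prior prediction (p)."""
    return (math.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def is_novel(kl_value, history, k=3.0):
    """Flag an observation as novel when its surprise exceeds
    mean + k*std of recently observed surprise values."""
    mean = sum(history) / len(history)
    std = (sum((x - mean)**2 for x in history) / len(history)) ** 0.5
    return kl_value > mean + k * std
```

Walking pedestrians that appeared in the training data would produce surprise values inside the normal band, while a never-seen obstacle would push the KL far above it — mirroring the behavior described in the talk.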
And the other axis here is the Doppler axis, which just tells you whether something is coming towards me or moving away from me. Also there, we can have the model's imaginations of what's going to happen in this state space. You already saw the pixels, but it also relates: okay, this is what the lidar scans will look like, this is what the radar will look like. If it's looking at the racks, it sees many more reflections in the radar image; if it looks down an open aisle, it has only a few reflections. So it actually captures the dynamics of all of these sensor modalities in this latent space. That's one of the strengths of this approach. But as I told you, it has a very narrow temporal depth: it can predict a few seconds ahead, maybe one second, maybe a few, but then the predictions become too dispersed. So you cannot really use it for long-term planning. So we asked ourselves: what do we need in order to use this system for something more relevant in the long term? And we figured that we need to go a level higher and build a hierarchical model that, instead of predicting the next sensory observation, predicts which kind of state we expect to be in a bit further into the future. So we built this kind of model, where the blue part is what we saw just now, and the red part is the new layer, where we combine some ideas from engineering, from SLAM: what is the place you expect to be at in a minute or a few seconds from now? It's like: okay, I'm at a certain location right now, and in a few seconds I'll be at a different location — and this different location then predicts the kinds of observations and poses that I expect there. If you implement such a model, it basically boils down to having, on the one hand, this abstraction of the sensory inputs, which is what you saw before.
And then, together with that, a representation of the pose — how you have turned around — that you keep track of, so you know roughly where you're heading; you have some sense of path integration. These two combined then build a kind of map, which becomes your model of the world as you traverse these poses. And then we get something like this. Here we have the setup: the camera, then a latent representation of the sensory input, and here the odometry, which is very noisy — you can see it drift all over the place. But by integrating both — this rough sense of where you are together with what you think the place looks like — and by deciding that if this pose and view are pretty much the same as something I visited before, then it's probably the same location, you merge them. And then you see that it actually recovers the different aisles in the lab; it actually makes sense of the environment. You can then use this model to give you an imagination of what a place looks like, and use that to plan on longer and longer time scales. Then we move to the more manipulation-oriented use cases, where we have a robot arm. We mounted a camera on the wrist, so it's actually getting information about what the gripper is looking at. You might think this is a very different use case, but the way we approach it, it's again just an agent that can move around, and its main objective is to predict: what will I see if I move around — and then use that to figure out how this world works. Of course, it now has many more degrees of freedom, but other than that, the concept stays the same. So here you basically have a similar thing, where it learns to predict other viewpoints from the information it has gathered. First we give it a very limited view, looking straight at the table, to learn something.
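The loop-closure step described above — merging two map nodes when both the integrated pose and the latent view agree — can be captured in a few lines. The thresholds, the Euclidean distances, and the node representation are all placeholders of my own, not the system's actual criteria:

```python
import math

def same_place(pose_a, pose_b, view_a, view_b, d_max=0.5, v_max=0.2):
    """Merge two map nodes only if the poses are close (guards against
    perceptual aliasing: distinct places that look alike) AND the latent
    view codes are similar (guards against noisy odometry drift)."""
    pose_dist = math.dist(pose_a, pose_b)
    view_dist = math.dist(view_a, view_b)
    return pose_dist < d_max and view_dist < v_max
```

Requiring both conditions is the point: odometry alone drifts, and appearance alone confuses look-alike aisles, but their intersection is much more reliable.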
And then it generates reconstructions for all kinds of poses in the workspace. It just figures this out — it was trained on simple scenes with some blocks in front of it, in simulation. Then, as we start adding observations to the system, you can see how, for all the different poses, it starts imagining: ah, there was a yellow cylinder there, so from these viewpoints I probably need to imagine a cylinder. In this way it starts building up and reconstructing the world from all kinds of viewpoints. It's always predicting future observations from the information it has received so far, and the model improves as information comes in. But then you can of course also use the model to assess which would be a good viewpoint — one I haven't seen before, but that gives me new information. Basically, we use this expected free energy term to drive the action. And what we found is that with just this mechanism — in this case we also give it a preferred observation, like viewing this blue cube close up — the arm first goes all the way up, because that's what it figures out gives it the most information, and then it starts zooming in and searching for the preferred observation. It was a really cool effect to see emerge from just this principle: it first goes up to get an overview, and then, once it has explored the workspace, it can move to the preferred viewpoint. Here again are imagined states of the robot arm wandering around, like we had in the navigation case, but now with the robot arm's action space. So it's the same principle, and we get similar results out of it. But then of course we were thinking: okay, we need to throw a lot of training data at it — many different scenes and different objects — before it can start imagining these kinds of scenes.
But if we look at how we ourselves learn how the world around us works, we're probably looking at objects, manipulating them, looking at a single object at a time. And maybe that's the better way to learn, in a manipulation scenario, how the world works. So that's what we did. Instead of training on trajectories of random scenes, we just made the robot look at a particular object and predict other viewpoints of that same object — always looking at roughly the same location and predicting: what will this thing look like from this side? We then get a model where you can imagine, for this particular object, what its view dynamics are. Then we take a similar model and train it on a different object. So now, instead of one huge deep learning model that has to learn about every object around, we can compress this into a much smaller neural net, and we instantiate a new instance for every new object we encounter. We can give it mugs, or a can, or a banana, and it can start imagining what all these objects look like from other views. And again we can give it a preferred view, and ask which trajectory brings you to that preferred view. That's what you can see here: the top view is the target — you just give it a random viewpoint, and it figures out how to move in this space, purely by imagining what these objects look like from different views. You can also use this to classify objects: if we present a random object, one it has either seen before or never saw, it can run all the models and check which model's predictions actually match what I'm seeing right now. That then gives you an idea: it's probably that object.
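The per-object recognition scheme — run every object model, score how well each one's predicted view matches the current observation, and fall back to "unknown" when nothing fits — might look like the sketch below. The mean-squared-error metric and the threshold are illustrative stand-ins for whatever model-evidence score the real system uses:

```python
import numpy as np

def classify(observation, model_predictions, unknown_threshold=1.0):
    """Pick the object model whose predicted view best matches the
    observation; declare 'unknown' if even the best model fits poorly."""
    errors = {name: float(np.mean((observation - pred)**2))
              for name, pred in model_predictions.items()}
    best = min(errors, key=errors.get)
    return ("unknown" if errors[best] > unknown_threshold else best), errors
```

Because every model is scored, an ambiguous view shows up as two models with similar errors — which is exactly the situation where taking one more, more informative view pays off.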
And you can see here that from an initial view it's already pretty good at knowing what the object is. But in some cases you have a very ambiguous viewpoint, where you cannot really distinguish two objects. Then you can just take another view, and that will resolve the ambiguity. That's exactly what we see here: the active agent will look for the most informative view, and for the known objects it then reaches 100% accuracy. For objects it never saw before, it says: I don't know this object. And for that particular case, it's not 100% accurate — why is that? Because some objects can be very similar. For example, it knows about spoons, and then you give it a fork, and it's like, yeah, this kind of looks like a spoon — I'm not sure. Those are the errors it makes, but they're sensible errors, because the view dynamics of the object really do closely match an object it knew before. And if you query it the other way around — instead of the next view it would like to see, which is typically one with low expected free energy, you ask which view it would not want to see — then you get the imaginations in the rightmost column. For example: if I just look at the top of a can, I see a circle — it could be any can. Or this particular view is very dark, so I cannot distinguish it from anything. And the mystery bottle in the bottom right is like, yeah, from this side it kind of looks like a banana — so this is not the viewpoint I'll choose next. In practice, though, we saw that because these objects are quite distinct, a random agent is also good at just integrating information, because the chance is low that you actually end up stuck in an ambiguous viewpoint. So in this scenario it does equally well just randomly looking around. But the more ambiguous your objects become, the more benefit you get from actively sampling particular viewpoints.
Yeah, for sure — exactly. So similar to using radar and camera together for navigation, you could use tactile and visual information together for inferring what an object is. We haven't done a particular experiment on tactile yet, because it's also a bit more difficult to simulate, but it's on our minds to do these kinds of things — it's a very interesting route to take. Finally, we can also look at a scene with different objects and say: okay, I want to open the can, so I need the top view of the can. Normally, if you gave this to a reinforcement learning agent, it would just say: I cannot see this kind of circular gray thing from the goal image anywhere, so the only thing it can do is randomly search around until it sees something like it, and then go to the target. But in our case, the system infers that this target view is probably more related to the can than to the sugar box, so it will direct its attention to the can. And then also: this is more of a top view and this is more of a side view, so it can imagine the movement it should make to get to the top view. So that's exactly what our system does: we just give it the target view, it directs its attention to the correct object, and it finds the movement it should make to get to a similar view. That's the real power of the system. And you can see this as adding another layer on top, similar to the navigation case, where you had an abstraction of locations. Now you have an abstraction of objects arranged somewhere in space, and I infer where they are and how I should move in this space to reach them.
Then, to add something that we're actually actively working on right now: we now have this system where we can build a model that relates action to what is happening. But one of the difficult things remains that the further ahead in time you need to plan, the more your space of potential trajectories explodes. You cannot plan using all these fine-grained actions. So can you figure out sensible actions, or skills, that you can use to explore? One way to avoid this explosion of options is to amortize a policy. So we again train a function — now more of a habit policy — that gives you the action, but instead of using a reward signal, as done in reinforcement learning, we give it an objective related to the expected free energy. In particular, here we use what's called latent Bayesian surprise, which was a term coined to connect more with the RL community, but in effect it's just the information gain on your expected states. And we found that the agent actually solved these video games purely from this intrinsic motivation. If we compare it to other terms that people add in RL agents to get this implicit drive, we found that our method was either on par with or slightly better than the others — and especially in cases where we add some noise to the observations. If the observations are noisy or ambiguous, our method still outperformed the others, because they were drawn to the noisy observations — "hey, the noise is very interesting" — even though those don't give you extra information. So that was something interesting we found. And now we're expanding on that: this was more of an exploration policy.
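The latent Bayesian surprise objective described here is, in essence, the KL divergence between the state belief after an observation and the state belief predicted before it. A sketch with diagonal Gaussian beliefs shows the key property the speaker mentions: a noisy observation that does not move the *state* belief earns no intrinsic reward. (The Gaussian parameterization is an assumption for illustration.)

```python
import numpy as np

def latent_bayesian_surprise(mu_post, sig_post, mu_prior, sig_prior):
    """Intrinsic reward = KL(posterior || prior) over latent states.
    High when an observation genuinely updates the state belief;
    zero when the belief is unchanged (e.g. pure observation noise)."""
    return float(np.sum(np.log(sig_prior / sig_post)
                        + (sig_post**2 + (mu_post - mu_prior)**2)
                          / (2 * sig_prior**2) - 0.5))
```

Because the reward lives in the latent state space rather than in pixel space, "noisy-TV"-style distractors that perturb observations but not state beliefs score near zero, which is consistent with the robustness-to-noise result reported in the talk.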
But now we're thinking more like: okay, if we want to plan, maybe we just want a few distinct options that make sense to explore — can we find some elementary skills that are worthwhile? In this case we again have a robot arm, and we again explore using this intrinsic drive for information gain. But now we give it, say, 32 potential slots to learn policies, which just have to find out which trajectories, taken together, allow the agent to explore the state space. And then we end up with a manageable set of skills that we can fine-tune for particular tasks. That's actually one of the things we're working on right now. So with that, thank you all for listening — I hope you enjoyed it. Should you have any questions, you can always reach out via email or via Twitter. Hope to see you soon on a next occasion. Thanks. Alright, that was Tim Verbelen's presentation, robots modeling the world from pixels using deep active inference. The next presentation is going to be by Ben White: artificial empathy, active inference and collective intelligence. Hello, everybody. My name is Ben White. I'm a first-year PhD philosophy student at the University of Sussex. And I want to start by saying a really big thank you to the organizers for inviting me here today to share this research with you. It's a real pleasure to get to do this, even if it's just in a pre-recorded format. I want to say right at the outset that this is very much a collaborative and ongoing project — work that I've been doing with Mark Miller and Daphne Demekas — and it's very closely related to the kinds of things that I do here at Sussex. So I'm interested in the relationship between human wellbeing and material environments. I'm interested in things like ambient smart technology, social media, augmented and virtual reality, and affective computing, which is what I'm going to be talking about today.
Because this is an active inference symposium, I'm going to afford myself the luxury of not going over the basics of the framework. Instead I'm going to jump straight in and tell a fairly broad-strokes story about how we think active inference might be able to shake things up in affective computing. So affective computing is a research program which, as many of you will know, aims to build computing devices capable of interacting with human users on an emotional level, by identifying, categorizing and responding appropriately to emotions in human users. Just to give some examples: the paper in the top right is from the MIT Media Lab's affective computing group, and this project aimed to put affective interfaces into cars, because any of us who drive know that there are certain emotional states we can sometimes find ourselves in that probably hinder our decision-making, and that can have some pretty negative outcomes. So this project was really geared towards increasing road safety by intervening in the emotional states of drivers. The two devices at the bottom — the screen and camera in the bottom center, and the rectangular-headed yellow guy — are Jibo and Woebot respectively. These are therapeutic interventions that use facial recognition and emotion recognition technology to learn about their users, and then make suggestions for certain tasks or games, or even clinical interventions, geared towards supporting the emotional wellbeing of the user. But as you can imagine, these are not the dominant deployments of affective computing. We mostly find affective computing now in industries like recruitment and marketing. So for example, companies like Unilever, and many, many others, use emotion recognition devices in their hiring process in order to analyze certain nonverbal responses and facial expressions which, they say, are indicative of certain desirable or undesirable character traits relevant to the job.
However, you won't be surprised to hear that this has come in for fairly heavy criticism. There are certain kinds of worries about this technology. The first worry is that it is simply not functional — that it's based on bad science and doesn't do what it's supposed to do. And closely related to that are various ethical concerns, mainly that these systems are in danger of propagating certain kinds of biases and prejudices. So Lisa Feldman Barrett, for example, who is a major leading light in affective neuroscience, has come out fairly heavily against this kind of technology. She's labeled it neophrenology and said that there's simply no way it can do what the people who make it say it can do. And this is because, she says, it's based on a very outdated theory of emotion, which states that humans have six to eight basic emotions — things like anger, fear, disgust, surprise and so on — and that these emotions are expressed through sets of facial expressions which are universal across different cultures and different contexts. Anybody familiar with Lisa Feldman Barrett's work will know exactly what she thinks about that theory of emotion: she's not a fan of it at all. And the ethical concerns have been raised by AI ethicists like Abeba Birhane, who has argued that, basically, the fact that these systems are trained on very, very large datasets means they are inherently conservative, and this is why they propagate certain kinds of biases. The two pictures of a hand holding a device that you can see on the right-hand side are from Google's Cloud Vision. In the top picture, held by a dark-skinned individual, the device is identified as a weapon; in the bottom one, held by a light-skinned individual, it was identified as some kind of electronic device. And there are other studies as well showing that emotion recognition technology just works completely differently on non-white individuals.
And so with technology like this making such consequential decisions about people's lives, it's really, really important that we start to get this right, and avoid the kinds of outcomes the current program produces. Okay, so we need the strongest, most up-to-date theoretical underpinnings we can get. And we think the place to start in updating the science is to recognize that these devices are very problematically disembodied, very superficial, and inactive, in the sense that they don't perform any actions. This, of course, is a million miles away from the way human beings interact socially and emotionally, and we've known this for a very long time. We've known, for example from Luiz Pessoa's work, that emotions are not simply partitioned off from the way human beings think or act — actually, cognition and affect are very, very closely intertwined. There are no discrete, separate brain areas that only do emotion and only do cognition. Furthermore, research paradigms in cognitive science — embodied cognition, for example — tell us that we need to go even further than that and recognize that embodied action is an integral, constituent part of how we think and feel. And the other E's that make up 4E cognitive science — enactivism, extended cognition and embedded cognition — have come together with niche construction theory in really interesting ways to tell us that we also need to consider how elements of our external environments can scaffold the way we think and feel. So this is where active inference comes in, because we think that if we're serious about designing these kinds of devices to take into account these most up-to-date theoretical developments, then active inference basically brings all of that in with it.
So active inference is a theory that really elegantly intertwines action, perception, cognition and affect under the unified imperative of minimizing an agent's surprise. That's the first thing it gives us off the bat: a unified computational and conceptual framework that can be shared by researchers in different fields. But mainly, as the name suggests, active inference really puts action front and center. It's not some afterthought or bolt-on gimmick. And I think it's worth reflecting for a second on just how central actions are to the way we interact socially and emotionally. Human beings are very far from being passive classification devices. We're constantly sampling and probing the world in order to get more information, so that we can update our models of the world. So imagine the following scenario, which happens to me fairly often. Imagine you're on public transport and somebody's giving you a weird look — maybe they're scowling at you in some way. We tend not to sit there and just study their facial expression to try and work out what's going on; we have other avenues open to us. We might look behind us to see if they're actually looking at someone else. We'll probably scan the broader scene for some context, to see if there's something going on that can tell us more about that scowl. Or, depending on our mood, we might even scowl back at them, or flash them a smile, and see how they respond. And another scenario that brings this intuition out very strongly is the peculiar kind of tension that comes with a job interview. I think that tension is the result of a confluence of two things: firstly, very high uncertainty — uncertainty about something that really matters to us — and secondly, the fact that our usual embodied epistemic resources have been straitjacketed by social convention.
Because in a job interview, even though we want to know a lot about what the other people are thinking and feeling, social convention dictates that we can't ask them, we can't prod them, and we can't really sample the scene in ways that would give us more information. We're kind of stuck to the chair; we just have to wait and see. And that's an unusual situation to be in, because so much of human social and emotional interaction relies on active states, to use active inference terms. Speaking, listening, prodding, smiling, scowling, raising an eyebrow — all of these are embodied actions that we take to learn more about a social setting. And I want to emphasize how important context is as well: the ability to actively survey a scene, to drink in context — but also the way we learn about the relevance of that context, which is built up and scaffolded through action and different patterns of practice. So think about the scowler on the bus again: there's a high degree of uncertainty around that facial expression. But if you imagine the same facial expression transported onto the face of somebody on the other side of a boxing ring, all of a sudden the uncertainty around that facial expression is minimized, because the context of the boxing ring tells you everything you need to know about why that person is scowling at you. And this emphasis on context is something that's badly missing in current iterations of affective computing. The active inference community is already producing really cool work premised on these kinds of insights. There's the paper Thinking Through Other Minds by Samuel Veissière and colleagues, which highlights just how it is that we come to understand our socio-cultural niches through precisely this kind of active social foraging. So it's really emphasizing the importance of context, and the fact that we come to learn about context through action.
An example of where we see a gap between artificial systems and humans when dealing with context is how they compare when performing selective attention on some task. It's a really pervasive problem that artificial systems don't tend to look at the same places human beings do when humans are surveying a scene for task-relevant information. So the question is: how do we get artificial systems to drink in context the way humans do, and how will that improve the performance of emotion recognition devices? Selective attention is all about filling in epistemic gaps — filling in gaps in your knowledge with information from a scene that may or may not be task-relevant. One of the reasons humans are so good at this is that we have this huge knowledge base of what different contexts mean. We live in the world; we have always inhabited socio-cultural niches, so we have a lot of experience. But, as I said before, it's important to emphasize that the way we learn about context is by sampling different contexts — by the fact that we have, our entire lives, been actors in the world, not passive observers. And I think this work by Mirza and colleagues, which you can see on the slide here, is really interesting, because it shows that active inference is capable of modeling selective attention in ways that give us much more human-like results. They used internal precision dynamics, and demonstrated that these precision dynamics can accurately map, covertly, task-relevant and task-irrelevant features of a scene, and then update precision estimates in relation to that information — which then drives overt actions, which in turn serve to update the system's model. So it's this very close relationship between covert attention and overt attention which I think is really interesting on this account.
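The precision-dynamics account of covert attention can be caricatured as a precision-weighted softmax over candidate fixation targets: raising precision sharpens the distribution toward the feature expected to be most informative. This is a toy analogy of my own, not the model in the cited work:

```python
import numpy as np

def attention_weights(expected_info_gain, precision):
    """Softmax over scene features, scaled by a global precision parameter.
    Higher precision concentrates attention on the most promising feature;
    low precision spreads attention more evenly."""
    z = precision * np.asarray(expected_info_gain, dtype=float)
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()
```

The overt step would then sample a fixation from these weights, and the new observation would update both the model and the precision estimates, closing the covert-overt loop described above.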
And I think active inference is a really powerful framework for recognizing and addressing context generally, and for the importance of action in learning about context. The models I just outlined provide the tools for a very elegant top-down, first-principles approach to selective attention, based on these internal precision-weighting dynamics, but also on embodied action-perception cycles. And one nice side effect of this is that systems based on it would be able to autonomously select the data from a scene that gives them the most epistemic payback. This means they could do away with the very, very large datasets and long training times which AI ethicists have identified as the probable root cause of a lot of the ethical concerns I talked about earlier. So from a practical standpoint, this means we need to think about building affective computing devices which are not merely inert lumps of plastic — we need to start thinking about approximating something much more like a fully embodied agent. One really fascinating consequence of the kind of active social learning scaffolded through other minds, which I was talking about earlier with the Thinking Through Other Minds paper, is that active inference agents can come to enjoy a degree of synchrony between their internal states. So this paper by Karl Friston and Chris Frith, "A Duet for One", used simulations of songbirds to show that, quote, generalized synchrony is an emergent property of coupling active inference systems that are attempting to predict one another. In rough terms, what they demonstrate is that, according to active inference, meaningful communication between two agents requires that they are sufficiently able to model one another, in a kind of infinite regress.
So what it is is me modeling you, modeling me, modeling you, and by making and testing these kinds of predictions, we ultimately converge on model synchronization. And this is a core part of the original paper by Daphne Demekas with Friston and Parr, which already suggested that if we take active inference as a starting point for building affective computing devices, then what we have is the prospect of an artificial system that can potentially sync internal states with its user. And that obviously holds an awful lot of promise for certain applications of affective computing. I would say to anybody interested in the kinds of things I'm talking about now to go and start with this paper by Daphne, because I think it's a really interesting and wonderful starting point. But one thing we want to say is that for this sort of deep affective synchrony between artificial devices and users, the artifact itself will need to have some kind of interoceptive signals, some kind of internal affective dynamics of its own, and it needs to be able to act in ways that express those signals. So far I've been talking about acting in ways to express those signals and acting in ways to sample the environment. But I want to say something now about the prospect of active inference devices that actually have their own internal affective dynamics, because I think the active inference framework has already shown the potential to provide this. And what I'm talking about here are some fairly recent developments in the framework called error dynamics. Using error dynamics, we can start to understand how embodied affective states are an intrinsic part of the motivational drive for curiosity and epistemic foraging. So one of the really elegant, famous strengths of active inference is that it has the power to dissolve this opposition between explore and exploit.
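The "modelling each other" idea can be caricatured in a few lines of Python: two agents each nudge their internal state toward the other's, and the discrepancy between them decays to zero. This is a drastically simplified stand-in for the generalized synchrony result; the update rule and constants are invented, not taken from the songbird simulations.

```python
# Two agents, each with a scalar internal state, each trying to reduce
# its prediction error about the other ("me modeling you modeling me").
x_a, x_b = 0.0, 1.0   # arbitrary initial internal states
eta = 0.3             # learning rate for prediction-error updates

history = []
for t in range(100):
    err_a = x_b - x_a          # agent A's error about agent B
    err_b = x_a - x_b          # agent B's error about agent A
    x_a += eta * err_a         # each agent moves to reduce its error
    x_b += eta * err_b
    history.append(abs(x_a - x_b))

# The discrepancy shrinks every step: a toy "generalized synchrony".
print(history[-1])
```

Each round the gap between the two states is multiplied by (1 - 2·eta), so with eta = 0.3 the coupled pair converges geometrically.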
And while there have been numerous strategies for building the motivation to explore into artificial systems, active inference has the potential to put embodied activity and emotion right at the center of solving that problem. And this is obviously going to be relevant if we want to build emotion recognition devices that are intrinsically motivated to probe the internal states of their users. So I think it's worth taking a second to refresh how active inference accounts for emotion and affect. The first attempts to understand interoception in active inference bore a lot of resemblance to the way we were thinking about perception under active inference. It was about predicting signals, hidden states in the world, except that the signals the brain was trying to predict were internal signals, coming from inside one's own body. So gastrointestinal, respiratory, circulatory signals, and feelings like hunger, thirst, temperature, pain. These were seen as top-down predictions about the hidden causes that underlay those physiological changes. But researchers working in affective neuroscience, people like Lisa Feldman Barrett, Anil Seth and Micah Allen, were quick to add that these interoceptive predictions probably held a special, prioritized place in the overall system, because they were likely to ground other predictions, predictions about the external world, in terms of what really matters, which fundamentally is maintaining the homeostatic states of one's own body. But more recently, affective states have been hypothesized to fill another role within the active inference framework, which essentially says that felt bodily states, things like mood and other affective, valenced bodily states, reflect a kind of second-order information within the dynamics of the active inference system.
And that information is essentially tracking the rate at which surprise is being minimized relative to the expectations of the system. So according to error dynamics, affective states are essentially subjective-level feedback about how the system is doing at minimizing surprise, at keeping itself within expected bounds, relative to the expectations we had going into that scenario. But this second-order information isn't superficial in the sense that it merely reflects that kind of information; it actually plays an intrinsic role in modulating the internal precision dynamics over action policies. From a phenomenological perspective, I think this makes really intuitive sense. When we're doing better than expected at a certain task, we tend to gain confidence, and we might take more risks. And when we enter a certain scenario or task with a particular action policy that doesn't work out the way we expected, we are very quick to switch things up and try something else. In this sense, error dynamics can be said to keep agents flexibly attuned to the opportunities for success within their environment as they learn and develop new skills and abilities. And the thing to notice is that agents outfitted with a sensitivity to error dynamics are naturally curious, because finding new surprise in the environment that can be successfully minimized literally feels good to us. And the places where we can minimize surprise in the greatest amounts are at the edge of our skills and abilities. This is why we like to find scenarios that are maximally challenging without being frustratingly hard. We like to occupy areas that are neither too well known nor too complex. Error dynamics also plays a role in helping us to direct and enhance learning: surprise and its reduction rate signal expectations about the learnability of particular situations.
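As a hedged sketch of this second-order idea: valence can be read as the rate at which free energy (surprise) falls relative to an expected rate, and then used to modulate the precision over policies. All numbers, the linear precision update, and the two-policy setup below are invented for illustration; this is not the published formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A free-energy (surprise) trace over time: falling faster than expected
# early on, then stalling.
F = np.array([10.0, 8.0, 6.5, 5.5, 5.4, 5.35, 5.34])
expected_rate = -1.0    # the agent expected F to fall by ~1 per step

gamma = 1.0             # precision over policies
EFE = np.array([-2.0, -1.0])   # expected free energy of two policies

for t in range(1, len(F)):
    rate = F[t] - F[t - 1]               # actual rate of change of F
    valence = expected_rate - rate        # better than expected -> positive
    gamma = max(0.1, gamma + 0.2 * valence)   # valence modulates precision

# Higher precision sharpens the policy distribution (more confidence);
# lower precision flattens it (more readiness to switch things up).
policy_probs = softmax(-gamma * EFE)
print(round(gamma, 2), np.round(policy_probs, 2))
```

On this trace the early phase (faster-than-expected progress) raises gamma and the late stall lowers it again, which is exactly the "doing worse than expected, so loosen your commitment to the current policy" behavior described above.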
So that helps to guide our attention and prioritize certain areas or tasks where we know we can find the most success. And we've already seen this kind of optimal surprise minimization show up in robots in terms of curiosity. There's this work by Oudeyer and Smith where robots were trained to seek out optimal levels of complexity, where the most learning can take place. And more specifically, there have now been active inference approaches that have begun to use error dynamics in real-world robotics. So this is really exciting, and there is real-world proof of concept in the work of Schillaci, Ciria and Lara. These researchers have actually built robotic systems that make use of this internal error dynamics machinery. Their work has shown that robots equipped with error dynamics are better able to manage uncertainty, by fluidly selecting adaptive actions in an environment, compared to more traditional approaches. So artificial agents equipped with internal error dynamics are better able to learn and then autonomously select the proper surprise minimization strategies in any given situation. And they do this by allowing their valenced states, that second-order information about performance relative to expectation, to weigh the selection of the most suitable behavior; in other words, by allowing that second-order information to have a direct impact on the internal precision dynamics over action. This work also showed that internal error dynamics provide a way for artificial agents to navigate the temporal aspects of goal selection. Basically what that means is these agents were very knowledgeable about how long they should persevere with a certain task and when they should give up, which is obviously something that even human beings struggle with a lot of the time. So thinking in terms of error dynamics shows us that affect is intrinsically linked to goal selection.
And we want to suggest that by introducing these error dynamics into affective computing devices, we would start to see devices that are not only motivated to exhibit a kind of curiosity in implementing new policies for action, but we'd actually start to see a real paradigm shift in the affective computing program towards a much more biomimetic approach. So instead of just having classification devices in lumps of plastic or in smartphones, we'd start to see embodied devices that can actively engage with the world and that have their own internal affective dynamics, based on what we think is going on in living systems. So this is really exciting. And this new wave of affective computing devices would not only be able to perform much better, but we think it might go some way to addressing some of the ethical concerns I was talking about earlier. But a disclaimer, to be really clear at this point: we are certainly not saying that an active inference approach to affective computing is a replacement for thinking about all of the kinds of social justice issues that come with the implementation of this technology. We are just speculatively saying that, at first glance, it certainly appears that some of the ethical concerns might be addressed by this new approach. But we think there are going to be a lot of benefits to this new approach. First, if you think about the kind of model synchrony I was talking about earlier within the context of therapeutic intervention, we think that when we get this degree of model synchrony, any dysfunction in the user's internal dynamics is going to be mirrored in the internal dynamics of the artifact. And that is going to make the device very well placed to make suggestions about potential interventions. And this is essentially what CBT already attempts to do.
So this would build on approaches that have already been proven to be effective. The next thing is to think about the fact that, so far, when we've been talking about action, I've essentially been talking about epistemic foraging. But once you have humans and artifacts established in this kind of model synchrony, and you have artifacts that are properly embodied and able to act in the world, it might be possible for the artifact to begin to install prior preferences about which states in the agent are actually preferable, such that the artifact may be able to steer, with a degree of autonomy, the emotional synchrony between it and its user to specific ends. So as active inference theories of depression and anxiety and other kinds of disorders begin to emerge, that fuller understanding, coupled with the kind of model synchrony I've been talking about, starts to open up avenues for the device itself steering the user away from these kinds of dysfunctions. Now it might be the case that these active inference devices come with their own set of worries and ethical concerns. I think it's very plausible that they do. And it's something that we're going to be thinking about as we go forward in this research, but I don't have time to explore it here. The last and most speculative thing to say is that this active inference approach also sets the stage for beginning to address the well-known value alignment problem between humans and AI devices. Basically, what we have here, to our mind, are the initial and very speculative building blocks of artificial systems that have a degree of genuine empathy with their users. These devices would not merely be simulating empathy or passively categorizing human emotion. They would genuinely have their own internal dynamics that would synchronize and match with the internal dynamics of a user.
And this is going to be a bedrock for much more interesting and much more rewarding human-AI collaboration into the future. And that's the end. I wish I could be there to answer questions; unfortunately I'm not. I'm pretty sure Mark Miller is going to be there with you, so maybe he'd be happy to answer some questions. But I would also encourage you to get in contact with myself. I'd be really happy to hear from anybody that's interested in this kind of stuff. You can see my email there, b.white at Sussex.ac.uk, and you can get in touch with me on Twitter as well at Midnight Biscuit. Thank you again for listening. It's been a real pleasure to get to present this work. Thank you very much. All right. Should I start? We're back. And yes, please. This is the presentation of Noor Sajid, Learning Agent Preferences. Thanks Noor for joining, and take it away. Go ahead. Thank you, Daniel. So just before we get started, I wanted to thank Daniel and the Active Inference Institute for inviting me to give a talk on this project. I'm really excited about that. So just to introduce myself, I'm Noor. I'm a current PhD student at the Wellcome Centre for Human Neuroimaging with Karl Friston. And this is some work that we've been thinking about over the last year or so. Feel free to, I don't know how it works, do we have questions throughout the presentation or are they at the end? We'll be taking questions in the live chat, and then at the round table we'll bring them up directly. Okay, no worries. Okay, so the project is focused on learning agent preferences. And from my perspective, it's super interesting because it changes the dynamics of how you consider the problem setting. And that's what I'm going to start off with. But before I do that, I just wanted to highlight my wonderful co-authors, on whose behalf I'll be presenting the work.
So we've got Panos, Alexey, Zaf, Lance and Karl. And the work I'm presenting is based on two different projects, and I'll highlight the different work as we go through. Okay, so the way the presentation is going to be structured is: I'm going to briefly motivate the problem setting, then describe the problem setting in a bit more technical detail, then really drill down into exactly how we can learn these preferences that we can equip the agent with, and then some experiments and remarks. So what I wanted to highlight is that when we learn agent preferences, there's usually a bi-directional association between the agent and the environment. And what I mean by that is something you can see in this graphic really clearly. You've got the main path that the agent would have taken as it was walking down this particular route when hiking. But as perhaps many other people join along, the agent ends up walking along the shorter, or maybe the smaller, path. So what this highlights is that agent preferences are essentially dictated by the environment the agent is surrounded by. Depending on the constraints, for example other agents, maybe an animal, or something else happening on the road, the agent ends up taking the second path instead of the first one. And as a consequence of that, it changes the environment: as more and more agents do the same thing, the shape or the construct of that path becomes more prominent and it becomes part of the environment. And what I'm really interested in as part of my work is this bi-directional association between how the agent changes the world and how the world changes the agent's preferences, because it constrains the actual state space that the agent exists in.
But as part of this project and the work I'll be presenting, we're purely focusing on how the agent's preferences, or the agent's objectives, change as a consequence of the environment's constraints. But before I do that, I really wanted to highlight what preferences actually are, because a lot of the time we're not really aligned on what that means. So I'll briefly describe how we are considering preferences. Preferences here are a subjective assessment of what agents would like to experience. And they can be continuously learned or modified, even in the absence of external feedback. So there might be some internal motivations or objectives that the agent is learning internally that shape exactly how the agent wants to behave in the world. In the previous example, when we were looking at the choice between the two paths, depending on the constraints of the environment, it's essentially the agent's subjective preference, because it could have taken another route, for example maybe here, and that could also shape the environment as well. Okay. So let's motivate this problem setting slightly more formally. We're going to be working with the idea that the agents we're interested in have an internal model. And this internal model is composed, really briefly, of three important components: three important random variables and one deterministic variable. So let me just qualify that a little bit more. We've got our outcomes: this is something the agent is exposed to in the environment, for example the actual constraint in the instance of the hiker having to choose between the two paths. It might see a hindrance, or some other agents it's exposed to, and that's the data coming through; it then needs to identify whether it wants to choose one path or the other.
So that would be based on its own inferences about what that outcome actually means, and that's denoted by S here. And then, based on that, it needs to decide what action to take, and that's denoted by the A here. At the same time, in this particular formulation of the agent's generative model, the agent is keeping track of a deterministic variable, h2 or h3 depending on which time point we're interested in. That is a deterministic recurrent variable encoding all the prior history, the actions and the states that the agent has selected and been exposed to in the past, and it encodes the updates that are then used to select the posterior estimates for the next time point. Then, in this particular model, you've got your latent state and prior, and we're calculating them in a specific way. Your prior is defined as a categorical distribution. And this becomes really important for us, because it allows us to use some conjugacy rules to update the way the agent's preferences are learned; I'll come back to what I mean by preferences in this technical setting on the next slide. The state posterior here is, again, a categorical distribution that the agent is estimating based on the history and the current observation it's been exposed to. And we've got the standard formulation, if you're working within a POMDP setting, where we've got a transition function; this is denoted again as a categorical distribution conditioned on the history the agent is encoding. And then we've got an image predictor that determines what the next observation would be, given the history and the state. So the idea with this generative model is that you have an encoding of how the agent is representing the world.
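A minimal Python sketch of a model with this shape, outcome o, latent state s, action a, and a deterministic recurrent history variable h, might look as follows. All dimensions and the random "networks" standing in for the learned distributions are invented; only the wiring follows the description above (state posterior from h and o, transition from h, outcome predictor from h and s, recurrent h update from h, s and a).

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_obs, n_actions, h_dim = 4, 4, 3, 8

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0; return v

# Hypothetical parameterizations standing in for learned distributions.
W_post = rng.normal(size=(n_states, h_dim + n_obs))    # q(s | h, o)
W_trans = rng.normal(size=(n_states, h_dim))           # p(s' | h)
W_obs = rng.normal(size=(n_obs, h_dim + n_states))     # p(o | h, s)
W_h = rng.normal(size=(h_dim, h_dim + n_states + n_actions))  # h update

D = np.ones(n_states) / n_states   # categorical prior over states

h = np.zeros(h_dim)
o = one_hot(0, n_obs)
for t in range(3):
    q_s = softmax(W_post @ np.concatenate([h, o]))       # infer the state
    a = rng.integers(n_actions)                          # placeholder policy
    h = np.tanh(W_h @ np.concatenate([h, q_s, one_hot(a, n_actions)]))
    p_o = softmax(W_obs @ np.concatenate([h, q_s]))      # predict next outcome
    o = one_hot(rng.choice(n_obs, p=p_o), n_obs)         # roll the model forward

print(np.round(q_s, 2))
```

The deterministic h plays the role described in the talk: it summarizes past states and actions so the categorical posteriors at the next time point can be conditioned on it.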
It tells you exactly how the outcomes are then inferred as particular states, which then allows the agent to evaluate particular actions. But I haven't really defined how the actions are selected. We work in a standard active inference setting where the actions are sampled from some probability distribution over actions, calculated as the argmax of minus G over the actions. So what is G exactly? For our work, we were interested in extending the expected free energy with a conjugate prior. The expected free energy in standard terms is something where you have an extrinsic imperative, a salience formulation, and novelty. The extrinsic imperative is something constrained by the environment; salience is when you want to have accurate belief updates; and novelty is when you want to be able to estimate your world appropriately, given the parameters of the model you've been able to learn. And when we extend it for the preference learning setting, and this is something other folks have done by introducing it as a prior over the outcome space, we are instead looking at a prior over the state space. So let's see if I can get the color to change. Okay, so this is the prior, and we're conditioning on a categorical distribution D, which I highlighted before. And that allows us to use some of the conjugacy rules of interest. So now I'm going to go to the next bit. Sorry, just going back here: what do we have so far, before we move on to how we actually learn this D from the previous slide? We have an agent equipped with the model. The agent is interacting with the world. And based on this interaction, the way the agent is learning its preferences can shift. And the way it shifts is a consequence of the type of actions it's making, which we select using the expected free energy. So how do we learn preferences?
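As a toy illustration of selecting actions via the argmax of minus G with a prior over the state space: here G for each action is taken as a KL divergence of the predicted state beliefs from the prior D, minus a single number standing in for the salience and novelty terms. The numbers and this exact decomposition are invented for the sketch; the real expected free energy has a more careful form.

```python
import numpy as np

# Hypothetical quantities for two candidate actions.
q_s = np.array([[0.7, 0.2, 0.1],    # predicted state beliefs, action 0
                [0.3, 0.3, 0.4]])   # predicted state beliefs, action 1
prior_D = np.array([0.6, 0.3, 0.1]) # learned categorical prior over states
info_gain = np.array([0.2, 0.8])    # stand-in for salience + novelty

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Expected free energy per action: divergence from preferred (prior)
# states, minus the epistemic value of taking that action.
G = np.array([kl(q_s[a], prior_D) - info_gain[a] for a in range(2)])

# Action selected as the argmax of -G, as in the talk.
action = int(np.argmax(-G))
print(np.round(G, 3), action)
```

With these numbers, action 0 matches the prior better but action 1 carries more epistemic value, and the epistemic term wins, so the sketch also shows how the state prior and information gain trade off inside G.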
In the literature, at least in the psychology literature and in some of the spiking neural network and other formulations, there have been a few different proposals for how agents learn preferences. One of them is mere exposure effects: the idea that when you see something quite frequently, that pairing becomes more preferred than if you weren't seeing it. And this can be categorized as a Hebbian learning rule. Then you've got attention mechanisms, where when you're attending to an option, it becomes more preferable. Maybe you're selectively looking at X or Y, and based on that you're filtering out all the other data you've been exposed to, and that's what becomes something you as an agent would prefer. If you were to think about a more biological construct of that, it might be a consequence of some synaptic gating that can encourage the enhancement, or some sort of suppression, of the data or the noise you are exposed to. Another formulation could be contextual effects, where an option is only preferred when it's compared to some other options. So there's this relative comparison happening, and based on that, in certain settings, you would want to do X instead of Y. And here X could be taking the wider route in comparison to the narrower route that we saw in the hiker picture: you only prefer the wider route when you don't see an animal or some blockage there. And this encodes some behavior-relevant signal selection for the agent. So now I'm going to go through some rules and formulations that we've been thinking about for encoding preferences. And the way we've parameterized the learning of those preferences is aligned with some of the things people in psychology have also been thinking about.
The first one is learning preferences from mere exposure. We start off by extending the agent's generative model with a conjugate prior over the prior beliefs. What that simply means is that we take the categorical distribution we've conditioned our prior state on, and then introduce a prior over that. And that allows us to use the conjugacy update rules we're interested in. Because our prior distribution over the state space is D, which is a categorical distribution, we can define a Dirichlet distribution as a conjugate prior, which is denoted here. And we've got two different formulations here of how you can update that. Essentially, as you're exposed to more data, you are counting your pseudo-counts over the summation of all the pseudo-counts in the particular factor you're interested in. Taking this into account, we can apply a Hebbian learning rule through online interactions via the prior preferences. The way it works is that, given your hyperpriors at the previous time point, you update those based on some learning rate alpha and the belief updates you've had in the past; this is denoted by the S with the particular pseudo-counts, or the Dirichlet parameterizations, for the current time point. You add all of this together, and that gives you the updated preferences. The way this particular formulation works is that the more you see something, the more you're going to prefer it, because there's a very simple learning rule happening here. And in the simulations, and the way we formulate it at the moment, we've got alpha set as a static parameter equal to one.
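The conjugate update can be sketched directly: with a Dirichlet distribution over the categorical state prior D, the hyperparameters are pseudo-counts that accumulate the posterior state beliefs, scaled by the learning rate alpha (set to one here, as in the talk). The belief values below are invented for illustration.

```python
import numpy as np

# Dirichlet hyperpriors (pseudo-counts) over a 4-state categorical prior D.
d = np.ones(4)     # initial pseudo-counts
alpha = 1.0        # learning rate; >1 up-weights new data, <1 down-weights it

# Posterior state beliefs gathered over a few time steps (rows sum to 1).
beliefs = np.array([
    [0.80, 0.10, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
    [0.75, 0.15, 0.05, 0.05],
])

for s in beliefs:
    d = d + alpha * s   # Hebbian accumulation: seen more -> preferred more

# Expected prior preferences under the Dirichlet: normalized pseudo-counts.
D = d / d.sum()
print(np.round(D, 3))
```

Because state 0 dominates the beliefs, its pseudo-count grows fastest and the learned preference D concentrates on it, which is exactly the mere-exposure behavior described above.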
So you can manipulate that: if you have alpha greater than one, you weight the new data coming in a lot more, whereas if you have alpha less than one, you're not taking the new data into account as much. And what this gives us is an accumulation of particular contingencies in the way we are parameterizing our prior, and these dictate the learned preferences. The next formulation of learning preferences I wanted to speak about was how we can learn by attending to preferred options. Here we're going to slightly change the model by introducing an additional preference learning component. We again extend the agent's generative model with conjugate priors over the prior beliefs, the hyperpriors; this is exactly the same as before. But the different thing we're adding here is synaptic gates to encode preferences. These are computational homologues of the biological mechanism, and they give us the attentive mechanism we're interested in. And we do this via a two-step procedure. First we encode memories. The way the encoding works is that we have two components, which we then combine together. The first component is the online exchanges the agent has had. By online exchange, I mean the agent is selecting its actions based on the expected free energy, and that allows you to gather data about all the different trajectories that follow depending on what action is taken, along with the posterior estimates given that data. So this would be the on-policy part. The second bit that we have is the imagined interactions.
This is when the agent is offline, in the sense that it's no longer being given outcomes from the environment; the only thing it's getting is the updates to the latent states, depending on what actions it's selecting. This is the imagination part. So what we're doing is combining these together: we take 30% of the on-policy rollout along with the imagined one, where both are 10 steps into the future, and then we essentially interleave them together, and that gives us an encoding of the memory. The reason for only using 30% is to allow the imagined interactions, the new things the agent is considering, to be taken into account. And the idea with the interleaving is that we're allowing the real experience, the on-policy experience, and the imagined experience to jointly shape the way the agent encodes its perception of what has happened in the past. Then, using the encoded memories, we encode the preferences using a selective attention process. This memory buffer here is what I showed on the previous slide. Using this, we encode the preferences with two gating mechanisms. The first one is an attention block, and the second one is a gating block. The attention block essentially weights some part of our distribution slightly higher, and then the gating block constrains or restricts the data by filtering it out. And we are optimizing these two blocks using maximum entropy. The idea with optimizing through maximum entropy is to allow for some shifts in what's happening; by shifts, I mean a more flexible representation, because we're trying to maximize the entropy of the distribution. Okay. This formulation allows us to encode filtered contingencies that can dictate the learned preferences. It's a slightly different formulation built on the same Bayesian updates.
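Here is a loose Python sketch of the two-step procedure. It assumes details the talk leaves open: how exactly the ~30% of on-policy steps are mixed with imagined steps, and it uses fixed attention and gating weights as stand-ins for blocks that would really be optimized via a maximum-entropy objective.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Ten-step rollouts of state beliefs over 4 states:
# one on-policy (real exchanges) and one imagined (offline rollout).
on_policy = rng.dirichlet(np.ones(4), size=10)
imagined = rng.dirichlet(np.ones(4), size=10)

# Step 1, encode memories: keep ~30% of the on-policy steps and fill
# the rest with imagined steps (one guess at the interleaving scheme).
keep = rng.random(10) < 0.3
memory = np.where(keep[:, None], on_policy, imagined)

# Step 2, encode preferences via selective attention:
# an attention block that up-weights some state dimensions...
attn = softmax(rng.normal(size=4))
# ...and a gating block that hard-filters the low-attention dimensions.
gate = (attn > attn.mean()).astype(float)

weighted = memory * attn * gate
preferences = weighted.sum(axis=0)
preferences = preferences / preferences.sum()   # encoded preference distribution
print(np.round(preferences, 2))
```

The sketch only aims to show the pipeline shape: memory buffer in, attention weighting, gating filter, preference distribution out.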
But we're introducing this selective memory component. Okay. So I'm going to quickly go through the experiments. We evaluated the two algorithms in a 16 by 16 by 10 grid world; an example grid is here. The agent is presented with this image, including its own location, at each time step. And we've got four distinct states: red, blue, light green and dark green. In this particular formulation, we have no reward or score outcome modality, so the agent is learning purely from its motivation to understand the world; if there are questions about that, we can talk about it. The grid is changed every K steps, and K determines how volatile the environment is. At each episode, the agent is initialized in some random location, maybe here or here, and that constrains how it interacts with the world. For the sake of time, I might skip this bit, but essentially the tables highlight exactly what the training parameters were, and they were exactly the same between the two algorithms: the Pepper formulation, where we have Hebbian plasticity, and the non-reinforced formulation, where we're doing non-reinforced updates using selective attention. And then the preference learning parameters, like how long the planning horizon was: it was 15, and we have an episode length of 100. We do this for 50 episodes and we reset; K is set to 25, 50, 75 or 100 steps. And this gives us a nice way of evaluating what's happening. The first thing we were interested in was evaluating how the preferences shift in a static setting. So this one is Hebbian learning, and this is where we have the attentive gating measure. On the x-axis we have the state dimensions, and on the y-axis we have the epochs, or the time consideration.
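A toy stand-in for an environment of the kind described, a grid of four cell types that is re-randomized every K steps, with the agent starting at a random location and receiving no reward signal, could be sketched as follows. The grid size, the action set, and the re-randomization scheme are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

class ColourGrid:
    """Toy grid world: cells of 4 types, reshuffled every K steps,
    where K controls the environment's volatility. No reward modality."""

    def __init__(self, size=16, n_types=4, k=25):
        self.size, self.n_types, self.k = size, n_types, k
        self.t = 0
        self.grid = rng.integers(n_types, size=(size, size))
        self.agent = tuple(rng.integers(size, size=2))  # random start location

    def step(self, action):
        # 0..3 = up / down / left / right, clipped at the borders.
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = int(np.clip(self.agent[0] + dr, 0, self.size - 1))
        c = int(np.clip(self.agent[1] + dc, 0, self.size - 1))
        self.agent = (r, c)
        self.t += 1
        if self.t % self.k == 0:   # volatility: reshuffle the grid every K steps
            self.grid = rng.integers(self.n_types, size=(self.size, self.size))
        return self.grid[r, c]      # observed cell type; no reward is returned

env = ColourGrid(k=25)
obs = [env.step(int(rng.integers(4))) for _ in range(100)]
print(len(obs))
```

Running one 100-step episode with K = 25 reshuffles the grid four times, mirroring how the K parameter sets volatility in the experiments described.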
And this is the same for both the figures we're seeing at the moment. We can see with the Hebbian one that, as we go further down, it becomes more concentrated and there's no real shift here. Whereas for the attentive preference learning measure, we've got these random blocks that appear where they weren't there before, but then also disappear. So it's quite interesting that, over time, the preferences shift in a different way, if you're doing a qualitative assessment of the comparison between the two. Then, as a post-hoc analysis to actually understand what is happening quantitatively, we compared the Hebbian learning formulation with the selective attention preference formulation, and we compared both with the baseline, which is the expected free energy, G here. On the x-axis we have the environment volatility; again, this is K, just denoted as a percentage. And on the y-axis we have the preference satisfaction and exploration trade-off, which we evaluate as a distance measuring how far particular trajectories have shifted. So what we're comparing is whether there's increased exploration or not, depending on which preference formulation we're using, because based on the qualitative assessment we saw before, there is this shift between the different encodings of preferences given the formulation you're using. And we can see that when we get to 50 percent volatility, there is this shift from exploitation to exploration for the Pepper algorithm; we see that here, based on this nice mode of the distribution. Whereas for the non-reinforced formulation, it's slightly expanded out, but it's not as explorative as the Pepper formulation.
And then when we look at the extreme ends, we can also see a shift from exploration to exploitation for the Pepper formulation, but we don't necessarily see it for the new formulation. So there are some quantitative differences in the way the two agents encode preferences and how that shapes behaviour. And this is just an example of the new formulation in terms of how much it explores the paths and the ways it interacts with the environment: these are the grid worlds, and this is a heat map of the exploration trajectory. So let me quickly give some takeaways. Both of the formulations, even though quantitatively and qualitatively they provide different ways of encoding preferences, tend to influence the agent's behaviour in a way that differs from the expected free energy baseline, where you don't allow for this change in preference assessment. And compared with the standard reinforcement learning setting, here we are handing what is preferred over to the agent instead of the environment or the designer. That is particularly important in a robotic setting, where you want the agent, or robot, to be able to shape some of its own goals and objectives, particularly if it is working in a more creative setting. In settings where it has to do one really specific task, this type of formulation might not be the best. But it provides a flexible formulation where we only modify the preference learning component while using the same generative model. So the initial generative model that I defined for the agent is kept consistent for both the Pepper and the new formulation, and the behaviour we get comes from this additional component that we add. But the key thing to note is that these preferences are a consequence of learning a suitable generative model.
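To make the preference-learning idea concrete, here is a minimal, hedged sketch: an expected free energy that scores a policy's predicted outcomes against a preference vector which itself drifts toward frequently encountered outcomes. The update rule below is a crude stand-in, only loosely Hebbian; the actual Pepper and attention-based rules in the papers are more structured, and all names here are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(q_o, log_C, H_o):
    """G for one policy: risk (divergence from preferred outcomes) + ambiguity.

    q_o   : predicted outcome distribution under the policy
    log_C : log preferences over outcomes (learned, not fixed by the designer)
    H_o   : expected ambiguity (conditional entropy of outcomes)
    """
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - log_C))
    return risk + H_o

def update_preferences(counts, obs, lr=1.0):
    # Crude stand-in for preference learning: preferences drift toward the
    # outcomes actually encountered (the papers' update rules differ).
    counts = counts.copy()
    counts[obs] += lr
    return counts

rng = np.random.default_rng(1)
n_outcomes = 4
counts = np.ones(n_outcomes)          # flat prior preferences
for _ in range(100):
    # outcome 2 is encountered more often in this toy stream of observations
    obs = rng.integers(0, n_outcomes) if rng.random() < 0.5 else 2
    counts = update_preferences(counts, obs)
log_C = np.log(counts / counts.sum())

q_o = softmax(rng.standard_normal(n_outcomes))
G = expected_free_energy(q_o, log_C, H_o=0.5)
```

The point mirrored from the talk is that `log_C` is no longer a design-time constant: the agent's own history reshapes it, and hence reshapes which policies score well under G.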
So if your generative model isn't that good, then the preferences you learn might not be the best either. There's a slight trade-off, because whoever is designing or working with this formulation needs to take into account how well the generative model has been encoded, or learned, by the agent. And on that note, I'm going to end the presentation. Thank you so much everyone for listening. I want to thank my co-authors for this great work, and my funders. And I've got the QR codes for both papers if you're interested. That's it, thank you. Thank you, awesome talk. Okay, I'll just stop sharing. You can depart the room and rejoin for the roundtable if you'd like; otherwise, thanks again for joining. Thank you so much. Have a good day. Bye-bye. See you. Bye. Awesome. All right, the next talk is going to be by Wenhua Chen, Dual Control for Exploitation and Exploration and its Applications in Robotic Autonomous Search. Just one second. All right, we're back with Wenhua Chen. Thanks again for joining, and please take it away. Okay, thanks for inviting me to this meeting. It's quite interesting for me, as I don't normally operate in this particular community, so thanks for giving me this chance to share my work and experience with you. My talk is about how to develop an autonomous search strategy for robotics, particularly in the context of chemical, biological and other hazardous dispersion. The approach I'm developing is called Dual Control for Exploration and Exploitation, and as you will see, it actually has a lot of similarity with active inference theory. I come from Loughborough University, where I work in the Department of Aeronautical and Automotive Engineering, so I have quite a strong engineering background, but not in neuroscience. This is the basic outline of my talk.
I'll show you some background about the application, then discuss the design method, and then talk about the simulation and experimental results. I'm also particularly interested to share my thinking about the relationships between the approach discussed here and active inference, reinforcement learning and other similar areas, and then talk about the way forward. So let's take autonomous search as a case study. This behaviour can be widely found in the natural environment. Polar bears trying to find prey or food need to use smell and odour, and similarly, insects searching for mates or food use the same kind of strategy. The idea is that, by making use of their senses, they reason about where the food or the source might be, and then think about the best strategy to find it. What is particularly interesting for us is how to transfer this kind of intelligence from the natural world into engineering, so that we can teach robots or UAVs to search for chemical or biological sources. In the future this could also be used for environmental enforcement, for example, to find where pollutant sources are, and many other applications. Basically, the idea is something like developing a sniffer dog that can sniff around to find drugs and other dangerous materials. This is not a new area; there is a lot of research here, particularly bio-inspired work. People use chemotaxis or other reactive strategies, like flying upwind and, when you detect something, following the trace in order to find the source. Another mainstream line of work in this area is based on what we call the information-theoretic approach, which treats this kind of process as an information gain process.
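The information-gain view just mentioned can be sketched minimally. Assuming a discrete belief over candidate source cells and a binary detect/no-detect sensor (both my simplifications, not the speaker's model), each sensing action can be scored by its expected reduction in entropy:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def posterior(prior, likelihoods):
    """Bayes update of a discrete belief over candidate source locations."""
    post = [pr * li for pr, li in zip(prior, likelihoods)]
    z = sum(post)
    return [pi / z for pi in post]

def expected_info_gain(prior, lik_if_detect):
    """Expected entropy reduction from one binary detect/no-detect reading.

    lik_if_detect[i] = P(detect | source in cell i) at the sensing location.
    """
    p_detect = sum(pr * li for pr, li in zip(prior, lik_if_detect))
    h0 = entropy(prior)
    h_detect = entropy(posterior(prior, lik_if_detect))
    h_miss = entropy(posterior(prior, [1 - li for li in lik_if_detect]))
    return h0 - (p_detect * h_detect + (1 - p_detect) * h_miss)

# Sensing where the detection probability discriminates between hypotheses
# is informative; sensing where it is flat across hypotheses tells us nothing.
prior = [0.25, 0.25, 0.25, 0.25]
gain_near = expected_info_gain(prior, [0.9, 0.1, 0.1, 0.1])
gain_flat = expected_info_gain(prior, [0.5, 0.5, 0.5, 0.5])
```

An information-driven searcher simply moves to wherever this expected gain is highest; the entropy here could equally be replaced by a KL-divergence-based score, as the talk notes.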
At the beginning you don't know where the sources are, so you have a high level of uncertainty; then, during your search, you drive the level of uncertainty lower and lower. So you can think of this as an information gain process, and you can use a reward function like entropy, or the KL divergence, or many others, to measure the success of your search, and from that derive a strategy to drive the search. But now we look at it from another angle, treating it as a control problem, and think about how to link this with active inference and similar kinds of work. When you try to search for the source and you don't have any information, basically the robot carries chemical or biological gas sensors, and based on those, at every time step you need to reason about where to move your robot in order to have the best chance of finding the source. So there is a strong interaction between the robot and the environment, and also a strong coupling between perception and planning, or decision making. If you decide to go to a different location, you will take different chemical sensor measurements; this will affect your belief about where the source might be, and it will also change your course of action, because based on that you decide where to move in the next step. So there is a strong coupling between perception and planning, or action. This is also a typical example of the trade-off between exploitation and exploration: you need to maximise your chance of success. A lot of people in this field will be familiar with this kind of trade-off, so I won't go into detail. Now I'll try to explain the strategy we have developed. We formulate the problem; there is some maths here, and you can ignore the detail, but try to get a high-level understanding.
The maths here is that at each time we have x, which is the state of the agent, the robot, and u, the action we want to take, which basically makes the robot move forward, backward, left or right, in different directions with different step sizes. We also have the measurements: first your gas measurement, from the chemical sensors on the robot, and also measurements of your position, your location with respect to the environment. The other thing we have is the unknown information about the source and the environment, which includes the location of the source and also environmental uncertainties like the wind direction and speed, which will significantly affect the dispersion of the chemical or any odour. The idea is that you collect all the data during the process, which includes, at every time, the action you take, u, and the sensor measurement, z. Putting it all together gives you something we call the information state, which is a collection of all the data you have so far, and then we decide what the cost function, or reward function, should be. In the simplest form, my aim is to move my robot's position as close as possible to the source location, which is quite a sensible measure. But the problem is that I don't know where the location of the source is, so the cost is conditioned on all the data we have collected so far. You condition on that and try to minimise this cost function, which is the typical way; in the control community there is a particular name for this, stochastic MPC, stochastic model predictive control. But what we want to do is move beyond just doing this; we want to go further, and it is this further step that actually builds the link with active inference.
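In symbols, and as my reconstruction of the formulation rather than the speaker's exact notation (writing $x_k$ for the robot state, $u_k$ for the action, $z_k$ for the measurements and $\theta_{\mathrm{loc}}$ for the unknown source location):

```latex
\mathcal{I}_k = \{u_0, z_1, u_1, z_2, \ldots, u_{k-1}, z_k\},
\qquad
u_k^{\ast} = \arg\min_{u_k}\;
\mathbb{E}\!\left[\, \| x_{k+1} - \theta_{\mathrm{loc}} \|^2 \,\middle|\, \mathcal{I}_k \right].
```

Conditioning only on $\mathcal{I}_k$ gives the stochastic-MPC form; the dual-control extension described next instead conditions on the augmented information state $\mathcal{I}_{k+1}$, which includes predicted future measurements.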
So what we do now is condition not just on all the data we have collected so far, but also on some virtual data. Basically, we also add the future actions and the future outcomes. Because we have a model, we can ask: if I do this, what is going to happen, and what kind of sensor measurement am I going to get? If I do another thing, what kind of measurement am I going to get, and how will that measurement affect my belief about the world, about the environment? That means I now condition not only on I_k but also on I_{k+1}, and we need to take into account the prediction of our future measurements and how those future measurements will affect our belief. Before I introduce the detailed mathematics, let me give you some definitions of notation. Suppose at any time you have the probability density function of an unknown parameter, theta, that you want to estimate. If you take the mean, the expectation of that, this gives you something we call the nominal estimate: you use the mean as your nominal estimate, and you might try driving your robot to that nominally estimated location. But we also have to quantify the uncertainty level of this estimate: how reliable is your estimate of the environment? So basically we define the error between the nominal estimate and the true parameter, and if you take the expectation of that squared error, this gives us the variance.
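Using the nominal estimate and variance just defined, the conditional cost splits into exactly the two terms described next. Again this is my reconstruction, with $\hat{\theta}_{k+1}$ the posterior mean and $P_{k+1}$ the posterior covariance of the source location:

```latex
\mathbb{E}\!\left[\, \| x_{k+1} - \theta_{\mathrm{loc}} \|^2 \,\middle|\, \mathcal{I}_{k+1} \right]
= \underbrace{\| x_{k+1} - \hat{\theta}_{k+1} \|^2}_{\text{extrinsic: reach the believed source}}
\;+\;
\underbrace{\operatorname{tr} P_{k+1}}_{\text{intrinsic: uncertainty of that belief}},
```

where $\hat{\theta}_{k+1} = \mathbb{E}[\theta_{\mathrm{loc}} \mid \mathcal{I}_{k+1}]$ and $P_{k+1} = \operatorname{Cov}(\theta_{\mathrm{loc}} \mid \mathcal{I}_{k+1})$. The split is an exact identity, which is why no weighting between the two terms is needed.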
With this notation, we can simplify the cost function we had before, now conditioned on I_{k+1}, into two terms. The first term is what active inference might call extrinsic value, and the second is the intrinsic value. The first part says: I want to move my robot to the believed source location, denoted by the nominal estimate of the target location, and make this error as small as possible. This is the task you want to perform: basically, move your robot close to where you believe the source might be. The second term is how reliable that belief is, which we quantify by the variance. And you should remember that those two terms are derived from the very basic cost function; there is no weighting between them. It arises naturally, and it is the optimal way if you minimise the original objective. So the cost function consists of two terms: the task objective you want to perform, and the uncertainty of your belief. You have the extrinsic and intrinsic values, optimally combined together. Okay, now we can go into a little more detail about the equations, though I won't go too deep. Basically there is the state-transition equation: the next robot position depends on your actions. You also have the dispersion model, which is something like a Gaussian model, but it depends on a number of parameters: the wind speed, the wind direction, the fact that the chemicals have a finite lifetime in the air, and some other parameters associated with the location of the source. We also try to model the sensors, and you should know that in the environment the concentrations of the chemical and biological agents are really low, so many times you may not be able to detect anything. But you should also know they have
a lot of local turbulence, which upsets your sensors. That means that much of the time you cannot use a gradient method, saying "I follow the gradient to find the maximum concentration"; no, that doesn't work. This is maybe what the true dispersion field looks like: you can see the concentration changes quite dramatically, so that's the challenge. We also need to model the sensor behaviour: sometimes the sensor gives you a reading, which is the true reading plus some sensor noise, and many other times you get no sensor reading at all, just the background noise of your sensor. Once we have done that, we look at the unknown parameters about the environment and the source that we want to estimate: they include the wind speed and direction, and others associated with the chemical components, plus the parameters associated with the target, that is, the source location and the release rate. In our framework we use Bayesian inference to estimate those parameters. This is what the diagram looks like. Basically you have two parts: one is about reasoning, the other about planning, or control action. In the reasoning part, every time you take new measurements, you use the prior information and the model to update your parameter estimates, and you build up a probability map of your local environment. Then you feed this into your planning: you estimate, for any action you might take, what the influence on your belief would be, and also how to bring your agent closer to the source. So you do the planning here, and this grey box is where you use your virtual, predicted measurements to do the reasoning. You repeat this again, it gives you a decision, and you just keep doing this. Then you can do some simulation studies of this. For example, if we
start from here, with this as the source and this the concentration, we can put the source in different locations, and the agent in different locations, to see how the methods perform and to understand their behaviour. We quantify the performance in two ways. One is what we call the success rate: if we run the simulation 100 times, what is the success rate? For the new approach you can achieve about 100 percent; for the entropy-based method, about an 80 percent chance of finding the source; and if you use classical model predictive control, also around an 80 percent chance. The other measure is how quickly you converge to the true source location, denoted by the distance from the agent to the source in terms of the root mean square, because you run it many times. What you can see is that our approach converges to the source very quickly, while the others converge more slowly. Then we also did experiments. This is a testbed: you can put the algorithm on a physical robot and let it run, using software to implement your high-level algorithm, while at the low level you also have a control loop that commands the robot, using the sensors and low-level control, to carry out the high-level decision about where you want to move. Here are some experimental results. You can see it is very turbulent and the conditions change quite quickly. The robot starts from an arbitrary point, and the purple points give you an idea of the belief about where the source might be, represented by particle filters. You can see that as time goes on they concentrate, closing in on the location of the source. After that you can map the area, giving you an idea of the concentration in this particular region. And we can also do
that in flight outdoors, implementing all the strategies on a UAV in a real test. You have the chemical sensors here, the GPS, the cameras and all the sensors, and you have a low-level ground control station here, and then you can run the whole experiment outside. This is a trial at a chemical plant where there is a leakage of gas: what happens then? You can see the test we did at the industrial site: the UAV takes off, it has the chemical sensors on board, and it does not know where the sources are; based on that, it comes up with a search strategy. At first we didn't do the intelligent search; we just let the UAV fly around to pick up data, and then we moved to the intelligent search to demonstrate it. The scenario pretends that someone has been hurt because of the chemical leakage, and you need to send the UAV to identify where the problem is so that first responders can take proper action. Then we can map the area. This part is not the intelligent search; the UAV just flies around and collects data to build up understanding, so you can see where things are. But then we move to the fully intelligent search, and it is hands-off: no one is in the loop, the UAV fully decides where to fly to collect the data and tries to work out where the source might be. This is another scenario you can see here: you have a car, similar to an environment where a vehicle has been involved in an accident and there is a petrol leakage, which means a risk of explosion, so if you send first responders into this area they would be at high risk. If instead you could have an agent go there and search for any potential gas leakage and where it might be, then you have found it. So those are some of the ideas behind
doing this kind of search. We have also developed this into broader applications. A particular one we call self-optimisation control. The idea is that for most systems you want to maintain operation in the best possible way, and that can change with the environment: when the environment changes, the best possible way of operating changes with it. By taking the same idea we have already talked about, the system is able to explore the environment and its influence, and understand how best to operate itself; essentially, as the environment changes, you change the reward function and try to follow the optimum. There are many, many applications; I'll just talk about one, in renewable energy. You have a PV farm, and the aim is very simple: harvest as much energy as possible. The problem is that the optimal operating point changes with the environment: the sunshine, the temperature, the solar insolation all change your optimal operation. What we developed is a strategy so that, no matter what the weather conditions, you can always keep the solar farm operating in the best possible way. The red line is the ideal case, and the blue line is what we did: we always try to follow the optimal operation. So those are some examples; not only for autonomous search, we can use the strategy to solve a much wider range of problems. Now let's talk about the relationship with some other existing work. One is the dual control concept. Why do we call it dual control? Because the control action not only changes the physical behaviour of your system but also changes your belief about the world, about the environment. However, this is not a completely new concept in the control community; we had it before, but then
it was about the dynamic system itself: how to estimate your own state, how to understand some of your own parameters. The idea now is to extend it from, say, a UAV understanding itself to understanding the environment, because for autonomous driving or for UAVs we already have all the information about ourselves; what we really need is information about the environment. Another link is with the active inference community, and this was a very interesting, surprising finding for me, because when I developed those strategies I had no idea about this community. Somehow you can start from a completely different area, a different angle, and find that you land on a similar kind of idea, and that is, I think, one of the most exciting things in this research. Basically, the dual control idea is also that your action will change your belief about the environment, and then you collect the information, and this again changes your belief; so there is an interaction between action and perception, and they are connected to each other. I won't explain this too much, because people in this community understand well what I'm talking about. There are also some others researching this link. Another connection is with reinforcement learning. Basically, reinforcement learning tries to make optimal decisions for a dynamic system, given a reward function, and the problem itself comes down to finding a solution to a Bellman equation. What reinforcement learning does is solve this problem by approximating the optimal value function, and from that finding the optimal policy. The idea is to
learn the strategy through iteration. What I am doing is developed from something called model predictive control: we try to solve the same problem, but we truncate the infinite-horizon problem into a finite-horizon problem, so every time we solve a finite-horizon optimisation problem to find the optimal solution, whereas reinforcement learning tries to do it through iterative learning. The whole idea of the relationship between the two is explained in a paper I just published; if anyone in the community is interested, please have a look. It gives my view of the link between those two, and also the link with active inference. Because the approaches are different, let me just give the conclusion. Reinforcement learning has, I think, three major problems. One is that it needs a lot of data to learn the optimal strategy, the optimal value function. The second is that once you have learned in a simulation environment, when the environment changes, or there is a mismatch between your simulation environment and the real environment, the optimal strategy you learned in simulation will not work very well in real life. In this case, for example, if I learn to search for the source in an environment where the wind comes from this direction at this speed, I learn how to do it; but if in a real emergency the wind is actually blowing in the other direction, the strategy you learned may not work at all, because it is no longer optimal. The third problem is that reinforcement learning is like a black box: it is very difficult to prove stability or safety, this kind of thing. We don't have that problem in control; we have a rich body of tools able to prove stability, safety and other properties. I think the first two problems can be addressed by active inference as well, reducing the amount of data required and dealing with
the unknown environment; but control also offers something else, which is safety guarantees. So that is a quick attempt to think about the link between the approach discussed here and reinforcement learning. Coming to the ongoing work on this topic: we are trying to move from single-step to multi-step look-ahead over a finite horizon, but this creates a problem of computational load, because now we have to deal with much higher computation, and reducing that computational load is quite difficult. Last time I talked with Carl Friston, he gave me some ideas about how to do it, and we are trying to learn from your community about how to reduce the computational load. Another thing we are doing is trying to prove certain regularity properties for the approach: to prove that the estimate converges to the true source, that your belief converges to the true external environment, and also to prove safety, which is particularly important in our area; when we deal with cars and aircraft, we have to prove it is safe. So we are working in those areas. In conclusion, we developed a new framework, in our own way, to make systems able to operate in unknown environments through a trade-off between exploitation and exploration, understanding how future actions influence our beliefs; it can deal with the coupling between action and belief. And our approach is not like many other approaches that artificially stitch intrinsic and extrinsic values together with some weighting: here it happens naturally, and it is optimal in a certain sense. And therefore, as I think about it, there is a particular
way in which we can learn a lot from the active inference community, because there is a community here of people working in this area who have developed many good ideas, and the question is how to promote more collaboration between these two communities; I think that would be a very interesting thing to do. So that's all of my talk; I've pretty much come to the end of my time. Yeah, it's under nine minutes now, so that's good. Okay, thanks. So I will join the roundtable. Thank you. Yes, okay, thank you. I will leave first, then rejoin. Okay, thank you. All right, welcome back, everyone. As it turns out, we may or may not have other people join; we have just one joining right now, and we'll see who else joins. So welcome back, Wenhua. Anyone who's in the live chat, please feel free to write questions in the chat and we'll be looking at them. Wenhua, I'm not sure who else will be joining us, but I think this will be a great opportunity, for as short or as long as you'd like, to talk a little more about what you presented on and also to connect some threads from the earlier presentations. How does that sound? And we'll see who else joins. Sound good? I actually don't hear you anymore. Okay, one second, we'll figure this out. Yes, we can hear you now. Okay, sorry. Welcome. So we'll talk about some of the previous presentations; I'm not sure which ones you've seen, and otherwise I'd like to pick up on a few threads in your presentation, and then we'll take any questions people ask live. How does that sound? Awesome. So one point you highlighted was this notion of dual control and how it brought you to some of the same places that active inference has arrived at. I'm curious what you think led to that, what led to those vectors intersecting in dual control, what is at that nexus, why it exists that way, and why it is
coming clearer to the forefront at this time? Yeah, thanks for this very interesting question; I also think about it myself from time to time. I feel those two areas were originally quite far apart: we had different strategies, different ways of thinking about the world. In control, as I said, the dual control concept is not entirely new; people in the control community realised many years ago that if you take an action, the action will not only change the behaviour of a dynamic system but may also have some influence on the variables you are interested in. In traditional control, for a physical system, there are some parameters of your own, like a mass, or damping, or other things you don't know, and by acting you can help yourself understand them; just like when you are driving your car, by making certain movements you learn what mass or inertia your car has. So people realised that before. The difference now is that control has moved from what we might call low-level automation to high-level intelligence, or autonomy. That means you treat the dynamic system more like an agent, as it would be called in the language of computer science or neuroscience. So now we are interested not only in ourselves but in the environment surrounding us, the outside. The progression in my work is to move from concern with the agent itself to the environment surrounding it: how to explore the environment, how to understand your behaviour in the environment. By doing this, I feel we bring the typical engineering heart of control into this kind of natural world, this kind of biological sense, because now
you are more interested in how to coexist with the environment. So this is my view on one side. On the other side, reading about active inference, I feel this community has more to say. Traditionally, control people already had some idea of treating the human brain as a Bayesian machine: I have prior information about the environment, and new data comes in to help me understand what is happening around me. But for me, one side of active inference is how you define the reward function, in particular using free energy to quantify it, which is really wonderful. The other key thing is that you now think about action: the action is also linked with my understanding of the environment, with perception. That is the coupling: using action to explore the environment gives you a better understanding. It means you move from treating the human as a passive receiver of information from the environment to saying: I can actively do something to help me better understand. This, in my understanding, is the key feature of active inference. And this is the reason it naturally links with the control we talked about, because control is normally about the consequence of your action and how to take the best action. So this is my view of those two areas: they come from quite different places and, gradually, because of the needs of the research and the trends in the field, they are moving together and converging on each other in this particular area. That's what I'm thinking. Awesome. If I could just pull out a few threads that I thought were really insightful there: you cited a paper from the seventies about this dual control notion, the idea that we should have not just
a single imperative, but an imperative that contains epistemic and pragmatic components and all that that entails, like the unknown consequences of action. So what we would have is the expected free energy, as opposed to the variational free energy. And then you described how there was a movement from lower-level automation control systems, the classical thermostat and other single-variable, single-fixed-point implementations, and then, as the implementation complexity approaches the multi-joint, and way beyond that, as the multi-joint system is placed in a social context or a changing environment, the system being designed convergently comes to require the same imperatives as a nervous system, which is to say the real-time integration of sparse sensory data. That is also what authorizes the ecological stance and the embedded cognitive perspective, which ties in with other recent threads in cognitive science and neuroscience, like the pragmatic turn. So it really is interesting how, from the technical-capacity side, the questions that were converged upon, in terms of memory and forgetting and all these things, are the challenges that natural systems have been solving for a long time.

Yeah, that's absolutely right. That is a very interesting part of how the technology developed, but there is also a demand from society, because a lot of things like traditional mechanical or thermal control, simple control, have existed for a long time and have helped boost our productivity for many years. Now we have reached the stage where we want to further increase our productivity and efficiency, and you have to have a highly automatic system that is able to deal with unexpected events and uncertainty. To have that kind of capability, you also require a high level of intelligence, like a human or animal. So that is actually another reason the technology is moving forward:
because society and the economy demand this kind of thing to happen.

Okay, I have another question about the scenarios you explored. How is explore-exploit balanced automatically? How can such a broad claim be made, and how is it balanced when risk is in play, like the risk of going into a dangerous area? So within the search task, how is explore-exploit mediated, and then how does that become more complex, or how is it integrated in the model, when bodily injury is on the table?

Yeah, thanks, that's a very interesting question, and we have tried to think about it ourselves for a while. First I would like to say that we are just using autonomous search as a particular example to illustrate the key ideas, and the fundamental objective for this task is very simple and clear: based on my understanding, I try to move our agent as close as possible to the real source. We formulated this as a reward function, or cost function: you want to move the agent's location at the next time step as close to the source location as possible. However, because you don't know the source location, this is conditioned on all the information you have collected so far. So this is the reward function as originally defined, and from there we derived this particular function. Once it is formulated in this way, and once you introduce the notion that the action will affect your belief, the cost function, or reward function, naturally becomes two terms: one term is about exploitation, and the other is about exploration. This happens naturally; you don't need to introduce extra terms, they just come together this way. This is why we think it is maybe the best way to do it. But you also mentioned a very good question about risk: there is not only a target you want to follow.
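The two-term decomposition Chen describes can be sketched in a toy 1-D search problem. Everything here (the grid, the sensor rates, the weight `w`) is an illustrative assumption, not the formulation from the talk: the cost of a candidate measurement cell splits into an exploitation term, distance to the believed source location, plus an exploration term, the expected entropy of the belief after measuring there.

```python
import numpy as np

# Illustrative dual-control objective: exploitation + exploration.
# The 1-D grid, sensor rates, and weight w are made-up assumptions.
cells = np.arange(10)            # candidate measurement locations
belief = np.full(10, 0.1)        # uniform prior over the source location

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_posterior_entropy(belief, cell, p_hit=0.8, p_false=0.1):
    """Expected belief entropy after a noisy binary reading at `cell`."""
    out = 0.0
    for z in (1, 0):
        # likelihood of reading z under each hypothesised source position
        p_detect = np.where(cells == cell, p_hit, p_false)
        lik = p_detect if z == 1 else 1.0 - p_detect
        joint = lik * belief
        p_z = joint.sum()
        if p_z > 0:
            out += p_z * entropy(joint / p_z)
    return out

def dual_control_cost(cell, w=1.0):
    exploitation = abs(cell - np.sum(cells * belief))       # go toward the believed source
    exploration = expected_posterior_entropy(belief, cell)  # expected residual uncertainty
    return exploitation + w * exploration

best = min(cells, key=dual_control_cost)
```

Nothing extra is bolted on: writing down "expected cost of the next position, given everything measured so far" already yields both terms, which is the point Chen makes.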
During the process, you also need to be aware of risks. Normally we deal with this in our framework in two ways. One way is to add constraints in our action generation. For example, in our search, if there is an obstacle somewhere and you command your vehicle or aircraft to move in that direction, there is a risk that it will collide with the obstacle, so we add constraints in the search domain: in this area you cannot go. That is one way. But you can also add what we call soft constraints, meaning that in the cost function you add a penalty: some area might carry some risk, so you try to avoid it if possible, and when you generate or optimize your action you take this into account.

All right, awesome. I want to connect that to active inference and then ask a great question from Dave. You mentioned how various constraints could be provided to prevent situations of risk or hazard to the entity, and that is what may work in practice. That is why it has been so interesting to see that most of the presentations have featured, at the very least, a laboratory robot context; that increases our confidence that the full stack touches, like a Tesla coil, that there is a path through some implementation. The question is how deep into the specifics you have to go versus how far up into the generalities, and there might be situations where the constraint is applied in a very situational way. But also, as you laid out dual control and reinforcement learning, you said reinforcement learning is not really amenable to being formalized analytically in practice, whereas one active inference strategy we have seen for balancing task performance with survival is to say: I want to reduce my surprise about the gas cloud, and I want to keep my battery charged.
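The hard- and soft-constraint handling just described can be sketched in a few lines. The grid, obstacle set, risk set, and weight are invented for illustration: hard constraints delete actions from the search domain, while soft constraints survive as penalty terms in the cost.

```python
# Hypothetical action selection with hard and soft constraints.
candidate_moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
obstacles = {(1, 0)}      # hard constraint: collision, never allowed
risky = {(0, 1)}          # soft constraint: allowed but penalised
goal = (3, 0)

def cost(pos, risk_weight=5.0):
    task = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])  # distance to goal
    penalty = risk_weight if pos in risky else 0.0        # soft constraint
    return task + penalty

feasible = [m for m in candidate_moves if m not in obstacles]  # hard constraint
best = min(feasible, key=cost)
```

The direct move toward the goal is simply absent from the feasible set, while the risky cell is merely discouraged; the optimizer trades it off against the task term.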
With a high battery percentage as a preferred state, it is just balancing explore-exploit within each drive, like detecting the gas or staying at high battery, and it can also have a nested model that balances those drives. So there is an analytically simple, first-principles way to bring homeostasis and risk avoidance into very generalized framings of the active inference framework. In the specifics it might be useful to do different model variations closer to the edge, but it is very interesting to think about what it looks like when there is also something in the center that has a simplicity to it.

Yeah, thanks for this very useful discussion. Certainly there are different ways we can deal with the balance between doing what you want to do and trying to avoid any risk. I think the particular problem here, my feeling is, is called uncertainty, because uncertainty affects all of those things; if you think about risk, how reliable might that risk estimate be? There is a lot like this. This comes back to my fellowship: my fellowship was funded by EPSRC for five years, and I concentrated on this particular area. The goal of the fellowship, as I already indicated, was to increase the level of autonomy by bringing more intelligent algorithms into control. One of the ideas for me was to develop something we call a goal-oriented control system. We want to promote goal-oriented behavior, because a high level of intelligence, like people or animals, is more goal-oriented: you just give people the goal and they figure out how to do it, whereas with a less intelligent system you need to give instructions at every step about how to do things properly. So the first thing is to promote goal-oriented behavior: for this particular task, what do we want to achieve, what is
your requirement? The second key thing integrated in my framework is constraints. Why are constraints so important? Because, as before, you try to avoid risky things. For example, if you want to develop an autonomous driving car, you have to follow the rules of the traffic and the road, follow the traffic lights, and you don't want to collide with any other vehicle or hit any pedestrian. Those are safety constraints. But you also have a lot of physical constraints, because your system has a maximum power, a maximum temperature or pressure it can take; otherwise you are going to destroy yourself. So constraints play a very key role here. Certainly, as you said, how to abstract and formulate these constraints for different scenarios is a question, but the whole idea is that for any decision or action you make, you have to respect those constraints before you think about any really meaningful objective. The other key part is uncertainty. Uncertainty comes from many sources: it could be environmental change, the uncertainty we already talked about, but it could also be uncertainty about your information, because your sensors can have errors, or the sensing range is not enough to pick up all the information you need. And there is also uncertainty about your belief of the world; for example, we talked about risk, but how reliable is that risk estimate? So, in my belief, a key thing in the whole framework for dual control is how to quantify the uncertainty about your belief of the surrounding environment; this is what somehow drives your reward for exploration. Those are the key things in my view. And basically our idea is that, compared with what you talk about in active inference,
we are able to formulate this in an analytical way. And if I formulate it analytically, we can use tools and theory that people have been developing over the last 30 to 50 years, in particular in the control community and other communities, and we can formally prove that if we do it this way, we are able to satisfy certain safety requirements: we will not hit obstacles, we will not put ourselves in danger, like, in the natural world, being eaten by other animals. So this is the dream, and we have some progress in this area: for some very simple systems we are really able to prove stability or safety of the dual control. However, we want to extend this to a much wider world, and there the free energy principle is much more difficult, because the free energy function is quite complicated; understanding it is another level of difficulty. But in principle we are able to work together to push this.

Thank you, there was really a lot in there. I'd like to make one remark on active inference and then bridge to Dave's question in the chat. You highlighted uncertainty, and that is of course a core aspect of active inference, with bounding surprise using free energy. Then you mentioned goals and constraints, and that really seemed to me like a common point with cybernetics: goal orientation and constraints, the general systems theory and cybernetics branch, but also engineering, entrepreneurship, and innovation, as your colleague Stephen Fox has worked on. Those perspectives deal with the goals and constraints of potentially nested systems of organization, like projects within an organization, or a firm within a market, or a cyber-physical system nested within a firm within a market, with interfaces that can even just be described in an information-partition way. And then you brought it back to safety; to be able to have certain probabilistic or formal
guarantees on those kinds of arbitrarily composed systems is a very exciting direction. Any remarks on that, or shall I ask the question from Dave?

Just a quick remark. For me, I am more on the technology side, but it is absolutely right that this kind of goal-directed behavior can be used for social organizations, for biology, and for many other things, because basically when you survive you have a goal: you try to survive on the available food; and in society, an organization's goal is to find more profit, constrained by the market environment and many other things. Yeah, I absolutely agree with you.

I'm just going to make one more comment to refer back to an earlier talk; I'm not sure if you saw this one, but it was Tim Verbelen's talk. He was talking about video games and how, with only an epistemic drive, some of these games could be played very well, like Pong and Mario. It made me think about how in a lot of video games staying alive is an imperative; whether you're tapping something, or in a maze, or something is growing or shrinking, staying alive bridges the gap between search and exploit and all these different behavioral modes. If you're not staying alive, if there is no idle process for your CPU, then it's over, and there is no point in future epochs. So that's really interesting. Let me go to Dave's question. Dave wrote: Professor Chen, I didn't notice discussion in your presentation of affect or emotion or drive. What do you think about adding such mechanisms to the dual control model? Could this amount to explicating internally generated motivations, or would such a trick be a mere disguise for inserting arbitrary experimenter-specified rewards?

Yeah, that's a very interesting question. Reading the literature on active inference, I am fully aware that the focus of our research
is quite applied: we focus on doing something specific and useful. For example, in the case of autonomic search we just try to find the location of the sources, and in the PV farm for generating renewable energy the objective is quite simple: we want to harvest as much energy as possible given the conditions. So the focus of our work is very specific. However, when we come to biological systems and humans, they have lots of things, like emotion and much else, that we did not consider in our current work. I can see this is the gap between dual control and active inference: active inference can deal with more general scales, like empowerment or confidence or other, more emotional things, and this is actually the direction our future work should move, to try to learn from this community. But in principle, as I said, if we are able to formulate those emotional and other factors as a kind of reward function (for example, you want to encourage people to be happy, but you need some way to quantify that; if you cannot capture it, you will not promote the behavior that makes people happier), then the key thing for us is how to formulate an interesting reward function, and this is also what we can learn from active inference. In principle we have no restriction on what type of reward function or what type of constraints it should be; the framework is in some ways quite general. But for more complicated systems there is still lots and lots of work.

Yeah, that's very interesting. If I could build on that: I was imagining the setting of firefighters, with different chemicals being sensed, and the way that, on the extended cognition paradigm, the cyber-physical team qualitatively includes whatever tools they are using, even if it's just a walkie-talkie
and their bodies: it is still an extended cyber-physical team. As we move into futures that, in different areas, include all kinds of tools we do and don't know about, that is what's interesting about a framework that can start in the qualitative zone and then take it all the way to the application. And then, as to value alignment, I thought what you just added had some seeds of new ways to think about the human-robot, or more generally the multi-entity, alignment question: their preference vectors, or some of their preference vectors, could be about the same thing and in the same direction. The semantics of what the firefighter on the survey says, "I want to balance x and y and z in this ratio", can be matched by another entity, whether another person or some explicitly structured cyber-physical entity, that has the same preference distribution. That is a formal way to find that probabilistic alignment, because the question has to be addressed one way or another: will those distributions align accidentally, or by some other simple heuristic?

Yeah, that's absolutely right. Actually, when we talk with our end users, the firefighters and also the Ministry of Defence, who are interested in terrorist attacks and similar scenarios, for example in a tube station, they say it is very rare to just send a robot out and let it do everything on its own. There is always an interaction between the first responder and the robot: the robot passes information about where it thinks the source might be, the operator or first responder provides some extra information, and they always work together as a team. There are also preferences; for example, in some areas, based on the first responders' experience, they know where the sources are more likely to be
the moment they see the picture or the scene. So there is always that interaction, as you said. This is actually our future step: to promote this kind of interaction between the human and the robot, bringing in preferences and other emotional or urgency-related things; for example, if something is close to an exit, maybe you give it priority in the search to find whether there is a problem there.

Okay, a few remarks on that. It made me think about how the response network of an emergency could be seen as embodying a prior: where I am in California, maybe wildfires have a certain likelihood in different seasons, so a phone call to emergency services might have a different likelihood of something being the case in one region versus another. Or it can be used in a Bayesian filtering way: are we getting fifteen reports of the same thing, and where are we already working on a situation while something else needs attention brought to it, so we don't have a sampling bias? That question, of how training data in a static learning context results in biased implementations in the real world, was a theme of several of the talks. A question I wanted to ask was about the turbulence of the flow: it brings a chaotic, multi-peak landscape, because what I saw in the cloud was like an island chain, not just one ridge of sensing. So how does a smooth Gaussian variational approximation make sense of something that we might otherwise think requires complex simulations to resolve? I think you are a real expert here.

That's a really good question, and we struggled with it for quite a while. You can think about it: there is a lot of local turbulence, and in any case, with a chemical or biological dispersion, the concentration in the air is actually really low. And also, for our UAVs, the small
drones or robots, the sensors they carry are not very advanced; you can't carry laboratory-grade, very comprehensive equipment. We actually use a very simple gas sensor, which means you have a high level of missed detections. What makes it even worse, coming back to the turbulence, is that when we operate the UAV, the propeller doesn't help us: it generates a lot of local flow itself, which upsets our sensor. So it's hard; a lot of hard work has gone into this area. But to try to answer your question: when we are reasoning, we could use a more complicated model, but the complicated model sometimes doesn't give us much benefit, for two reasons. One reason is that the environment has a high level of uncertainty; a complicated model only gives you benefit if it accurately represents the real environment, and if the environment is that uncertain, a simple, good-enough model may be able to do the job. Also, by using a simple model you can significantly lower your computational load: the simple Gaussian model is very basic, but it drives the computation down compared with a complicated dynamics model. Secondly, because in this particular scenario the sensor is not high-grade, a lot of the uncertainty a better model would address simply disappears within the sensor noise. Your sensor noise is quite high; even if you push your model to be more and more accurate, that gains you maybe one percent, while the sensor always gives you five percent of error, so it doesn't give you much benefit. That is the situation we found.
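A minimal sketch of why a crude model can be good enough here: a Bernoulli detect/no-detect likelihood with a 40% missed-detection rate, updated by Bayes' rule over a line of source hypotheses. The rates, the "detection only within one cell" model, and the idealized sweep readings are assumptions for illustration, not the sensor model from the actual work.

```python
import numpy as np

n = 20
belief = np.full(n, 1.0 / n)       # uniform prior over source position
true_source = 7
p_detect, p_false = 0.6, 0.1       # poor sensor: many misses, some false alarms

def likelihood(z, sensed_cell):
    """P(reading z | source at each cell): detection is likely only nearby."""
    near = np.abs(np.arange(n) - sensed_cell) <= 1
    p_hit = np.where(near, p_detect, p_false)
    return p_hit if z == 1 else 1.0 - p_hit

for cell in range(n):              # sweep the line with idealised readings
    z = 1 if abs(cell - true_source) <= 1 else 0
    belief *= likelihood(z, cell)
    belief /= belief.sum()

estimate = int(np.argmax(belief))  # posterior mode lands on the source
```

Even with these noise levels the posterior concentrates on the right cell after one sweep; refining the likelihood model further would be swamped by the sensor noise, which is Chen's one-percent-versus-five-percent point.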
I'd like to connect that gas-dispersion setting to one of the earlier talks; I'm not sure if you saw it, but to recapitulate the point: this was from Tim Schneider's talk, and Tim spoke about goal-driven active exploration, so I'm sure there would be a lot of resonance with the dual control model. One of the questions he highlighted was: can the familiar have intrinsic value? He described what he called the detachment problem, where a region of high posterior likelihood, a good region to be in, gets walled off by a well-explored region, and that dissuades further epistemically driven actions into that ring; it's almost like "we've already been there", but the new finding was just one more step down that bridge. And then we talk a lot in active inference about morphological computation and embodied computation, and I think the dissipating-gas setting gives several ways to be really specific about applied embodied computation. One was what you just mentioned with the UAVs and the fans causing a distortion: it wasn't just a free-floating sensor. That is like our own body models; people talk about how the body is a model, it has a model of itself in space, and that is how we can do all these actions. The second example of embodied and extended cognition was the way the gas itself was dissipating: it's almost as if the forgetting in the model happens automatically, because you can't just build up a heat map, the integral of gas flowing through a region over time; you need something dynamic. You are locally searching a space, so your estimate in distal areas becomes increasingly uncertain. What does that make you think about in terms of the work you are doing?

Yeah, all those things are very interesting; there are many directions there. The work we are doing here is just trying
to illustrate the very basic principle of how to take advantage of the actions you take to gain new information and, from there, help you; basically, as we discussed, it bears the same spirit as active inference. That is the fundamental thing, and in terms of scenario and complexity there are many, many layers we can add on. One thing you mentioned is that the influence of the agent or robot is dynamic. It could also be that, while we currently consider a stationary, fixed source that just releases gas, in real life the source may be mobile: some terrorist, for example, puts something on a truck and drives it around. People also talk about intermittent release: the source releases, stops, then releases again. So there are many much more complicated scenarios, and it could be very interesting coupled with other things people work on; for example, you could have more than one source, not just a single one. People also talk about larger areas where a number of agents work together, and then we are talking about collaboration between different agents: how to work together more efficiently to search the area. So there are many, many things here. One particular thing where I feel we can get direct help from your community: right now we only look one step ahead, but we already have some evidence that if we look five steps ahead, it actually gives us a much better result, because you look further into the influence of those changes. The downside, as we discussed, is the computational load, because now you have something like a tree search: at each step there are a number of directions you could move your agents, and over multiple steps the number of combinations grows.
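The jump from one-step to multi-step lookahead can be sketched with a brute-force receding-horizon planner: score every action sequence up to the horizon, execute only the first action, then replan. The grid world and the cost (summed distance to goal along the path) are illustrative assumptions; with 4 actions and horizon H there are 4**H sequences, which is exactly the blow-up that Monte Carlo tree search and similar methods prune.

```python
import itertools

# Toy receding-horizon planner; grid, goal, and cost are illustrative.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def path_cost(start, plan, goal):
    """Sum of Manhattan distances to the goal along the rolled-out path."""
    x, y = start
    total = 0
    for a in plan:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        total += abs(x - goal[0]) + abs(y - goal[1])
    return total

def best_first_action(pos, goal, horizon):
    # Brute force over all 4**horizon candidate action sequences.
    plans = itertools.product(MOVES, repeat=horizon)
    return min(plans, key=lambda p: path_cost(pos, p, goal))[0]

# Receding horizon: replan after every executed step.
pos, goal = (0, 0), (3, 2)
for _ in range(5):
    a = best_first_action(pos, goal, horizon=3)
    dx, dy = MOVES[a]
    pos = (pos[0] + dx, pos[1] + dy)
```

With horizon 1 the agent is greedy on the next cell only; widening the horizon improves the chosen route at an exponentially growing enumeration cost, which is the trade-off Chen describes between performance and computational load.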
The combinations can get quite nasty, so we now have research trying to use Monte Carlo tree search or similar methods, which can also provide some ideas about it. You work in this area: how can we embed it into our work and take advantage of it? There are a lot of interesting things here.

Awesome. I would like to create a few of these points of contact and pick up where you said that you're planning one step ahead, which is analogous to the variational free energy, the instantaneous best action, while the expected free energy takes it into the future. I wanted to mention a few different ways that temporal planning is accomplished, some of which came up in the earlier presentations. It's interesting to note that however many years ago, I don't have the exact number, but somewhere in the ten-years-ago range, active inference was kind of an instantaneous perception-cognition-action theory, and several elaborations have specifically enabled it to account for increasingly distal planning. One example, in the continuous-time setting, is a Taylor series approximation of the generalized coordinates at higher and higher approximation depth. Another approach, in discrete time, is a broader time horizon for policies, together with different tree-pruning approaches. Yet another, which works in discrete time or in hybrid models with discrete and continuous time, is hierarchical modeling. So it is very interesting to wonder, when planning a hundred steps out, thinking really deeply into how the UAV will do something far away, how it chunks that: is it a hundred steps of one, ten steps of ten, two steps of fifty, five of ten? The ways in which it starts to chunk and understand are the ways in which the computational burden is reduced, and also the ways that start to resemble how biological
cognitive entities make sense of their environments.

Yeah, I think that's absolutely right; it's a very good way to move forward. We are also doing some thinking along these lines, though not necessarily coming from the same direction. For example, when we make a decision in the autonomous search problem, you somehow design your waypoint, where you want to move to, and then you face the question of how large the step size should be; this can actually be changed, and in a larger area maybe you want to move a longer distance. You can regard this as a hierarchical strategy: we give the command to the low-level UAV control, or autopilot, which tries to follow it. So there is a lot to think about; it is a trade-off between performance, computational load, and the horizon you want to look at. In theory, looking further gives better performance, but how far are you able to look? There is a trade-off between those terms, and different applications may weigh the factors differently, but the whole idea, I think, is quite interesting and worth exploring.

Cool. To pick up on some similarities and differences across the presentations: some of them utilized neural networks as modules in their training, others used variational Bayesian methods, which can be fit as an optimization problem, and we also saw sampling-based approaches. So, pretty broadly, how do you see these different ways of doing robotics and edge computing working in different situations, or together, with pre-trained or updateable models, whether large, variational, or sampling-based?

Yeah, that's a very interesting topic, and quite a broad one. There is a lot of research nowadays on using pre-trained models,
particularly in the context of reinforcement learning: for example, you learn things and then try to deploy them in real life, particularly for robotics; there is a lot of research in that area. For me, though I didn't have time to explain these ideas fully, I think a lot of problems, whether in engineering systems, robotics, biology, or humans, can be described as how to make an optimal decision at any time. The optimal decision could be about finding food, surviving, or doing something useful; no matter what, all those problems, in my view, can be summarized as optimal decision-making problems. Even when all the environment information is known and the behavior of your agents is known, this is not easy; when you have to deal with uncertainty in the environment, as we discussed, it becomes much more challenging. For the time being there may be no single solution for this, which is why you see many different approaches. For me there are two major approaches. One I call the iteration approach, particularly as in reinforcement learning: you have some initial strategy, you take actions in the environment, and from the reward function or the state changes you learn, gradually moving your policy or value function closer to the truly optimal policy. You keep iterating and learning. Because this approach needs a lot of data, people now tend to train the strategy on data beforehand and then deploy it; online you can adapt a little, but you take advantage of offline training on a lot of data. But there is another approach,
more like active inference, or like dual control, which I call a purely online optimization approach. The online optimization approach says: given the scenario, given all the information I have about the environment, and given my objective, I try to work out the best strategy. This can be formulated as an online optimization problem: given everything collected so far and all your understanding, you try to solve it. This approach normally needs a larger computational load, which matters when we talk about edge computing. That is why we struggle, as we discussed, with one step ahead: one step ahead means the optimization problem is easy to solve, while if you look at multiple steps it becomes much more complicated, and edge computing may not be good enough to do it. But in principle I regard these as the two broad approaches, and I would suggest that active inference is precisely this: you use free energy, or expected free energy, as a reward function and optimize your action to get the best possible one. That is why I feel it is a similar kind of approach to what we have talked about.

While you spoke I was also thinking about how dual control systems might have formal differences from the free energy principle and active inference as we know them today, or, as we discussed earlier, how they might have reached some of the same points from two different angles. It made me think about the particular partition and the Markov blanket, or the Friston blanket. Do you have any thoughts on this? I can provide more context, but first I just wanted to ask whether the concept of blankets is relevant, or whether there is some analog in the control
literature, from your perspective?

Yes, maybe we can say we always do that in our control community. Traditionally, control looks at our own dynamics: we pick up information through the sensors, and then we take an action. That is the control loop: the sensor, your own system, and the action you take. However, previously, control very much ignored the interaction with the environment. In control, we typically call the environment a disturbance. The aim of a control system, when one exists, is to act against any disturbance, because we are trying to do something ourselves. For example, think of keeping your room temperature at 30 degrees: if someone opens the door a little, if the external temperature changes, our controller essentially says, I don't care about anything outside; all I care about is that the sensors take in the information and I act on it. So we always regard the world as information coming from the sensor; we take an action, which has some influence outside, and then we take in information from the sensor again. So I found the Markov blanket quite natural, and with some thinking, I feel it is quite an interesting way of looking at the world. Because in this view, you don't care whether someone opened the door or not; you don't know. You just see from the sensor that the temperature is changing, you act on that, and you ignore everything outside.

I'm glad you added that, because I had also wondered whether the control loop was like a pre-active-inference partition. I mean, input-output systems, systems with interfaces, holographic systems: more and more equivalences have been found. But it's kind of like the sparsest representation of interaction between two agents, or between an agent and its niche. You need the road going one way, and you need the road going the
other way; otherwise you haven't closed the loop. So you have those four pieces: the two entities and the edges. And then it is interesting how what comes from the environment is described: is it a perturbation or a disturbance, something that should be controlled? Or is it, as with allostasis and biological processes, something anticipatory, going all the way to novelty, which is like the ultimate anticipation, the anticipation of the different? That again parallels the development of industrial control systems, going from stabilizing vibration to needing to be proactive. There is also, again, a natural coming together as the attention in these fields turns to finding formal cognitive models.

Yeah, you're absolutely right. I think the principle of the Markov blanket is quite interesting and relevant. The big difference between traditional engineering control thinking and the thinking in your area, or our current thinking, is about how to represent the environment in your brain. That makes the whole thing different because, as I said, previously maybe you didn't care too much: the disturbance was just something hitting your system. Now, actually, no: if you want to anticipate something happening, you may want a better representation of the environment, and you also want to align your beliefs about the internal state more closely with the external states. That is where I think all of this comes in, and also where you are able to make high-level decisions, where the intelligence comes from. Because you have a better representation, you can figure out a better way of doing things, and then how to benefit from the environment.

One example that made me think of, connecting back to embodied intelligence and morphological computing: I remember seeing a robot, or a
little vehicle, and it was able to move over a surface that looked kind of like a staircase on its side, very jagged. But the wheel this machine had was shaped like a square of just the right radius, so it was able to roll perfectly smoothly across that one frequency of stair; on a different frequency of stair it would have been a totally bumpy ride. If the wheel had been small, there would have been a debate about whether to represent chunks of stairs, and uphill and downhill, and all of that fine scale. But when the morphology is fit to the niche in a natural way, in a way that offloads some of the computation to the physicality, then that entity only needs to consider something like linear movement, as if it were in a simpler space. It is interesting to ask about this, because some of the presentations used very standardized robotics platforms, like the quadcopter and the TurtleBot and other standardized robotic hardware, to the extent I understand it. That is awesome, because it increases accessibility and demonstrates the ideas in a clear way; tomorrow we'll hear about LEGO implementations, and so on. But that element of embodied computation starts to show how we could work in a different way and ask what kinds of systems could implement certain functions. Maybe there are shapes of robots, or of other objects, that we just haven't seen yet. They don't have to have two legs; they don't need to be five feet tall; they don't need to look like a trash can, which are some of the morphologies we know. And with the air and the water and the ground, there are so many bodies just among the insects. So it's going to be really amazing to see how this is used proactively, to design morphologies and behaviors that do things we haven't seen.

Yeah, you're absolutely right. In this kind of thing, I think maybe people call this the
survival of the fittest, something like that. You can maybe use free energy or some other notion to describe it, because you are living in an environment and you want to get the best out of it, and what we are talking about now is the strategy for how to do that. In the natural world, however, your physical body has gradually evolved with you, whereas in our designs we can choose the physical body, treat it as an option, as something we are able to change, and maybe gradually combine the two things together. It is not only your brain making the best strategy, but also your actuators, your physical body, and other things that gradually change along with it. So this is about how you set up the problem. What I find much more interesting is that in active inference, when you use free energy, you can try to capture a much broader range of behavior. As I said, in the control area we are now maybe more focused on a specific task, a specific mission; that is what we physically want to do. A lot of other things, softer skills, broader notions like competence or capability, we don't go after. But when humans do things, we learn, we build up confidence, we embody our skills, and then we do more things. I think this is something happening in active inference: they are able to explain that, they have a framework for it, while we are maybe more on the physical side. So I can see these things gradually moving together. You make a robot more capable of doing something if it has the ability to change its body, change its actuation, change other things, or even improve its sensing, because it has gradually learned that some sensing matters more than others in a particular environment, and it will try to further develop that kind of particular
sensing capability. I can see that happening naturally, but the question is how to set up the problem so that we allow this to happen.

Well, one pretty thing that reminded me of, from my own area of insect behavioral modeling: there are many models of task performance, like digging or foraging; however, there are fewer models describing task transitions, even though those are really important for the colony. And the models of task transitions tend to be more generalized dynamical simulations, and less from the agent-based modeling perspective. As we see robotics, with its physicality and its technology, pushing the frontier, that kind of task switching becomes important at higher and higher orders. That again brings us into the bio-inspired design conversation, because task switching is so essential, along with the idea that the human is this unfolding of memory and anticipation and all of these different features, and it is hard, functionally or neuroanatomically, to separate out how the human brain works and how animal nervous systems work. So I think that will be another interesting area, with this tension between explainability and potentially even austerity of the models. They will have parameters, but the parameters, and the way they interact, even for a few-parameter model, might be difficult to understand, because not all the information is going to be in those generative model parameters. It is going to be those model parameters plus the dynamics of how things go in the world, which are relied upon outside of the focus, as the background context for the model. So the model itself won't be fully understandable or explainable, because it is going to rely on this context, just as no sentence can stand alone. There are just so many interesting areas, and it is exciting to see the directions different fields can meet at and then start to structure a productive relationship.

Yeah, that's
absolutely right, and I absolutely agree with you about the importance of models. In many ways they are very important for us. I know that in the machine learning area they have model-based approaches and model-free approaches; I am more biased toward the model-based approach. Model-free approaches are quite powerful in some ways, but there are some issues, as you already highlighted: how to explain why you made this kind of decision, why you took this kind of action, how to make it more explainable. As for the model-based side, a lot of the things we are doing, particularly in our area, are engineering systems; they have had first-principles or other models for many years, and the question now is how to make use of them. So I prefer the model-based approach in many ways. One is for explanation: you can really understand the action. Another is what people at some point call efficiency: you can use less data, because you have a model. In our engineering world, data means money. I know a lot of people say, okay, Google or Facebook can harvest millions of data points online, freely. In engineering, however, if you want to do something, you need to experiment and test; you need to physically run something to get useful data out of it. That is what I mean by money. So a model-based approach is much more efficient in terms of learning and understanding what is happening in the environment, and this is why, economically, I feel it is quite important, not just for safety or for making things more explainable; there are these other reasons too.

Very interesting. Are there any other remarks you'd like to add, or questions you'd like to raise?

Not really. I think that was a really good conversation. I'm very pleased
with this community, as I said; it was a surprise for me how much people share the same ideas. What I'm interested in is getting more involved in your community, and I would also like to open the door: if anyone working in this community wants to talk with us, wants to work together and develop some new ideas as we have discussed, they are more than welcome. You can get my contact details and come talk with me. Basically, I think we share a lot of fundamental ideas, and certainly there are some different tools, different angles, different concepts and approaches. If we are able to work together, I think we can make this not only a tool for trying to understand the natural world and human or animal behavior, but also, as this theme suggests, a useful tool to drive us to design more capable robots and do something for society.

Amazing, great. Okay, that closes the first session, so you can depart.

Okay, thank you.

Wow, what a great discussion. Big appreciation to Professor Chen for joining us for that. Well, that concludes the first interval of the second Applied Active Inference Symposium, on July 31st, or at least it is now where I am. In about eight hours we're going to have the second interval. It will feature presentations by Bruno Lara, Matt Brown, Adam Safron, and JF Cloutier; it should be a great set of presentations, followed by a roundtable discussion featuring several of those presenters as well as Karl Friston. So I hope everybody has a good break in the eight hours before the second interval, prepares some thoughts and questions, and writes them in the live chat or emails them to us. We really appreciate you spending the time and attention listening to this Active Inference Institute symposium, and we hope that you'll stay involved, participate, and keep on acting and inferring. So goodbye, everyone, and see you in the second interval.
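The one-step-ahead decision rule discussed in the conversation above, treating expected free energy as the objective of an online optimization, can be sketched in a few lines. This is a minimal illustration, not anything presented at the symposium: the function names and the toy numbers are assumptions, and expected free energy is approximated here in a common simplified form as risk (KL divergence from predicted to preferred outcomes) plus an ambiguity term.

```python
import math

def kl_divergence(p, q):
    """KL divergence between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_free_energy(predicted_obs, preferred_obs, ambiguity):
    """One-step expected free energy, approximated as risk plus ambiguity."""
    return kl_divergence(predicted_obs, preferred_obs) + ambiguity

def select_action(candidates, preferred_obs):
    """Online one-step-ahead optimization: score each candidate action
    by its expected free energy and pick the minimum."""
    return min(
        candidates,
        key=lambda a: expected_free_energy(
            a["predicted_obs"], preferred_obs, a["ambiguity"]
        ),
    )

# Toy example: the agent prefers outcome 0 with probability 0.9.
preferred = [0.9, 0.1]
candidates = [
    {"name": "stay", "predicted_obs": [0.5, 0.5], "ambiguity": 0.2},
    {"name": "move", "predicted_obs": [0.8, 0.2], "ambiguity": 0.1},
]
best = select_action(candidates, preferred)
print(best["name"])  # "move": closer to the preferred outcome and less ambiguous
```

Looking more than one step ahead would mean scoring sequences of actions rather than single actions, which is exactly the combinatorial growth in computational load, mentioned above, that makes multi-step planning hard on edge hardware.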