Hello and welcome. It's September 15th, 2023, and we're at the Active Inference Institute for ActInf GuestStream #056.1. Today we have Grégoire Sergeant-Perthuis, and we'll be hearing a talk, followed by a discussion, on how the geometry of the world model influences behavior. First there'll be a presentation, and then any comments that people have in the live chat, or any other questions. It'll be great to talk. So thank you a lot for coming; really looking forward to this. Over to you.

Thank you very much, Daniel, for the invitation. I'm very happy to be able to present this work here. It's work around some computational models of consciousness, or at least some aspects of consciousness, and how they can help to generate different kinds of behaviors, especially when we think about artificial agents. It's work in collaboration with David Rudrauf, Kenneth Williford, and Daniel Bennequin. In fact, it's not just one piece of work; it's a collection of works that has been going on for around ten years already. If you want to know more about this group and the work we're doing, you can go to my personal webpage: there's a link to pcm.html with a summary of the work we've been doing and some articles that are relevant to this subject.

Today we'll focus more particularly on two articles: one that is accepted and will appear very soon, and another that is about to be submitted. They give a mathematical formulation of autonomous agents that have a world model structured geometrically in such a way that this world model captures some features of consciousness. The first article is a review of the experimental results we have and of the most recent formalism in our work, which builds on the literature on POMDPs — partially observable Markov decision processes, that is, stochastic optimal control. We tweak this formalism a bit to show how you can include, inside the world model of the agent, some ideas about consciousness and still have algorithms to do inference, find optimal policies, and so on. The second article focuses more particularly on how changing the world model of the agent — in particular the geometry of the world model, the way it perceives and the way it acts — changes its behavior with respect to foraging strategies: the way it looks for something.

So I will start by presenting what all these terms mean: what a world model is, why you need it, what a foraging strategy is, how you can define an agent that is looking for a certain object, and in particular what an exploration based on curiosity — or, more technically, on epistemic value — is. Let's consider the setting where you have a real world, the space that surrounds us, R3. You have an agent A, which is a solid: it has a solid frame, a center and three axes — what is in front of it, what is on its side, what is above it. And it's looking for an object O, which is itself a solid inside R3. All the configurations of the agent and of the object are defined with respect to a reference frame that is external and that characterizes completely their configuration inside this world.
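To pin down the setup in symbols — this is notation I'm supplying for concreteness, a reading of the setting just described rather than a quote from the papers — the agent's configuration is a rigid pose and the object's is a position in the same external frame:

```latex
% Notation supplied here for concreteness, not taken verbatim from the papers:
% the agent A is a solid with a centre and three axes, so its configuration is a
% rigid pose in the special Euclidean group; the object O sits in the same
% external frame.
q_A = (R, t) \in SE(3) \cong SO(3) \ltimes \mathbb{R}^3, \qquad q_O \in \mathbb{R}^3 .
```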
Now, the agent A wants to find the object O, but it can only get noisy observations of where O is. So to be able to find O, it needs to have some a priori on where O should be, to be able to plan the consequences of its actions with respect to where O will be once it has moved, and to plan how observations will update its prior. In other words: you have an agent looking for O; it thinks O is, for example, in this direction on the right, so it's going to move towards the right and make an observation, and if it doesn't see O, it's going to change its belief about where O is. So this is a very standard formalism that you can find, again, in POMDPs and stochastic optimal control.

But the one point we really want to emphasize is the fact that the agent A has its own frame of reference: the way it's going to give the coordinates of the object, the way it will track where the object is, is by using its own way of computing the coordinates of O. It doesn't have to refer to a global frame, a global coordinate system, to know where O is. For example, I am always centered on myself. When I move, I see the object; I don't think about me moving, but about the object moving. And this is because when I move, I change my frame, so the way I reference the object changes accordingly.

The point we want to introduce is that there can be several ways for the agent to define its own coordinate system with respect to its environment. This is a key feature that we want to capture in terms of modeling, but we also think it is something very structuring with respect to models of consciousness. Because when you consider, more generally, an agent that has a world model in which it has its own point of view, its own perspective, you can start tackling the problem of how to introduce a point of view on the environment of the agent: how can you introduce a certain point of view into the way it encodes its environment? In particular, if you think about the role of space in taking into account your own perspective on the environment, you can see that if you take away everything that fills up the space, you keep just the space itself. The space allows you to structure the way you are going to fill in information about the environment. It's centered on your own point of view, in such a way that when you act, you keep the same way of modeling the space: the space stays the same when I move, but I take into account the fact that I can move from one point to another in this space without changing the space. Something else that is very important is that, with respect to any movement I can do, the space doesn't change: there is no particular point in space singled out by the movements I do. A very similar property is that, instead of considering just my own actions and the way my frame changes while I act, I can also imagine taking the frame of somebody else.
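As a minimal illustration of "computing the coordinates of O in the agent's own frame" — a sketch with invented names, not code from the papers — if the agent's pose in the external frame is a rotation R and a position t, the egocentric coordinates of the object are obtained by inverting that pose:

```python
import numpy as np

def egocentric_coords(obj_world, agent_rot, agent_pos):
    """Coordinates of an object in the agent's own solid frame.

    obj_world : (3,) object position in the external frame
    agent_rot : (3, 3) rotation matrix giving the agent's axes in the external frame
    agent_pos : (3,) agent's centre in the external frame
    """
    # Invert the agent's pose: x_ego = R^T (x_world - t).
    return agent_rot.T @ (obj_world - agent_pos)

# When the agent moves, the same world point gets new egocentric coordinates:
obj = np.array([2.0, 0.0, 1.0])
R0, t0 = np.eye(3), np.zeros(3)
# a 90-degree rotation about the vertical axis
R1 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(egocentric_coords(obj, R0, t0))  # object as seen from the initial frame
print(egocentric_coords(obj, R1, t0))  # same world point, new egocentric coordinates
```

This is the passive reading of a move mentioned above: the world point stays put; only the chart through which the agent indexes it changes.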
So the space that you use to structure your environment doesn't change when you take the perspective of somebody else — for example, if you imagine yourself as a solid agent, changing your own solid frame by a rotation and a translation. This is something that we will use, in fact, as a definition for the state space, the way the agent encodes its environment. And we will see that there's a natural notion that encodes this idea — that you can change perspective on your environment, and on the world model you have of it — simply by introducing something very well known: the notion of a group acting on a certain space.

So, to sum up what we're trying to do: we're just trying to do exploration inside an environment, but we're adding an additional piece of information, which is that the way the agent encodes its environment takes into consideration the fact that it can change its reference frame on this space, without singling out any point.

Okay, so what do we mean by perspective taking, more precisely? How do we go from the real world — the world described by the external frame that allows you to say "the agent is here and the object is there" — to the internal world of the agent? You define a map that takes into consideration that the agent has a first-person perspective on its environment. The simplest case, the Euclidean case, is just rewriting the coordinates of O inside the reference frame of the agent. Then, once the agent moves, its solid frame in the real world changes, and this induces a transformation inside its internal world, which will be an affine transformation. But you could imagine having other transformations. (I don't know if you can see my mouse. No? Okay.) The map Psi can also be something other than an affine transformation; it could be any kind of transformation. For example, in the affine case, the apparent volume of the object doesn't change; but if you change this map and consider, for example, a projective transformation, as we will describe later, the apparent size of the object can change. So we went from something that is already very well known and not very innovative — that motions can be rewritten, and depend on the frame you choose to describe them in — to saying that you can in fact change frames to account for perceptual properties.

What we also ask of the agents we consider is that they can imagine the consequences of their moves on their future observations. So the agents we consider have a notion of agency: they can plan the consequences of their actions, so that they can choose the best action with respect to a certain reward. In particular, the reward we will consider is trying to maximize a certain notion of surprise — this will be the epistemic value defined below. To sum it up: we have an agent that is looking for a certain object; it has beliefs about where this object is, and the beliefs live on an internal state space, the world model of the environment; it can make observations, which allow it to update its beliefs; and it can predict the consequences of its actions, so that it can plan the way it should act with respect to a certain objective.
Formally, this is simply the notion of a Markov decision process — or, more generally, of a partially observable Markov decision process, because we do not have complete information about the environment, only limited observations. And there's a strong duality between POMDPs and active inference, so there's really a strong link between the two. Now, in the way I presented it so far, I tried to give the general idea of what we're doing: the general context, POMDPs, agents planning their actions with respect to a certain objective, where the objective here is to find a certain object. I gave the formal setting — MDPs and POMDPs — and now I will go one step further and define it explicitly. I will continue with this approach in the second part, where I will present our results and what is specific to us: first the general statement, then a bit more precision, and then really digging into the result.

The classical way to define a Markov decision process is to consider that you have a set of configurations of the environment, which is the state space of the world model, or at least of the way you encode your environment. This is something on which you can act, because when the agent makes an action, it changes the state of the environment. And the way the actions change the environment is given by a certain Markov kernel, a probability kernel. So it's stochastic, which means that each action changes the state of the environment, but not deterministically: you allow some error in the way it acts on the environment. And you have a reward function that is associated to the action you take at time t and the state at time t. A partially observable Markov decision process is the same thing as a Markov decision process, but you formalize the fact that your observations of your environment are only partial, which means you do not have precise information about the state of the environment. So observations and states are also related through a probability kernel, which tells you that if you're at state s, you would expect to make a certain observation. Let's say, if you think the object is at x, you expect to see the object at x, but with a certain error, because you know that your sensors are also random; they can make mistakes. And you keep, similarly, a reward function. So it's the same thing as an MDP, but you only get information about your environment through observations that are not complete. In fact, there's a formulation of partially observable Markov decision processes as a particular case of Markov decision processes, which is called a belief MDP. The only difference between a Markov decision process and a belief MDP is that the state space is necessarily continuous in the second case. So POMDPs can be seen as a particular case of MDPs.
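In standard notation — my transcription of the definitions just given, not the slide verbatim:

```latex
% A Markov decision process: states, actions, a stochastic transition kernel, a reward.
\text{MDP:}\quad (S, A, P, r), \qquad
P : S \times A \to \mathcal{P}(S), \quad r : S \times A \to \mathbb{R}.
% A POMDP adds an observation space O and an observation kernel:
\text{POMDP:}\quad (S, A, P, O, Q, r), \qquad Q : S \to \mathcal{P}(O),
% e.g. Q(y \mid s): if the object is at s, one expects to observe it near s, up to
% sensor noise. Recasting the POMDP as a belief MDP makes the state a probability
% distribution over S, hence the necessarily continuous state space mentioned above.
```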
So we said that the actions of a POMDP act on the state space, and you account for the consequences of the actions only through observations. Graphically, it means you have the state space — the world model of your environment, X — in which you reference, for example, where the object is, the position of the object. When you act, this induces a consequence on the position of the object at time t+1. And then you can make an observation of where you think the object will be, when you're planning into the future; or an observation of where the object is, when you're really doing the action and implementing its consequences in the real world.

The thing we're trying to do in our setting is simply to replace the actions by changes of perspective. Like I told you, when the agent acts, its coordinate system on the environment changes, so you can always see an action passively, as a change of coordinates. We want to say that, instead of considering actions on the environment, we have a space on which there's a natural notion of change of reference frame, and actions are simply certain kinds of changes of frames. We also allow some actions that do not correspond to changes of frames; we just say that we include the possibility that some actions are changes of frames. And the changes of frames are related to all the transformations that are internal to the agent: things that it knows a priori, encoded in the way the agent will interact with the environment.

The way we do this a bit more formally is to say that changing perspective is simply the action of a group. For example, in the affine case, when the agent moves, a change of frame is an affine transformation. And we say that the world model — the state space — is simply a space on which the group acts. Formally, what does it mean for a group to act on a space? It's just saying that you have a space S and a group G, and there's a map that takes an element of G and an element of S and sends back an element of S. In other words, to every g you can associate a function from S to S. And you assume it has good properties: it satisfies equation (1) below, and if you don't move, you stay at the same place. This is just the notion of a G-space, a group acting on a space.

Now, if we want to include this in MDPs and POMDPs, we just assume that the state space is a G-space, that some actions are elements of the group, and that when we choose an action corresponding to an element of the group, the transition is given by the way that element acts on the state. So there's nothing very convoluted: you define a collection of functions, which you see as changes of frames, together with the way they act. The probability kernel from the state space at time t to the state space at time t+1 — after the action, after the change of perspective — is simply given by the way the function transforms the state space. So it's a kind of reformulation. But what is really hidden behind it is that we have the structure of the group: we don't consider just any kind of actions. We assume that there is more structure in the actions we consider, and that this structure is encoded inside the geometry of the space. We no longer separate actions and state space: we say that the state space, in its geometry, already encodes a certain kind of actions, which are the changes of perspective.
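Spelled out, the G-space definition — presumably the "equation (1)" referred to on the slide — and the induced transition kernel read:

```latex
% A (left) action of a group G on the state space S:
\alpha : G \times S \to S, \qquad
\alpha(g_1, \alpha(g_2, s)) = \alpha(g_1 g_2, s), \qquad \alpha(e, s) = s.   % (1)
% Actions that are changes of frame are then deterministic transition kernels:
P(s_{t+1} \mid s_t,\, a = g) \;=\; \delta_{\alpha(g, s_t)}(s_{t+1}).
```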
There are two cases that we consider. The first case is where the state space S is the Euclidean space and the group consists of the affine transformations — the translations and rotations of the agent: changing frame is rewriting the coordinates of the object inside the solid reference frame of the agent. The second case is where the state space is a projective space, and the group is the group of projective transformations. So now we have described the generative models we consider, and how we can include, inside the classical theory of POMDPs, the fact that you can take a perspective on your environment. Next we will introduce the classical notion of epistemic value, which will allow us to define what an exploratory behavior driven by curiosity is: what it is for an agent to explore its environment based on curiosity.

So how do you define curiosity — or, let's say, the drive for exploration — very generally? You start with an agent that has a prior on the state of its environment, and it plans the consequences of one of its moves at the next time step. This changes the prior it has on the environment, because there's an action on the environment, so you get a new prior. Then it imagines making an observation. Once it makes an observation, the prior is updated, so you get a posterior. The way you define curiosity, or epistemic value, is: how informative is the observation that you will make? How far is the posterior from the prior? But you cannot compute this for one given observation, because you compute it by planning what will happen at the next step; so you need to consider all the possible observations you might make. The observations are themselves stochastic, with respect to the prior on the state space at time t. As we said, for us actions are changes of frame, so we can also define epistemic value for changes of frames. And one thing we gain that is, I think, interesting is that we now have a function defined on the group, which can be a continuous group; so if you want to maximize it — or minimize it, depending on how you see it — you can do gradient descent, for example. You have more analytical tools, because you're on a space that is continuous and has some structure.

So now, the formal definition of epistemic value. As I said, it's based on a quantity C, which I will define now. If you give yourself a prior on a space X and a probability kernel — a stochastic map from X to Y, where a stochastic map simply means that to every x you associate a measure on Y — then you get a joint distribution on X and Y, which is simply given by the product of the kernel and the prior. In fact, the quantity C is simply the mutual information between X and Y, which says how far the joint distribution is from the independent distribution: the product of the marginal distributions on X and on Y. The mutual information appears everywhere; this formulation is based on the paper by Friston et al., "Active inference and epistemic value".
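In formulas — reconstructing the slide from the definitions just stated — the joint built from the prior and the kernel, C as its mutual information, and the posterior-versus-prior re-expression discussed next:

```latex
% Joint distribution from a prior \pi on X and a kernel k : X \to \mathcal{P}(Y):
p(x, y) = k(y \mid x)\,\pi(x), \qquad p(y) = \int_X k(y \mid x)\,\mathrm{d}\pi(x).
% C is the mutual information: the KL divergence from the product of the marginals,
C(\pi, k) = I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\big\|\, \pi(x)\,p(y)\big)
% equivalently, the expected divergence of the posterior from the prior:
          = \mathbb{E}_{y \sim p}\Big[ D_{\mathrm{KL}}\big(p(x \mid y) \,\big\|\, \pi(x)\big) \Big].
```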
The mutual information is something that appears everywhere, but I think there's a very nice re-expression of it, which gives a better interpretation of this quantity — the interpretation I was discussing on the previous slide. You can always rewrite the mutual information as the Kullback-Leibler divergence between the posterior, for a given observation, and the prior — the Kullback-Leibler divergence being a way of computing how far two distributions are from each other — and then take the expectation with respect to the observations you can make. So it's exactly what I was discussing before. This holds for any kernel from X to Y with a prior on X. But, as we discussed, you have to take into consideration that you can do actions, while your prior is on X. So the way to compute the epistemic value of an action, when you only have the prior on X, is that you propagate the prior by the action onto X1; then you get a prior on X1, and you can define the epistemic value for this prior and the Markov kernel that corresponds to the randomness of your sensors, which is always fixed. And this is very important: this kernel doesn't change. Even if you change frames, the kernel relating the state space and the observations never changes. Explicitly, this means that after a certain action, a certain change of frame, we get a joint distribution on X1 and Y: the prior propagated onto X1, times the sensor kernel. And the epistemic value is simply the mutual information of this joint distribution.

So how does the algorithm work — the algorithm that defines an exploratory behavior for an agent that is looking for something? It starts with a prior on where O should be. Then it maximizes curiosity over changes of frames around the identity element, which corresponds to not changing frame. This gives an action that it can apply; it propagates the prior to the next step with this action, and then it updates its prior with respect to the observation it makes. And then it loops back. This is the algorithm that we consider in the second paper listed at the beginning of the presentation: an agent that is looking for a certain object, whose behavior is driven by exploration, driven by curiosity, taking into account that its state space is structured by the action of the group — so that the state space encodes, inside its geometry, all the possible ways of changing frames. And this is explicitly what we do; in this context, this is the algorithm that we consider, and nothing else.
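A schematic of this loop on a discretized state space — an illustrative sketch with invented helper structures, not the implementation from the paper — pick the change of frame near the identity that maximizes epistemic value, act, propagate the belief, then condition on what was actually observed:

```python
import numpy as np

def epistemic_value(prior, kernel):
    """Mutual information I(X; Y) of the joint p(x, y) = kernel[x, y] * prior[x]."""
    joint = prior[:, None] * kernel                  # p(x, y)
    px, py = prior, joint.sum(axis=0)                # marginals p(x), p(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (px[:, None] * py[None, :]), 1.0)
    return float(np.sum(joint * np.log(ratio)))

def explore_step(prior, frame_changes, kernel, observe):
    """One iteration of the curiosity-driven loop on a discretised state space.

    prior         : (n,) belief over object locations
    frame_changes : dict mapping group elements (near the identity) to (n, n)
                    matrices implementing their action on beliefs
    kernel        : (n, m) fixed observation kernel Q(y | x); it never changes
    observe       : callable executing a frame change in the world and returning
                    the index of the observation actually made
    """
    # 1. Pick the change of frame maximising the epistemic value of the
    #    propagated prior (on a continuous group: gradient ascent instead).
    g = max(frame_changes,
            key=lambda h: epistemic_value(frame_changes[h] @ prior, kernel))
    # 2. Execute it in the world and propagate the belief accordingly.
    y = observe(g)
    prior = frame_changes[g] @ prior
    # 3. Bayesian update with the observation actually made, then loop.
    posterior = prior * kernel[:, y]
    return posterior / posterior.sum()
```

Here `frame_changes` would enumerate group elements near the identity; on a continuous group one can instead follow the gradient of the epistemic value, as noted above.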
And then we get a very interesting result — at least I find it interesting in the way it's presented here; if you go more into detail, it's because the Euclidean case is very particular. What you get is the following. If you look at the behavior of an agent that is driven by exploration, by curiosity, but whose state space is structured by Euclidean transformations — the first case, where the agent simply encodes its environment in its reference frame, and nothing else — then it doesn't need to move. In the second case, where the changes of charts are given by projective transformations — so the way it encodes its environment takes into account a perspective, a projective deformation of its environment — then it will always try to get closer to the object. So you have two very distinct behaviors, and this is something that I think is very interesting to note.

Before going into the details of how you can prove this statement, let me say a bit more about why it's an interesting point of view on this subject, and how it goes a bit further than this very simple setting. What you could imagine is that encoding the actions of the agent directly inside the state space of the agent, through geometry, could be a way to stabilize the representations of the agent. This is something we're working on now, and there's already a rich literature in this direction.

Okay, so now let us prove the statement; and to prove it, we need a formal, precise statement. More particularly, what we were able to show is that if you assume that the agent has, among its moves, the option of staying still, then if its changes of frame are given by affine transformations, the agent stays still. In the projective case, if you assume that the agent is always looking in the direction of the object, it will always try to get closer to the object. The idea of the proof is that what plays the role of the drive for the agent is the size of the object in the reference frame of the agent: how big it appears to be to the agent. In the first case, the volume of the object in the internal space of the agent doesn't change, so it doesn't need to move. In the second case, what is informative is to make a move that makes the object bigger, because once it's bigger, the posterior will be further from the prior for a given observation. So the agent will always privilege moves that make the object look bigger, and you can show that this corresponds, in fact, to actions — changes of frames — that bring it closer to the object.

So how do you prove the result? As we said, we consider a change of frame from the real world to the internal world — here I'm not considering changes of perspective associated with moves, just how you relate the real world to the internal world. In the Euclidean case, it's clear: it's just the way you write the coordinates of the object in the solid frame. In the projective case, it's clearly not obvious, because there are several ways to relate the solid frame of the agent to a projective frame. The way we decided to do it is to impose a set of axioms that we consider coherent with our own experience of space: that we feel always centered on ourselves; that the axes of the solid frame of the agent inside the 3D space — what is in front of it, what is on the right, what is on top — are preserved; that there is no point in front that appears to be at infinity; and that, near the center, volumes are preserved.
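To make the chart mechanics concrete — a generic illustration of projective transformations in homogeneous coordinates, not the specific map singled out by the axioms above — lift a point of the R3 chart to R4, apply a 4×4 matrix, de-homogenize:

```python
import numpy as np

def apply_projective(M, x):
    """Apply a projective transformation, given as a 4x4 matrix M acting on
    homogeneous coordinates, to a point x of the R^3 chart."""
    xh = np.append(x, 1.0)          # lift to homogeneous coordinates
    yh = M @ xh
    if np.isclose(yh[3], 0.0):
        raise ValueError("point sent to the plane at infinity (leaves the chart)")
    return yh[:3] / yh[3]           # de-homogenise

# An affine map is the special case with last row (0, 0, 0, 1); a genuinely
# projective map has a non-trivial last row and changes apparent volumes:
P = np.eye(4)
P[3, 2] = 0.5                       # simple perspective-like deformation along z
print(apply_projective(P, np.array([1.0, 1.0, 2.0])))  # -> [0.5, 0.5, 1.0]
```

Composing two such maps is just a 4×4 matrix product, which is the chart-level fact used throughout: projective transformations written in charts stay projective.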
The way we relate solid reference frames of the agent to projective transformations is in this article, in particular in Proposition A1, where we show that this set of axioms limits the set of projective transformations we can consider: we are left with only this projective transformation. And so the change of frame from the external world to the internal world is given by rewriting the coordinates of the object in the solid frame of the agent and then applying this projective transformation. So what we have defined here are the maps, in the Euclidean and projective cases, that relate the observation of the object to the way the agent represents this object inside its internal world. This is simply the POMDP we defined before. Now, when the agent moves, it changes its Euclidean reference frame; and going from one solid reference frame to another, by applying the projective transformation we had before, you can define a map, which we call Psi, that goes from the state space at time zero to the state space after the action, after moving. What is very important is that we take the Markov kernel associated to a noisy observation to be of the following shape: if you think the object is at point x, then you believe the observation will be in the ball around x, for a ball of a certain small radius epsilon. (And I need to charge my computer, so sorry. Okay.)

So now let's express the epistemic value. Using the Kullback-Leibler formulation, it has in fact a very simple expression. The epistemic value of the prior propagated by the transformation Phi — the change of frame Phi: you have the prior here, you propagate it through Phi, and you compute the epistemic value of the resulting joint distribution over X1 and Y — is given by the formula on the slide: you simply integrate, over all the possible places where the object could be, the probability mass of the ball of size epsilon under the propagated measure, times the logarithm of the same quantity. This is very direct to write; you can just compute it and you find this expression. But what you can see already is that if you consider a Euclidean transformation, then the quantity Q(Psi⁻¹(B_ε(y))) doesn't change, so the epistemic value is constant. So if you try to maximize the epistemic value, you can do anything; in particular one available move is not to move, so you don't move, and that's perfectly fine. Now, in the second case, where the way you relate the environment and the internal world is through a projective transformation, it's more complicated. You need to use a trick, which is that once an observation has been made, the support of the prior will be smaller. So after one step, if you suppose that epsilon — the noisiness, let's say, of your sensor — is small enough, the support of the distribution is small enough that you can do an asymptotic expansion of the quantity in the integral. So this line is an equality, but this one here is an approximation.
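Schematically — my reconstruction of that formula and of the expansion that follows, from the definitions given; the exact statement and constants are in the paper — with the uniform-ball sensor and the propagated prior, the epistemic value reads:

```latex
% Sensor kernel: uniform on a small ball, k(y \mid x) = \mathrm{Unif}(B_\epsilon(x)).
% With propagated prior q = \Psi_* \pi, one gets
\mathrm{EV}(\Psi) = -\frac{1}{\mathrm{vol}\,B_\epsilon}
\int_Y q\big(B_\epsilon(y)\big)\,\log q\big(B_\epsilon(y)\big)\,\mathrm{d}y,
\qquad q\big(B_\epsilon(y)\big) = \pi\big(\Psi^{-1}(B_\epsilon(y))\big).
% Euclidean \Psi: \Psi^{-1}(B_\epsilon(y)) = B_\epsilon(\Psi^{-1} y), so a change of
% variables makes EV constant, and staying still is optimal. For small \epsilon,
% q(B_\epsilon(y)) \approx \mathrm{vol}(B_\epsilon)\, q(y), which gives
\mathrm{EV}(\Psi) \approx h(\pi) + \mathbb{E}_\pi\big[\log \lvert \det J_\Psi \rvert\big]
- \log \mathrm{vol}\,B_\epsilon,
% so only the Jacobian term depends on the move: maps that dilate the region where
% the object is believed to be (make it look bigger) raise the epistemic value.
```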
And now this is very useful, because you can expand this quantity at the point where the object really is. If you do that, you get this expression here, and what turns out to play a role is only the determinant of the Jacobian of the projective transformation, which in the Euclidean case is one, but in the projective case can be many things. Now, how is Psi defined? It's not obvious from the slide, because Psi is defined as a composition of several maps. I didn't go into the details, but it corresponds to the maps given by the changes of frame: you have the Euclidean space and the internal space, and after moving you have a change of frame in the Euclidean space, which corresponds to a change of frame in the internal space. So the way you define Psi is simply: you invert the projective transformation from the internal world to the external world at time zero, then you apply the change of frame, and then you apply the projective transformation to go from external to internal again. It's exactly the formula you have here, which is very nice to work with. The first term doesn't depend on the actions you do, so you don't need to take it into account — its contribution will be one — and you just need to compute the other one. In fact, it has a very simple expression, expression (23), and from it you can directly see which moves increase this quantity and which decrease it; and we want to increase the epistemic value. And so this is what allows you to prove the result from several slides back: in the projective case, if you have enough movements available and you keep looking at the object, then you are always going to move closer to the object. The way to interpret it is as if the agent were a bit paranoid, very uncertain about its own beliefs. It believes the object might be over there, but it's always uncertain, so it needs to go check; and once it has checked, it's more certain — but since it can always be even more certain, it will always try to get more and more certainty.

In this presentation I only covered the more computational aspects — the algorithm and some analysis of it — but we also have experiments on how this setting can generate different behaviors, behaviors that we would expect, and explain, for example, some illusions, like the moon illusion. I invite you, if you're interested, to listen to the online talk from the MoC4 (Models of Consciousness) conference that took place in Oxford last week, or to check out one of these three papers. Thank you very much for your attention.

All right, awesome. Wow — very interesting, and a different way of seeing POMDPs and related work than we're used to. Okay, let's start off with a little context, and then I'll read some questions from the live chat. So what brought you to study this question this way — what brought you to find consciousness and geometry interesting together, or, vice versa, to want to make this contribution? So at the beginning, when I did my PhD in maths, I was more interested in what is called the critical brain hypothesis: trying to understand the way the brain processes information and turns it into something that can be exploited.
And the critical brain hypothesis tells you that the activity of the neurons is close to a certain criticality — criticality in the sense of statistical physics — because you can model the activation of the neurons as a statistical system, like an Ising model. So I worked a lot on this. Then there's another hypothesis that is very common, which is the Bayesian brain hypothesis, and the Bayesian brain hypothesis led me to active inference, to learning more about optimal control, the Bayesian perspective on optimal control. My PhD advisor was working with David — and they still work together — on implementing some aspects of consciousness and how it can influence inference, in particular with the moon illusion article. So I was interested in knowing, in a very naive way at first, how these kinds of ideas interact with the Bayesian brain hypothesis. And then I continued; in fact, one of the lines of my research is algebraically structured statistics in machine learning. I see it very geometrically — or, let's say, geometrically or algebraically; they're not the same thing, but they're very related. It's something I wanted to understand a bit better: to put it in a formal setting so that it becomes simply a particular case of what is in the literature — specifying what already exists. And this took some time, because I was more on the active inference, free energy principle side, so I had to read more about optimal control, stochastic optimal control, and understand that what we're doing is simply adding more structure on the latent space of the agent, the state space of the agent. Doing this lets you ask: why is it useful? Why have a state space that encodes different perspectives? The motivation comes from consciousness studies and cognitive science, but why can it be useful for robotics? That has always been my motivation: to study statistical models — more structured models, with more a prioris — that can be useful for understanding the behavior of a closed system, like an agent, like a collection of neurons, even like molecular machines. So it's a very long answer, but there it is.

Cool. So you focused on the spatial movement, epistemic foraging case. Is there something special about space, or can we also think about this perspective taking in terms of, for example, a semantic or a narrative reference frame? I think this is a very, very important question, because up to now, all the work refers only to space. But the thing is: space in terms of a 3D space, or space in terms of geometry? The fact of writing it as geometry allows you to get away from the classical point of view of space as a 3D space. You see that, more and more — for example in geometric deep learning — space is used to encode the invariances of certain objects, and these objects do not necessarily have a three-dimensional structure; you can have objects with bigger groups of invariances. So I do think geometry can be used in other contexts, and that is really the aim of going to this more general formulation: to be able to apply it to real-world models — by which I mean the ones that are learned.
The aim is to have an agent in an open environment that learns its generative model, but with the a priori that it can take a perspective on its environment, and to see what it can do. So we really want to get away from 3D space — at least for me — and go all the way to implementing this in autonomous agents.

Okay, to follow on that, there's a question in the chat: "Great talk. I also wonder whether it applies to any modality — not just visuospatial, but also text." Yeah, so for other modalities like sound, for example, yes; for applying it to text: there's a lot of work now on large language models, on prompting, on having these models develop some kind of imagination, and you would like to have these ideas applied in that context. But for now, it's not the line we're pursuing. I'm trying to stay on deep learning: take standard datasets, without text, and just see how these ideas, as in geometric deep learning, can stabilize representations. But other modalities, yes — especially if we consider multimodal integration. You try to rebuild the state space, and you don't want to see it only as a vector space; you want it to have more structure, because you know a priori the constraints on the way you can take a perspective on one modality or the other. So not text for now, though I think it could perhaps be used for text; it's just not what I'm doing.

Yeah, I just want to rebound on that. I have the luck of collaborating with real mathematicians; I'm just some guy with intuition who found the right people to do the work. Regarding multimodality, it's important to understand that this is not about vision — this is about spatial cognition. It's supramodal. The claim is that vision is just one particular way of integrating information, indeed in an obviously projective manner, but it's integrated in a much larger field of experience than the field of view, the visual field. Obviously proprioception and touch: when you build a representation just by touching something, you get this 3D representation of that thing automatically in your mind. Hearing, to the extent that it's about source localization and building spatial representations. All of that — that's the claim of the theory — is integrated in this projective space. There are priors from memories; there's input from vision, from audition, from proprioception, from interoception — you name it, all senses contribute to it. So it's not vision. The claim is 3D projective geometry, in that case, beyond vision; vision is actually just a slave to that supramodal representation. That's the claim.

Awesome. Yeah, and that's important, because we work as a group with several different perspectives; I'm more on the computational side, so it's important to have that covered, because it's clearly not something I would be able to answer myself. Cool. Here's a nice follow-up question from Vladimir in the chat. He writes: visual sensors have variable resolution, for example higher resolution in the center; this can naturally lead to a curiosity-based change of orientation response within your framework. So is it a question or a statement? It's for you, right?
If there's variable sensor precision — for example higher precision in the center — might you see a resulting curiosity-associated change merely based upon the asymmetry, or the structure, of the sensor field? Okay, so I shouldn't say it, but I will say it anyway: this is more attention than curiosity. You can act on your sensors, so that you change the way you integrate them, and this really acts as a form of attention. What do you think the interplay is between attention and curiosity? Well, it's about the way you're going to move: attention deforms the consequences of your moves, so the consequences of your actions are not the same. Basically, curiosity is what drives actions — you choose your action with respect to a reward, which is the epistemic value — and attention, or let's say changing the sensors, is a way to change the consequences of your actions with respect to optimizing this value. I don't know if I should say it or not... okay, let's say it: I think it's a metric, basically. It acts like a sort of metric on the space, on the group directly. When you have a function you want to do gradient ascent on, you choose a metric, and the metric deforms the step you will take; attention will act as that metric. It's well known that changing the sensors is related to attention — that's nothing new — but the fact that, in our setting, you can relate it directly to groups, to changing the metric on the group, is something that can be done, and I think it's interesting.

All right, another question in the chat: what is a formal framework for learning geometry from data? How do we move from empirical datasets — the files on our computers, the things we do deep learning on in machine learning pipelines — to geometric approaches, formalized with the kind of analytical precision we saw here? So there are several different fields concerned with how to use geometry on data. It depends how you see geometry, but one approach is TDA, topological data analysis, which is basically trying to characterize your data by geometric, stability properties: you interpret the data as a certain kind of space, and then you look at the holes of this space, and this is something that characterizes your dataset. Another approach is manifold learning: people assume the data lives on a low-dimensional manifold, and they try to learn this manifold. So there's a lot of work in these directions. There are people interested in invariance and equivariance, more in geometric deep learning. And there are people interested, in the same spirit, in geometric a prioris: how to use geometric a prioris and include them in learning — in deep learning, in reinforcement learning. And this is what we do, in fact; many people do this, it's not just us, but we try to focus on this idea of how to exploit it for reinforcement learning.

So in this setting, epistemic value was the only driver of action? It's kind of like an expected free energy, except without pragmatic value, so we only have the epistemic term remaining?
Exactly. In fact, if you look at it in terms of optimal control, not the Bayesian formulation: there is this duality, most of the time stated in active inference, between value functions and their probabilistic counterparts. You can encode all your drives with priors, or with the value function directly, as rewards; both are, in a way, dual. I don't know if it's explicitly dual — in some contexts this sentence makes sense; for active inference I'm not completely sure the duality is exactly formal — but at least the idea is there. So there's no big difference, at least as I see it today, between using a value function and seeing it probabilistically. And the epistemic value term is an exploration drive, which you also find in reinforcement learning, maybe not in this exact expression. I think the exact expression we use is the one given in the paper by Friston et al.; it's a very canonical way to define epistemic value. So it's an exploratory drive, and you can always add it to the objective. There's a real problem of exploring your environment to find good policies: if your state space is continuous, it's really difficult to solve the POMDP.

Yes, it's like: if you knew which curiosities you could get rewarded from, you would already have known the answer to the search. So that's one of the challenges; pragmatic value converges well towards expectations, but this work really focuses in on epistemic value and shows what it can do alone as a driver. So how would you bring pragmatic value into this formalism? You just add it: you can put a value function — a sum of rewards, in some way — and then you add the epistemic value. In fact, when you look at the belief-MDP formulation of the POMDP, the curiosity, the epistemic value, is simply a value function, nothing more — a one-step value function. This is just playing with definitions, but it means you can always add a drive for exploration, and this is something that's really often done in reinforcement learning; it's even fairly standard, if not in exactly this form. There's a book called Reinforcement Learning: State of the Art that introduces this kind of exploratory drive; that book must be ten years old or so. So it's something you add up. The way we would do it now is: you put some drives with respect to preferences, then you add, for example, the epistemic value, and you just solve the optimization problem. But in terms of formalism, if you look at the belief MDP, it's nothing more than a certain value function.
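For reference, the standard active-inference bookkeeping being alluded to — generic, not specific to these papers — the negative expected free energy splits into an epistemic and a pragmatic term, and the agents here keep only the first:

```latex
% Standard decomposition of (negative) expected free energy for an action/policy a:
-G(a) = \underbrace{\mathbb{E}_{y}\Big[ D_{\mathrm{KL}}\big(p(x \mid y, a)\,\big\|\, p(x \mid a)\big) \Big]}_{\text{epistemic value}}
      \;+\; \underbrace{\mathbb{E}_{y}\big[\log \tilde{p}(y)\big]}_{\text{pragmatic value (preferences } \tilde{p}\text{)}}.
% Adding back the pragmatic term amounts to adding a reward/value function to the
% objective optimised by the exploration loop sketched earlier.
```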
So space remains when we translate through it — whether we're in the Euclidean or in the projective setting, space is basically what is not changed through action. Is that the case? Not exactly; there's a bit of technicality here. We use a chart for the projective space, so we use homogeneous coordinates, and we need to take away a lower-dimensional plane. That's why, in the way we encoded it, we always stay in R3: we chose a chart, we chose homogeneous coordinates, and we wrote projective transformations from one set of homogeneous coordinates to another — but it always lifts to a projective transformation. So what is hidden in the way we write it is that we do not, in fact, have the same space in the two cases: in the first we have Euclidean space, and in the second projective space. Because we use charts, we don't spell out all the details — for example, that if you compose two projective transformations, even written in charts, in homogeneous coordinates, the result stays a projective transformation; we just stay in charts, because we want to implement it. So there's an apparent similarity: the Euclidean case and the projective case have the same state space, R3, but the spaces of transformations are really different, and in fact the underlying spaces are not the same.

But what is important for us, in terms of space, is this: if you take away everything that is inside of space, everything that populates space, what you are left with is a concept that already takes into account the fact that when you act, you are not changing the space, and that no single point is singled out. This is very important, because it really characterizes the kind of object you're looking at: if you move, there's no point in the space that changes, no point that is singled out, and when you move, the space itself is not changing. So you know that you already have, inside your space, an encoded change of charts: you can imagine taking the perspective of somebody else. And this is really at the basis of the spaces we consider: a space is simply a way to support the fact that you can change charts, that you can change perspectives. That's the idea we're trying to include inside agency. Maybe I wasn't clear the first time I said it, but I think this is really the water we're swimming in.

So it's a very interesting way to approach it, and it reminds me of almost three years back, when we discussed the projective consciousness model and phenomenal selfhood in livestream #9, and we talked a lot about flipping between the Euclidean and the projective modes: how an agent could, on the one hand, have a space in which a book is a rectangle, and yet also be seeing it very close to its face, so that its visual projection is different. And here, different behavior is associated with the frame alone. So what's going on there? Even though the two can apparently be reconstructed from each other, what flips? Can we flip between thinking more Euclidean — in that mode I'm at 30,000 feet above my city, so there's nowhere I need to go — and the projective setting, where we have this kind of inbuilt epistemic drive to be near? So, I think it's hard-coded. The idea is that one of the things this can be useful for is communication: using space as a way to communicate, taking it as an a priori, given that we have very different architectures.
The ways we process information are not the same — we don't have exactly the same connections — but we still have a common framework in which we can discuss, and visual information is something that is very immediate for us. That is not obvious if you take the point of view that it is rebuilt: the environment you're living in is not exactly the space you see; it's something you construct, for functional reasons, and one of those reasons could be that we can communicate very directly with it. So I think it's hard-coded that the state space needs to carry this group action, and that all the agents share this action, even though they don't share the same architecture. It gives a common base for discussion, through actions — for example, a change of perspective. So, to answer the Euclidean-versus-projective question: we're not going in the research direction of saying that the same agent can go from Euclidean to projective. Agents are hard-coded with a latent space that has some structure, and then we try to exploit it for function, for communication, for multi-agent behavior, so that they can collaborate.

Okay, so each agent is within its own projective setting, and then the Euclidean sets the stage and allows that multi-perspectival... The Euclidean space — the outside world — is just here in this toy model because it's a way to reference the configuration of the world; for a network, this would be, for example, the configuration of its sensors. So there's a kind of ambiguity here between space and space: the outside space, which is Euclidean, and the inside one, which is projective. But the outside space is just a way to discuss the configurations of something related to the environment — your sensors, the configuration of your sensors, the information coming from your sensors; you just build a space from it. The geometry is really in the reconstructed world, the internal one. Being able to build networks that do this is something we're working on now, and it's not that obvious: there are a lot of algorithmic problems that we're trying to solve and need to solve. So in this toy model, the ambiguity comes only from the fact that it's a toy model; space, really, is what is inside, what is reconstructed: the thing we perceive from our sensors as a whole, as something that structures the information. So, David, you want to say something?
Well, yes, indeed. In the talk that you're referring to — if I remember well — the toy models encoded a certain world model in a Euclidean way, and then, using homogeneous coordinates and this kind of machinery, we transformed the Euclidean space into a projective space to do the projective transformations. And of course you can invert it, so you can go from one to the other. But Grégoire is right that it was a modeling choice, and it was partly motivated by my experience as a brain scientist. There are reasons to believe that memory — for instance, place cells and grid cells, although there are new hypotheses about their hyperbolic character — is usually thought of as encoding space in a Euclidean manner. And since, when we project future actions, we sometimes have to use memory — and even when we do remembrance of the past and project ourselves into past scenes, there is this access to memory — which, we used to think, and probably this is still the case, encodes information in a Euclidean manner, it would make sense that, for conscious access, there would be some operation allowing us to go from Euclidean to projective. And it was implemented that way. At the same time — and this is way beyond my abilities in math, so Grégoire will correct me — there is a way of seeing projective geometry as more general, as an extension of affine space obtained by adding points at infinity. So if you think about projective space with a metric, projective space is basically an extension of Euclidean space. Am I saying something completely false? It's one way of seeing it. And also, a lot of operations use, I would say, a dialogue, an exchange, between projective and Euclidean. If you think about multi-view reconstruction of 3D Euclidean space from multiple camera shots: you take several shots of a building from different perspectives, and then you can use a deterministic algorithmic approach, using epipolar geometry, to infer, under some priors, that all the lines that converge at a point at infinity are actually parallel; then you reconstruct, going from projective to Euclidean. So there is a deep relationship between the two that is possible and probably functional. Yeah, that's about it. Epipolar geometry is a subset of projective geometry, just to give some context. David has a better intuition of projective space than me; when he says these things, he has a very intuitive, physical, personal visualization, and most of the time he's right.

So, are we experiencing a projective geometry — or, what is it like to experience a projective geometry? Well, one problem with this model is the fact that you have to force the plane at infinity to be the plane you take away from the projective space; space has to be behind you. So you're not really working with the full projective space — you can always extend the maps, that's okay — but if you want to see it the way we see it, with something in front of us, I find it a bit restrictive to say that you cannot see what's behind you. I mean, you never see it, right? You can imagine it, but that's not the same thing, because you change your frame. So I can't really answer this question.
So, are we experiencing a projective geometry? Or what is it like to experience a projective geometry?

One problem with this model is that you need to force the plane at infinity, the plane you take away from the projective space, to be behind you. So you're not really working with the full maps; you can always extend them, that's okay. But if you want to see it the way that we see it, with something in front of us, I find it a bit restrictive to say that you cannot see what's behind you. I mean, you never see it, right? You can imagine it, but that is not the same thing, because you change your frame. So I can't really answer this question; I'm just pointing out that, for me, it's very perturbing to think that I have a plane that corresponds to this discontinuity, a plane that completely destroys the way I can think about movement behind me. If I had to imagine that I had a whole space around me, and not just in front of me, things would be very, very weird in terms of transformations. But it's not impossible.

This is what happens when you take psychedelics: you basically allow the projective transformations to have all these degrees of freedom. In perception, in sensorimotor processes, this is calibrated, this is restricted: you have to choose a certain subset of the group, so only certain projective transformations are used in practice. But if you go into mystical experience, or you take drugs, the hypothesis is that they are going to mess precisely with those parameters, and then you have the full fifteen-degrees-of-freedom transformations and you see things that are very weird, that look like mandalas and things like that. That's something you can get when you leave too much freedom to the projective action.

What if what we're seeing, quote, "in front of us" is actually behind us, or something like that? What was the issue with not being able to see behind us?

The issue is this: if you want to see the projective space as a 3D space, the way that we see it now, you need to take away a plane. There's a sort of plane at infinity which doesn't exist in the projective space itself; it exists only in the way you define the Euclidean chart, an R3 chart with a plane taken away. And this plane is behind you. If you apply a projective transformation, in front of you everything will seem fine, you'll experience the kinds of things you expect, but what can happen is that it sends things at infinity back behind you. You can write it down explicitly, there's no problem mathematically, but if it's really the way we experience it, it's extremely weird to think about, so I prefer not to think about it. I don't take drugs, so I leave that to other people. And it's true that taking drugs, in this context, is something you could model in the projective framework and not in the Euclidean one, because there is a huge set of projective transformations that make sense beyond the ones related to actions in R3. In the presentation I gave, there was a way to relate Euclidean frames to projective frames under certain axioms, and in fact the space of projective transformations is much bigger than the set of those induced by the agent's actions: the actions of the agent inside the real space, the Euclidean space, induce projective transformations in its internal world, but there are many more projective transformations than these actions. So you can imagine, in this setting, having very confusing states of mind, in a way which you cannot in the Euclidean case.
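To make that degrees-of-freedom count concrete, here is a small sketch in Python; the matrices are invented examples under standard homogeneous-coordinate conventions, not the model's actual parameterization. A 3D projective transformation is a 4x4 matrix up to scale (16 minus 1 for scale = 15 degrees of freedom), a rigid action fills only 6 of them, and the extra freedom is exactly what can push finite points out to infinity:

```python
import numpy as np

# A rigid action (rotation R, translation t: 3 + 3 = 6 degrees of freedom)
# embeds as a very restricted projective matrix: bottom row fixed at (0,0,0,1).
def rigid_as_projective(R, t):
    P = np.eye(4)
    P[:3, :3] = R
    P[:3, 3] = t
    return P

# A generic (still invertible) projective matrix may alter the bottom row.
# This made-up one sends the whole plane z = -1 to the plane at infinity:
P = np.eye(4)
P[3, :] = [0.0, 0.0, 1.0, 1.0]

x = np.array([1.0, 0.0, -1.0, 1.0])  # the finite point (1, 0, -1)
print(P @ x)   # [ 1.  0. -1.  0.]: last coordinate 0, sent to infinity
```

Nothing in the rigid subgroup can do this, which is one way to read the remark above: the "confusing states of mind" live in the projective freedom that ordinary actions never exercise.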
So I wonder: would a creature or a robot with 360-degree cameras be less perplexed, since it wouldn't necessarily distinguish a visual field in front from one behind?

The thing is, how do you glue them together? You still have to pick a point of convergence; you need to find a way to do multi-modal integration. If you have a 360-degree camera, you have to choose a space to represent it in; maybe in this case you can represent it as a sphere. The sensors are different, so it's not necessarily the same robot as the models we are considering, but there might also be a homogeneous space, a G-space, which corresponds to the rotations: if it can look 360 degrees, then you can rotate and the space of observations stays the same. So you can always imagine acting on it, and you can imagine having another G-space structure. And the good thing, with respect to perspective, is that if you have an agent that can take different perspectives on its environment, which depend on its sensors, on its actions, on the kind of data you have, then you can define a G-space and set up exactly the same framework; everything is built in this more general context. So you can apply it in the projective case, since we come from the PCM, the Projective Consciousness Model, but the framework now, or at least the way we place this framework inside optimal control, is well adapted to any change of perspective, not just the projective one.

Awesome. Well, where are you going to go next? Where will the epistemic drive take you, carrying forward?

There are two projects now. The first one, and maybe David can talk about it, is to implement this so it can be used for monitoring behaviors, predicting behaviors, or analyzing them. There are published models of agent behaviors, like maladaptive behaviors, but in more limited contexts; the goal now is to bring them inside the computational framework so we can use it to analyze experiments with humans. This first project is ongoing with several students: on the page I put on the first slide there's all the work we're doing and the people working with us, and we're very grateful to them. The second one is more machine learning. For me it's very important to have effective algorithms, things you can really use in practice, things you can really implement, and when you have group structures, many, many problems appear. How to do stochastic optimal control when the latent space is a homogeneous space is, I think, a completely open question, and I'm starting to have answers in this direction, so hopefully the work will come out soon, let's say in January or so. But there's a huge amount of work to do in this direction.

Cool. Anything else you want to add?

Thank you very much for the invitation. A lot to think about, to mentally rotate slash project. After all, as you alluded to with communication, this is a perspective-taking question. Technologies are adding new flavors and methods, for example asynchronous communication, which is still synchronous for the agent when it perceives it, and all these different modalities of communication. It's quite interesting to think about, and it's really cool that you and colleagues are pursuing this from empirical, data-analytic approaches and from theoretical, mathematical ones.

On the communication side, I think it's something I find extremely interesting more generally. If some people want to work on the project, please tell us, we'll be very happy; and even if you have money, please tell us, we'll be very happy.

Great, great.

We're very open; you can always send us a mail, and we'll be very happy to discuss this.

Great. Okay, well, thank you again to all. Till next time. Okay, see ya. Bye bye.