Hello and welcome, everyone. It's January 26th, 2024. We're here in Active Inference MathStream 8.1 with Richard Sarajevan, and we're going to have an interesting presentation and discussion today on an introduction to Bayesian mechanics, the Free Energy Principle, and the state-based formalism. This is part one. So, Richard, thank you for joining. Looking forward to this presentation and discussion. Over to you.

Hi, everybody. So, yeah, my name is Richard Sarajevan. I'm French, working in Switzerland; I'm a PhD student at EPFL in Lausanne. And just to give a bit of context, I'm not working on Bayesian mechanics. I have a physics background, but we're interested in modeling bacterial evolution and ecology. What happened is that the Free Energy Principle was always in the corner of my head, and a year and a half ago I decided to really read about it, especially since I wanted to see whether to transition to the field, and I do want to transition to the field after my PhD. So I started asking many questions to people from the FEP community, and I'm so grateful to them, and also on the Discord of the Active Inference Institute. At some point I said that I was preparing a lab meeting about the Free Energy Principle, and Daniel proposed to have it discussed on the livestream, because there isn't much material for specifically learning about Bayesian mechanics and the actual physics underlying the Free Energy Principle. And so here I am. Once again, I'm not an expert on the matter, so always refer to the original papers, but hopefully I'm going to do a decent job. So without further ado, let's start.

I'm not going to tell you where we are heading or what questions we would like to address; I'm rather going to start building the framework right away, and at some point what we're doing will become clear. As you may know, there are two formulations, or formalisms, of the Free Energy Principle: the so-called state-based formulation and the so-called path-based formulation. Today we will focus on the state-based formalism. It's not like the old versus the new formulation; in fact, thinking in terms of paths, or so-called generalized coordinates of motion, has been around forever in the literature, but it kind of came back to the forefront of the Bayesian mechanics literature, I think. Anyway, today we will focus on the state-based formalism.

So the very starting point is to write down a generic Langevin equation. It's literally like saying: let's consider a random dynamical system. Very briefly, for the people not acquainted with such an equation, x here is the state of your system. It could be a simple scalar if you are considering a one-dimensional process, but in general x would be a vector. For instance, if you want to model the 3D diffusion of a Brownian particle immersed in a liquid, x would be a 3D vector whose components are the coordinates of your Brownian particle. And you can see on the left-hand side that we have dx/dt, the time derivative of the state vector, so that such an equation really describes or specifies the dynamics of the system. Many things can influence the dynamics of the system; if I stick to my Brownian particle example, maybe it is subject to an external force. Whatever is relevant here, you put it in f, the so-called deterministic term, or flow. We will refer to it as the flow throughout the presentation.
However, in some cases there is stuff you don't want to explicitly model. If I stick with my Brownian particle example, it is constantly hit by the molecules of the medium surrounding it, hence its Brownian motion. If you want to take into account these thermal fluctuations, it would be mission impossible to explicitly model every single one of the millions, if not billions, of molecules surrounding it. A convenient way to still take into account these fluctuations, which are literally thermal fluctuations in my example, is just to add a noise term to the equation. So ω here is a random variable whose value changes with time, with the appropriate statistics.

Okay, so two brief remarks before moving on. If you assume that the state of your system changes slowly compared to the relaxation time of your fluctuations, you can write the autocorrelation function of the noise as ⟨ω(t) ωᵀ(t′)⟩ = 2Γ δ(t − t′), where Γ is the diffusion matrix and δ is the Dirac delta function. What it means is just that, in that case, your noise is super rough and is not correlated in time, basically. Second remark: you can use the central limit theorem to argue that it makes sense to assume that ω is normally distributed, so that in the end the noise is a Gaussian white noise. But note that in the next livestream, where we will discuss the path-based formulation of the FEP, we will relax the white noise assumption.

Anyway, so we have this random dynamical system, and we can do something cool with the flow. The flow f is a vector; it has the same dimension as the state vector, because each component of the state vector has its own Langevin equation, if you will. And you can decompose it into a solenoidal and a gradient term, f(x) = (Q − Γ)∇ℑ(x), up to divergence terms. Before telling you what this decomposition is all about, on a technical note, just notice that Q here is the so-called solenoidal matrix, and Γ is the diffusion matrix, just as before. And the ℑ(x) appearing in the gradient ∇ℑ(x) is the negative log of a density, so it's a self-information, or surprisal; we will refer to it as the surprise throughout the presentation. The density at play in this negative log density is the steady-state, or NESS (non-equilibrium steady state), density of the system. We assume that such a NESS density exists, so that if you let your system evolve from a given initial state, it will at some point reach a unique, well-defined NESS density. Second remark before telling you what the decomposition is all about: note that usually in the papers, the divergence terms here and here are put together in a third term, which is sometimes called the housekeeping or correction term. But if Q and Γ are not state dependent, these divergence terms vanish anyway, and we end up with the two remaining terms, which can be nicely factorized like that. Also, the last thing I want to say is that the first term is indeed a solenoidal term: you can write it as the curl of some potential. I'm saying that because sometimes people get confused when they see a gradient in both terms.

Anyway, what this decomposition is all about is in fact quite simple. Let's consider this nice 2D single-moded NESS density. The flow, and more specifically the gradient component of the flow, which is here the vertical flow, will drive the system towards its mode, while fluctuations kind of push it away.
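To make this concrete, here is a minimal sketch, not from the slides, of the kind of system being discussed: an Euler-Maruyama integration of a 2D Langevin equation whose flow is built from the Helmholtz decomposition above. The Gaussian NESS density, the specific Γ and Q, and all the numbers are toy choices of my own, for illustration only.

```python
import numpy as np

# Toy example (my own, for illustration): a 2-D Langevin system whose NESS
# density is a standard Gaussian, p*(x) ∝ exp(-|x|²/2), so the surprise is
# I(x) = |x|²/2 and its gradient is simply ∇I(x) = x.
rng = np.random.default_rng(0)

gamma = 0.1                              # diffusion amplitude (Γ = gamma·I)
G = gamma * np.eye(2)
Q = np.array([[0.0, 1.0],                # antisymmetric solenoidal matrix
              [-1.0, 0.0]])

def flow(x):
    # Helmholtz-decomposed flow f(x) = (Q - Γ)∇I(x); with state-independent
    # Q and Γ, the divergence (housekeeping) terms vanish.
    return (Q - G) @ x

dt, n_steps = 1e-3, 200_000
x = np.array([2.0, 0.0])                 # start away from the mode at the origin
traj = np.empty((n_steps, 2))
for t in range(n_steps):
    noise = np.sqrt(2.0 * gamma * dt) * rng.standard_normal(2)
    x = x + flow(x) * dt + noise         # Euler-Maruyama step
    traj[t] = x

# Single trajectories spiral into the mode because of Q, while the long-run
# histogram still matches the NESS density (identity covariance here).
print(np.cov(traj[n_steps // 2:].T))
```

Setting gamma to zero in this sketch removes the gradient component, and the trajectory just circulates on an isocontour of the NESS density, which is exactly the intuition developed next.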
But the gradient flow is not the only flow. There is also the solenoidal flow, which is here the horizontal flow, and which will make the system kind of converge to its mode with ever-decreasing cycles. If you want to get more intuition about this solenoidal flow, what we can do is remove the fluctuations: all the entries of Γ, the diffusion matrix, go to zero, and this means that we no longer have any gradient flow. We end up with only the solenoidal flow. If we do that, the system will just follow an isocontour circulation on the NESS density. That's the bottom right panel here, where the solenoidal flow kind of drives the system along this circulation. A small remark about this solenoidal flow: because it drives the system, in this simple example, in either the clockwise or anticlockwise direction in an irreversible fashion, irreversible in the statistical physics sense, it breaks detailed balance and so on. People sometimes view this solenoidal flow as underwriting the symmetry breaking ubiquitous in living systems.

Anyway, okay. So before using this decomposition of the flow to do some cool stuff, I need to introduce a few notions, one after the other, and afterwards we will put everything together and actually derive the free energy principle. The first thing I want to introduce is the notion of sparse coupling. Let's say that in my state vector x, I have a subset of variables, μ, which we refer to as the internal states, and which specify the state of some subsystem. You get the idea: we have an organism, an agent, the bacterium in my schematic, and these variables literally specify the internal states of my bacterium. And this bacterium is in a given environment, niche, whatever. So there is this other subset of variables, which we refer to as the external states, and which corresponds to the external world, the states external to the bacterium. The idea here is that these two subsystems are not connected to each other. When I say that two variables are not connected to each other, I just mean that their respective flows do not take the other one's state as an argument; they do not influence each other, basically. In fact, they are indirectly connected to each other through a third subsystem, which we refer to as the Markov blanket, so that these states b are called the blanket states. And we will see in a minute that it really corresponds to a Markov blanket in a statistical sense.

Okay, so we have this sparse coupling architecture, and in fact we can go a bit further and assume that within the blanket states there are two more subsystems, the so-called sensory states s and the so-called active states a. Basically, the idea is that the external states η influence the sensory states s, and these sensory states s influence the internal states μ, but not the other way around. And the internal states μ influence the active states a, which influence the external states η, but not the other way around. So it's really a sparse coupling architecture inspired by the so-called action-perception loop. However, you could ask questions like: why do the sensory states influence the external states, or why do the active states influence the internal states, etc.?
We don't really have time to discuss this; I guess you can think of some qualitative examples in biology, but I just want to point out that even though this architecture is quite canonical, it's not a definitive feature of the free energy principle. In fact, next time, when we discuss the other formalism, we will do a bit of zoology and look at other sparse coupling architectures. On a technical note, just notice that such sparse couplings are encoded by zero entries in the Jacobian matrix of the flow, as in the sketch below. Anyway, thanks to this sparse coupling architecture, we have this system of four coupled Langevin equations, which respectively describe or specify the dynamics of the external states η, the sensory and active states s and a, and the internal states μ.

Okay, so what I want to say here about the Markov blanket thing is that, under some conditions I'm not going to discuss here (for the people acquainted, it involves having no solenoidal couplings between autonomous and non-autonomous states), the external states η and the internal states μ are conditionally independent. They are independent when conditioned upon b, which makes sense, because all the information is kind of transmitted through b. However, note that when I talk about conditional independence here, I'm talking about conditional independence in the stationary density. Basically, if you fix b, you have this joint conditional stationary density for x_i and x_j, and if these two guys are conditionally independent, it just means that you can factorize this joint density as p(x_i, x_j | b) = p(x_i | b) p(x_j | b), and such conditional independencies are encoded by zero entries in the Hessian matrix of the surprise.

Okay, so now just a bit more vocabulary before moving on. Note that if we put together a and μ, so we consider the pair of active and internal states, we refer to them as the autonomous states α. And the cool thing about the autonomous states is that they are conditionally independent of the external states: autonomous and external states are independent when conditioned upon the sensory states. And if you add the sensory states s to the autonomous states, so you consider the whole thing, the whole Markov blanket plus the internal states, we refer to these as the particular states π. And π constitutes a particle, a particle in a generic sense, of course: an organism, an agent, whatever, the bacterium in my schematic.

Okay, so here I just want to make a point to clarify what we are doing, what this approach is all about. Basically, here we kind of define what it means for something, a bacterium, whatever, to exist, in the sense that it has its own internal dynamics, statistically separated from the external. It does have a Markov blanket; it does have its own physical integrity. Now, we have no clue how it maintains that integrity. If you consider real systems like an actual bacterium or a human being, it does survive at a given time scale, through, I don't know, active processes countering dissipation, for instance. Here we don't say anything about how it survives; it just does. We have this sparse coupling architecture, and from there, from this starting point, we are going to derive the necessary consequences of such sparse coupling.
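As a concrete illustration of the sparse coupling just described, here is a minimal sketch with a made-up linear flow over the partition (η, s, a, μ). Only the zero pattern of the Jacobian is the point; the coefficients are arbitrary, and the exact pattern of allowed couplings is the canonical one described above, under my reading of it.

```python
import numpy as np

# Hypothetical linear flow respecting the canonical sparse coupling of the
# partition x = (eta, s, a, mu): the external flow takes no internal argument
# and the internal flow takes no external argument. Coefficients are made up.
def flow(x):
    eta, s, a, mu = x
    d_eta = -eta + 0.5 * s + 0.5 * a   # f_eta(eta, s, a): no mu
    d_s   = -s + eta                   # f_s(eta, s, a): no mu
    d_a   = -a + mu                    # f_a(s, a, mu): no eta
    d_mu  = -mu + s                    # f_mu(s, a, mu): no eta
    return np.array([d_eta, d_s, d_a, d_mu])

# Jacobian in the ordering (eta, s, a, mu): the zero in the mu column of the
# eta row, and in the eta column of the mu row, encode the sparse coupling.
J = np.array([[-1.0,  0.5,  0.5,  0.0],   # d(d_eta)/d(mu) = 0
              [ 1.0, -1.0,  0.0,  0.0],
              [ 0.0,  0.0, -1.0,  1.0],
              [ 0.0,  1.0,  0.0, -1.0]])  # d(d_mu)/d(eta) = 0
print(J)
```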
So basically, we kind of ask, or try to answer, the question: if things exist, what must they do? If you're a bit confused, don't worry, we're going to come back to this idea later. But I first want to show you this quote: "Many theories in the biological sciences are answers to the question: what must things do in order to exist? The FEP turns this question on its head and asks: if things exist, what must they do?" Once again, we are going to come back to this idea later, but that's the idea of this approach in a nutshell.

So as I told you, I still have a couple of things to present. I will go through each of them one after the other, and finally we will put everything together and derive the free energy principle. The next thing I need to introduce is the notion of a synchronization map. Very generally speaking, I'm not specifically talking about our random dynamical system here: if you have a linear map g_μ which gives you μ from b, and a map g_η which gives you η from b, then if g_μ is injective, so that you can go back to the pre-image from the image, you can use the pseudo-inverse of g_μ, so that from μ you go back to b, and from b you can go to η. The successive application of the pseudo-inverse of g_μ and then of g_η is called the synchronization map, and it basically allows you to go directly to η from μ.

Okay, so now let's use this idea in the context of our system. Here, b corresponds to the blanket states. If I fix the blanket states, I have corresponding conditional densities for μ and η: I have p(μ | b) and p(η | b), and their modes are bold μ and bold η. By virtue of the synchronization map, I can go back to the external mode from the internal mode. I'm going to give an example in a second which will clarify what we are doing here, but first I just want to say that in this nice paper by Lance Da Costa about the synchronization map, basically everything was Gaussian; if that is not the case, a Laplace approximation, which is literally a Gaussian approximation, might be necessary to derive a synchronization map in closed form. Don't worry, we'll come back to this idea of the Laplace approximation later. Just remember that we have this synchronization map, which allows you to go to the external mode from the internal mode.

So for instance, if, given the blanket states b, the corresponding p(η | b) follows this nice normal distribution where bold η corresponds to the mode, then by virtue of the synchronization map I can view the internal mode μ as parameterizing a density. I write it q_μ, which is equal to this nice normal distribution where the mode is just the synchronization map applied to the internal mode itself. And by construction of the synchronization map, it is equal to the true external density. So you can view the internal mode as parameterizing a distribution over external states, thanks to the synchronization map. That's what the synchronization map is all about. Just a small point, because maybe some of you are a bit confused here, since we're talking about modes as opposed to actual states; we will talk about that later. But indeed, if I take the actual internal states at a given time t, they are not necessarily equal to their mode, just because of fluctuations or whatever, so that if I apply the synchronization map to the actual internal states, it might not give the true external mode. But anyway, we will discuss this a bit more later.
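Here is a minimal linear-Gaussian sketch of the synchronization map; the matrices J_mu and J_eta standing in for g_μ and g_η are invented for illustration.

```python
import numpy as np

# Minimal sketch of the synchronization map in the linear-Gaussian case
# (matrices invented for illustration). g_mu maps a blanket state b to the
# conditional internal mode; g_eta maps b to the conditional external mode.
J_mu  = np.array([[1.0, 0.5],
                  [0.0, 1.0]])          # must be injective for the map to exist
J_eta = np.array([[2.0, 0.0],
                  [1.0, 1.0]])

def sigma(mu):
    # synchronization map: internal mode -> blanket state -> external mode,
    # using the pseudo-inverse of g_mu
    b = np.linalg.pinv(J_mu) @ mu
    return J_eta @ b

b = np.array([0.3, -1.2])               # some fixed blanket state
mu_mode, eta_mode = J_mu @ b, J_eta @ b

# By construction, applying sigma to the internal mode recovers the
# external mode exactly:
assert np.allclose(sigma(mu_mode), eta_mode)
print(sigma(mu_mode), eta_mode)
```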
So that was the notion of the synchronization map in a nutshell, basically. The last thing I want to introduce, before finally putting everything together and actually deriving the free energy principle, is the notion of variational inference. Very simply, let's say that you have some latent or hidden variables, some latent generative process, causing some data s. You have a prior p(η) over the states of these hidden causes of the data, and you are also equipped with a generative model, which just designates the joint distribution p(η, s). You can view it as a model of how the latent variables cause the data. The idea is the following: you sample some data s, and you want to compute the posterior distribution p(η | s). In a way, you want to refine your belief about the hidden causes of the data thanks to newly sampled data. It's very simple in principle, because you just have to apply Bayes' theorem, right? However, in practical settings, the denominator p(s), the marginal density over sensory data, usually requires a monstrous marginalization, so it's just not tractable. We can't just apply Bayes' theorem.

So we need a method which, given some variational distribution q, also called the recognition density, makes it as close as possible, if not equal, to the true distribution we ultimately want to compute, namely p(η | s). And these two densities, q, our variational distribution, and the true distribution p(η | s), are equal, or more or less equal, if their KL divergence is zero, because the KL divergence basically measures the difference between two distributions. That's what I wrote at the top of the slide: finding an accurate distribution q, in the sense of a q as close as possible, if not equal, to the true target density, amounts to minimizing this divergence. However, the target density appears inside this divergence; we can't do anything with it directly, we can't compute it or whatever. We need a proxy for this target divergence. And there is such a proxy, called the variational free energy F, in green on my slide. F is equal to the divergence between q and the generative model, F = D_KL[q(η) ∥ p(η, s)]. The idea is that you can decompose this divergence into the target divergence, in red here, plus something, so it is indeed a proxy for the target divergence. And note that, interestingly enough, that second term is the surprise over sensory data, −ln p(s), so that F can be viewed as an upper bound (or a lower bound, depending on how you define it) on surprise.

Okay, so what I just said is that minimizing the target divergence just means minimizing F. That's basically what variational inference is all about. Note that usually, algorithms require q to be Gaussian, or require a mean-field approximation, or whatever. If q is required to be Gaussian even though the target density is not Gaussian, we end up with the best Gaussian approximation of the target density, and in practice it would mean working with a so-called Laplace-encoded free energy. Before moving on, I just want to say that this quantity, the variational free energy, is in itself quite a rich and interesting quantity. You can decompose it in many ways, and each decomposition provides interesting interpretations. For instance, if you look at the second line here, you can see that minimizing free energy means maximizing this accuracy term: you basically want to explain the data, I would say. But at the same time, you want q to differ as little as possible from the prior distribution. So it's an interesting quantity.
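To ground these definitions, here is a tiny discrete sketch (the prior and likelihood numbers are invented) showing that F upper-bounds the surprise −ln p(s), with equality exactly when q is the true posterior.

```python
import numpy as np

# Toy discrete model (numbers invented): variational free energy
# F[q] = E_q[ln q(eta) - ln p(eta, s)] bounds the surprise -ln p(s),
# with equality when q equals the true posterior p(eta | s).
p_eta = np.array([0.7, 0.3])                 # prior over two hidden states
p_s_given_eta = np.array([0.9, 0.2])         # likelihood of the observed s
p_joint = p_eta * p_s_given_eta              # p(eta, s) for this fixed s
p_s = p_joint.sum()                          # marginal (tractable in this toy)
posterior = p_joint / p_s                    # exact p(eta | s)

def free_energy(q):
    return np.sum(q * (np.log(q) - np.log(p_joint)))

grid = np.linspace(1e-3, 1 - 1e-3, 2001)     # brute-force search over q
qs = np.stack([grid, 1 - grid], axis=1)
best = qs[np.argmin([free_energy(q) for q in qs])]

print("surprise -ln p(s):", -np.log(p_s))
print("min F over q:     ", free_energy(best))   # matches at the minimum
print("best q vs posterior:", best, posterior)
```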
Anyway, now let's finally go back to our sparsely coupled random dynamical system, use everything we've talked about, and finally derive the free energy principle. So here is our system, and we have these four Langevin equations. The first thing to do is just to apply the decomposition we talked about in the beginning: the flow of each of them can be written like that. I just directly applied the Helmholtz decomposition we talked about in the beginning.

Okay, so now let's try to understand how it works; let's talk about the dynamics of the system. Let's say that there is a momentarily instantiated sensory state, and let's say that the sensory states are fixed. Then there is a corresponding autonomous mode, toward which the autonomous states are going to converge, staying in the close vicinity of their mode if fluctuations are not too large. But in fact, the sensory states change with time, so that the mode of the autonomous states moves as well. And in fact, it moves on its corresponding autonomous manifold. I'm not going to go into the details, but just keep in mind that the autonomous mode moves on a so-called autonomous manifold, which can be viewed as a statistical manifold, and which can also be viewed as a so-called center manifold. If I rephrase what I'm saying: the flow of the autonomous states can be decomposed into an off-manifold flow and an on-manifold flow, the latter corresponding to the path of the mode itself on the manifold.

Okay, so just to be a bit clearer: in my bottom-right illustration here, say the autonomous states are here, and I'm interested in the off-manifold flow. Basically, I have this component, which corresponds to the gradient flow towards the manifold, towards the mode; here it's pretty much like what we discussed in the beginning. And at the same time, there is this orthogonal component, which corresponds to the solenoidal flow. So that's basically the way the autonomous states are going to reach their mode: it can be viewed as this ever-decreasing cycling towards the manifold, on which the autonomous mode moves. That's a bit dense, I guess, so I recommend checking the paper "The free energy principle made simpler but not too simple", which discusses all these ideas about center manifolds and such.

The interesting point here is that if you assume a separation of timescales between the fast flow off the manifold and the slow flow on the manifold, the autonomous states are basically always in the vicinity of their mode, and if you want to characterize the overall dynamics of the autonomous states, you can focus on the autonomous mode, on the path of the mode. In the next slides we will indeed focus on the autonomous mode. And by definition, as we already discussed, the autonomous mode is, or corresponds to, the autonomous states which minimize surprise in the last two Langevin equations, because the autonomous mode corresponds to the least surprising autonomous states.
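A minimal sketch of that timescale-separation argument (all the rates and the conditional mode alpha_star are invented for illustration): a slowly drifting sensory state, and an autonomous state that relaxes quickly toward its conditional mode and therefore tracks it.

```python
import numpy as np

# Toy timescale separation: the sensory state s(t) drifts slowly, the
# autonomous state relaxes quickly toward its conditional mode alpha*(s),
# so it stays in the vicinity of the moving mode.
def alpha_star(s):
    return 2.0 * s                 # hypothetical conditional mode, linear in s

dt, n = 1e-2, 5000
k_fast = 5.0                       # fast relaxation rate off the manifold
alpha = 0.0
errors = []
for i in range(n):
    s = np.sin(0.05 * i * dt)      # slow sensory drift (rate << k_fast)
    alpha += -k_fast * (alpha - alpha_star(s)) * dt
    errors.append(abs(alpha - alpha_star(s)))

# After a transient, alpha tracks alpha*(s) closely: the overall dynamics
# are well summarized by the path of the mode on its manifold.
print("late-time tracking error:", np.mean(errors[n // 2:]))
```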
Before moving on, I just want to say something we can maybe discuss afterwards, because I'm not sure I fully understand it. Basically, if I'm here in my bottom-right schematic, I have this gradient flow towards the manifold and this solenoidal flow parallel to the manifold; and if I remove fluctuations, so the corresponding entries in the diffusion matrix go to zero, then, as we saw in the beginning, there is no gradient component anymore. And what the system will be doing is kind of orbiting, or oscillating, around a point which moves on the manifold. So that's interesting. And I guess that if we do the exact same reasoning but starting already on the mode, then the whole flow reduces to the on-manifold flow, and I guess that in that case, the autonomous states follow, and in fact coincide with, their mode. But anyway, maybe we can discuss that afterwards.

So let's use the various things we've talked about, and especially the notion of the synchronization map. As we said, the internal mode indeed parameterizes a distribution over the external states: μ parameterizes a distribution which, by construction, coincides with the true distribution p(η | b). And in fact, thanks to the conditional independence between external states and autonomous states, you can just drop the conditioning upon a, and you have q_μ = p(η | s); equivalently, you can write it p(η | π). The idea here is that you can view q_μ as a variational distribution. If you want, you can write its associated variational free energy; you have this formula here, the free energy. And because q_μ already coincides with the true posterior distribution, if you will, the first term goes to zero, so that F reduces, if you will, to the surprise over particular states. And the surprise over particular states appears here, in the equations of the autonomous states. So we can make this identification, and we realize that the autonomous mode minimizes not only surprise but free energy in general.

And the way μ, the internal states, will be updated when the sensory states change will always be such that this divergence is zero, so that μ always keeps track of, stays synchronized with, in fact infers the external states. You can interpret this under a generative model, which here is p, the NESS density: the internal states can be viewed as performing inference over external states. So in fact, it's not only this divergence which is minimized, but also surprise. And it's not only the internal states which minimize free energy, but also the active states. Let me give an example. Let's say that the actually instantiated sensory states are likely, unsurprising sensory states; and by definition, in general, the instantiated sensory states will be likely sensory states. The corresponding μ will be such that this divergence is zero, as we just discussed. And at the same time, the corresponding active mode, as you can see in the decomposition with the third term here, ℑ(a | s, μ), will just be the one most consistent with these instantiated sensory states. In fact, you can view it the other way around and say that the active mode is the mode which brings about unsurprising sensory states, so that the particle can be viewed as actively sampling unsurprising, or likely, sensory states.
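Schematically, in the talk's notation (with π the particular states and α the autonomous states), the identification just described can be sketched as follows; this is a summary of the step, not a full derivation:

```latex
F(\pi) \;=\; \underbrace{D_{\mathrm{KL}}\!\left[\, q_{\mu}(\eta) \,\middle\|\, p(\eta \mid \pi) \,\right]}_{=\,0 \text{ when } q_{\mu} \text{ matches the true conditional}}
\;+\; \underbrace{\mathcal{I}(\pi)}_{-\ln p(\pi)}
```

When the synchronization map makes q_μ coincide with the true conditional density, the divergence vanishes, F(π) reduces to ℑ(π), and the surprise gradients driving the autonomous states in the Langevin equations can equivalently be read as free-energy gradients.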
Or equivalently, you can say that the particle kind of accumulates evidence for its own generative model. I'm going to say something about the generative model in a second, but first: this sentence here just sums up what we said. μ is updated so that q_μ is always the best distribution over external states; we refer to this as perceptual inference. And the idea of, in addition, minimizing surprise through action is called active inference. A brief note: we said earlier that in order to have a synchronization map in closed form, it could be necessary to work under a Laplace approximation. In that case, q_μ is just the best Gaussian approximation of the target density, so the divergence here would not be zero, but it would still be minimized; the identification between the two gradients still holds, and nothing changes with respect to our discussion.

So here I just want to say something about what we are doing. Basically, we assume that we have agents or organisms that indeed survive, exist, or persist in a given environment, at a given time scale, and we end up with the fact that our particle must be equipped with, must embody, a generative model, which may or may not exactly coincide with the true generative process, and which encodes the causal structure of the world, under which it performs inference and minimizes surprise, performing perceptual and active inference. But the interesting thing as well, and I think this is something fundamental that people tend to misunderstand, I guess, maybe, I'm not sure, is that the generative model also encodes the preferences of the system. Let me explain why. If I tell you that an organism manages to survive, to exist, to persist, etc., it means that such an organism manages to stay in its homeostatic, life-compatible states. Of course, it almost sounds like a tautology: surviving equals staying in one's homeostatic states. That's obvious, right? And that's exactly what we are doing here: we assume existence, survival, so that the likely states in which the particle persists are preferred states per se. For instance, if I consider the prior of my generative model over sensory inputs, p(s), the sensory outcomes associated with high p(s), so likely or unsurprising sensory states, are preferred sensory states. Hence, when I say that the active states try to sample unsurprising sensory states, it means trying to sample preferred sensory states. So basically, the particle appears to actively accumulate evidence for its own existence; in a way, it samples life-compatible data, if you will. And that's exactly the definition of self-evidencing. I think we've touched on something fundamental about agency here: agents are self-evidencing creatures in that sense.

Okay, anyway. Basically, I think that's the most interesting thing about the free energy principle: we start from existence, and we end up with the fact that such a particle, coupled to the world in that way, must embody a generative model which encodes the causal structure of the world, and which encodes its preferences in terms of what is life-compatible, if you will. So just to sum up what we did: this idea that free energy is minimized, you can write it that way, and this is, in a way, a variational principle for self-organization. That's the free energy principle. Here I've just written what we discussed: the agent keeps track of, and acts on, its external milieu through perceptual and active inference. And note that, interestingly enough, you can write such a principle as a principle of least, or stationary, action, where the Lagrangian, which is constantly minimized along the path, is variational free energy.
So here are some concluding remarks. I'm not going to go through all of them, but the first one is basically what we just discussed, this idea that the generative model encodes preferences: if an agent maintains existence, its likely states are its preferred ones per se, hence the notion of self-evidencing. And I just also want to point out that this new approach, or chapter of physics, let's say, consisting in describing physical systems as encoding probabilistic beliefs, is called Bayesian mechanics. Okay, so having said that, thank you very much, and especially thanks to all the people who helped me so much, especially Lance. And yeah, thank you for your attention. I'm going back.

Thank you, Richard. Okay, well, while we're settling back in and people are asking questions in the live stream: what is your PhD research? If this is your side project, what is your main project that this kind of relates to?

Yes, so, well, in fact I read about the free energy principle in my free time, whenever I had some. In my PhD we have a couple of projects. The first project was really modeling bacterial evolution: basically, we model bacterial evolution as a biased random walk on genotype space, with successive mutations and fixations. That's not related to the FEP at all. And the second thing we have been doing is modeling a system where you have bacteria which can kill each other thanks to a system called the type VI secretion system: they have needles, if you will, with which they can pierce the membrane of other bacteria and release toxins, and they can also bind to each other. So there is a kind of predator-prey dynamics, and we did a lattice-gas model of such systems. So basically that's what I'm doing in my PhD, which is not related to Bayesian mechanics, but I would like to transition to the field afterwards. We'll see how it goes.

I remember when I thought my PhD wasn't related to active inference. Okay, cool. Well, the work built to an amazing crescendo which, in its simplicity, even though you highlighted it, is easy to fly by: the coincidence of the preferences and the expectations. So could you maybe give a little context: how else has that nexus of preference and expectation been approached? And is the FEP only, simply, and always that coincidence? Is that coincidence upstream or downstream of some other commitment that we make? What are the commitments that we really make, and is that alignment the commitment, or a resulting commitment?

Yeah, so first of all, I think the notion of self-evidencing may be a bit refined with the next formulation. But anyway, I think that's a crucial point about the FEP. And usually it's kind of confusing, because when you're reading the papers and people start to write that the system samples evidence for its own existence, you're like: what? I'm not sure I understand what's going on here.
But in fact, yeah, I think the way I introduced it, this idea that, by definition, a living thing is a thing which manages to sample life-compatible sensory data, is really what allows this alignment story between surprise and preferences, basically. And this idea that actively sampling unsurprising data is, in fact, sampling life-compatible, and in that sense preferred, data is not just tricky wording; in a way, that's really what's happening, hence the notion of self-evidencing. The whole idea is that we start from existence: we start from this sparse coupling architecture, where the particle manages to maintain its physical integrity, manages to display a Markov blanket, which allows the agent to have its own internal dynamics, separated from the external. Somehow it manages to counter dissipation, or whatever. And from there, likely states are states consistent with the fact that it is indeed existing. I think that's basically the idea. In the beginning, this line of reasoning can be a bit confusing, but in fact I think that's very much what the FEP is all about. And actually, one last remark: in a Machine Learning Street Talk interview, Maxwell Ramstead described the FEP as a physics of survival, if I remember correctly, and I think that's very much what it is all about, in a way.

Awesome. How would you relate what you just described to reward, or to reinforcement-type learning schemes?

Yeah, so, I mean, I'm not an expert at all; I couldn't make the bridge here. But I know that Lance Da Costa has done several works and interviews on the subject, and actually I think there is a very new paper called "Active Inference as a Model of Agency", which you actually shared today. So I recommend the viewers check those out. And as far as I know, but here I'm just repeating what I've heard, any reinforcement learning algorithm can be framed in terms of active inference. So I think active inference is a very fundamental scheme. But yeah.

Yeah, it's all good. The reason I asked, just with how you presented it, is: what kind of observations do we want to sample? That could be the sensory, embodied interface between the agent and the environment, or you could take a more cognitivist approach and sample internal observations; but those are just observations, some external, some internal. So what do we really want to sample? Well, if you're even in a position where you're talking about sampling from, like, a utility or a reward distribution, you've already specified a distribution. Why not just specify the existence distribution, the actual attractors and stationarities of the measurement? Then it's simpler, because there's no proposal of a secondary intermediate between the temperature and how good different temperatures are; you just say it's not rewarding to be at a homeostatic 37 degrees, it's just expected and likely. And the ball runs downhill. It's actually a lot simpler and more general.

Yeah. And I think it's way simpler, too. The idea here is that the agent has a kind of world model which, as you said, specifies the expectations with regard to just existing, in a way, as opposed to explicitly designing objective functions which incorporate notions of utility and so on. So, yeah, I very much agree.

Earlier, when we were looking at the flows and we had the breakdown of a flow, could you maybe just...
What animal are you thinking about, or what scenario can help us understand: what's the solid black line, what's the small red line, what's the spiral? What's a physiological setting that we could associate, to help us understand that kind of complex movement?

Yeah. So, generally speaking, the first thing I could say concerns this notion of solenoidal flow, like in the schematic on the first slide, where you had this isocontour circulation on the NESS density, or here, the component of the flow which creates this sort of spiral. These sorts of oscillations or cycles are, I think, ubiquitous in living systems. I'm not really a biologist, but you can think of the circadian cycle, or, in any sort of system, this sort of attractor you circulate along. And specifically for this schematic, the idea is that, basically, for a given sensory state you have a corresponding autonomous mode, and when the sensory state changes, the autonomous mode changes as well, and in fact moves on its so-called manifold. So basically, here you have the mode moving on its manifold. And now, if we take the perspective of this autonomous state here: we converge to the manifold, to the mode, and because of the solenoidal component of the flow, the way we reach it is with these kinds of ever-decreasing cycles. So that's the idea, and I really recommend the "free energy principle made simpler" paper here. You have the flow on the manifold, which is just the path of the mode itself, let's say, and you have the flow off the manifold, whose gradient component is the flow towards the manifold. That's basically how the autonomous states react to sensory data, which changes the autonomous mode. And I think an important idea here is to assume that the flow off the manifold is fast, as opposed to the flow on the manifold, so that the autonomous states are always in the vicinity of their mode, and move with their mode. And yeah, I think that's pretty much the idea here.

Okay, so let's just say that the black line is our homeostatic bodily existence: life-compatible pH, oxygen, blood sugar. And we are that light blue dot that's off that manifold. Of course, if we were far enough off to be dead, it would be a moot question; but we're off, yet within a life-compatible zone. And now, as time pushes us down and to the right, there are different paths we could trace. We could take the shortest path, the gradient flow, directly towards the manifold; as that plays out through time, it would look like a straight line converging to the thick black line. Or pure solenoidal flow would just stay equally far from the thick black line and continue to spiral, so that would look like a corkscrew through time. And then here, when you have the combined character of the linearized convergence towards the manifold and the corkscrew out through time, we get this kind of winding spiral. So it strikes me that the gradient flow is like pragmatic value, in that it aligns future observations with preferences, and the solenoidal flow has an almost epistemic character, in that it circulates among a set of equally valid outcomes. Yet here we're not looking at the pragmatic-plus-epistemic decomposition of expected free energy for policy selection, like equation 2.6 in the 2022 textbook.
So is that just a concordance, or where do you see some of those topics connecting?

I'm not sure, maybe. But having said that, on the meaning of the solenoidal part: I don't remember if it's in the "free energy principle made simpler" paper or somewhere else, but there is an analogy where they discuss the meaning and the role of the solenoidal flow. They say that it kind of helps mix the system, and they discuss the metaphor where you want to dilute something in your coffee, for instance, and you stir with this sort of motion in order to reach, as fast as possible, the steady state where everything is diluted. But I'm not sure; I haven't thought enough about it myself to provide any sort of interesting insight.

Just to have composed it is very insightful. Well, you made choices assembling things. What do you feel would have been useful background, maybe a course or a skill? What background do you feel you conditioned upon that somebody might want to check out? And then, what do you feel you would have wanted to include in the state-based formalism? Because bringing it in under an hour is very concise. So where do you feel somebody could fill in some background to pick up with you at the beginning, and what else do you think would make a fuller presentation?

I think there are a few aspects and details that I didn't fully discuss. First of all, all these things which involve center manifold theory and such: we kind of played with them qualitatively; we didn't really go into that. And also, if we want to be really complete, formally speaking, there are a couple of things that we accepted without really checking all the assumptions and all the derivations. I'm especially thinking of the Helmholtz decomposition of the flow, because, of course, you need a steady-state, NESS density to exist in order to have such a decomposition. So here I think there is a lot of stuff to check. I think it's in appendix B of the "Bayesian mechanics for stationary processes" paper by Lance that he derives the Helmholtz decomposition. So yeah, there are quite a few things we kind of stated without deriving; if people are interested in going further, I think those are interesting formal directions.

I think it would be a really fun collaborative project to axiomatize, formalize, and modularize this using the ActInf ontology and understand a lot of these relationships. And then the other piece that made me think is: what work is any of this math doing at all? Just kind of the ultimate existential question here. When we condition upon existence, we've kind of outsourced a lot of cognition. We don't need to make the jump, or the walk, or the miracle, from axiom to embodied existence, or even to measured hypothetical existence. So that is left unaddressed; the margin was not big enough, but it wasn't even addressed, and maybe there are even advantages to leaving it so. What happens before the conditioning, you don't want to take with you after you condition upon it; that's the whole Markov concept. If you say, well, I'm conditioning on five years ago in the present, but also I'm carrying those five years with me today, well, then it wasn't really conditioned upon.
So to really condition upon measurements is an extremely, radically simplifying maneuver that may change the scope or the applicability of the framework, relative to a conception in which what the free energy principle does is describe how things come to be. Rather, conditioning upon existence opens up that discussion and circumscribes this very analytically tractable setting of the agent and the environment across a conditional interface.

Yeah. By the way, about the conditioning thing, there is now the notion of weak Markov blankets, which Dalton introduced, and which kind of loosens the approach, let's say. Because indeed there is a question: apart from the formal setting we have here, can we really apply it to real systems and such? And also, I think it is a physics of survival in itself: at a given time scale, we survive, indeed, in the sense that there is this partition, or conditional independence, between the internal and the external. Here is the physics you have to comply with; but it doesn't tell you how the Markov blanket arises or whatever, that's just not what it is designed to explain. But I think, generally speaking, it's really informative. For instance, consider the pendulum effect, where you put pendulums oscillating on a table and they synchronize with each other; I think Takuya Isomura did a paper about that recently. In order to understand what is going on, and why the pendulums synchronize at some point, you just have to recycle all this line of reasoning with the synchronization map. That's very much what is at play, and what explains why the pendulums synchronize when they are both on the same table. So I think it is really informative for understanding what is going on when we talk about synchronization phenomena across sparsely coupled systems. And also, it gives you, I guess, a sort of recipe for understanding what it takes to be an agent, if you want to design an intelligent system. But yeah, the question of how useful it is, beyond being a nice formal framework, is an interesting discussion, yeah.

And just two things. First, I would like to go back to your previous question about what sort of things could be discussed further. I think an interesting point we didn't really discuss fully is the notion of the synchronization map, because we didn't discuss its hypotheses and so on, and in fact I think there is much more that can be said. For instance, because we assume injectivity, the rank-nullity theorem kind of constrains the dimension of the internal manifold with respect to the blanket manifold. To say it in a qualitative fashion, it constrains the complexity and the richness of the internal states, which speaks nicely to other frameworks, like Ashby's law of requisite variety, where you want the regulating system to be as sophisticated, as rich, as the regulated system. Here you need the internal states to be complex enough, to constitute sufficient statistics, let's say, to be able to parameterize the density.
And this richness, let's say, is constrained by the cardinality of your sensory channels, if you will, because basically you need the internal manifold to have the same dimension as the blanket manifold, or the sensory manifold to have the same dimension as the autonomous manifold. So I think there are many things to discuss about this aspect.

And the last thing I'd like to say, about your question on the applicability of the framework, and how useful it is as opposed to being a simple, elegant, formal framework: you know there are these papers about the "Markov blanket trick" and such, about how difficult it can be to identify which states correspond to the Markov blanket, or whatever. Personally, I'm not really convinced by these critiques, because to me it's like saying to Newton: yeah, I'm not sure I can do anything with your framework, it's complicated, if not impossible, to model systems with clearly identified and separated forces and masses. Okay, fine, but we're talking about Newtonian mechanics here. And I think it's the same here: if you have a sparsely coupled random dynamical system, that's the sort of behavior it will display; it tells you fundamental things about the nature of living systems. The fact that, when it comes to a specific system, it can be quite tricky to model, that's another question. And indeed, when it comes to the art of modeling complex systems, it's interesting, and we can discuss how complicated it can be to apply the framework, yeah.

Awesome. I love that. It's like the art of the science, and the art of the modeling, and the craft, especially at this kind of early, hand-built, largely custom stage. One thing I even wondered, looking through these slides: what fraction of these representations and formalisms exists only analytically? Is there a code representation of this exact scenario, or are some of these equations without code realizations, just pure analytical equations?

So, I think more or less everything here can be simulated. Even this synchronization thing: you can perform simulations where you literally see the synchronization within the simulation. And the whole thing here can be simulated; you can simulate such particles, like coupled dynamical systems, and interpret the dynamics the way we framed it. But yeah, that's also an interesting aspect.

It could be cool, in the GitHub repo or in the journal for this transcript or something like that, to curate together the simulations that demonstrate it, or a minimal specification for it. Because it's awesome.

Yeah, and actually, I think it's in Lance's paper about the synchronization map, the "Bayesian mechanics for stationary processes" paper: there are some simulations where he shows the synchronization map at play, and he shows that, basically, if the map from the blanket states to the internal states is not injective and you apply the synchronization map to the actual internal states, it doesn't give you the right thing, and there are some nice plots from simulations. So that's definitely a paper to check out.

So where do we land, and then how do we leap, exercise, relax, to prepare for part two?

Yeah, so I think here the whole point was, I mean, this whole formulation was in a way about the momentary,
short-term response of the autonomous states to sensory stimuli, let's say: the instantiated active states are such that they fit whatever momentarily comes in. But in the next video, where we will look at the path-based formulation of the framework, the whole idea will be to ask: what about paths, what about future paths, what about long-term behavior, what about planning, what about higher-order cognitive abilities? We will kind of extend the scope of what we are doing in that sense. And, I mean, from a formal point of view, here I introduced many things, variational inference, the synchronization map, et cetera, one after the other, before actually deriving the free energy principle. Next time, I think it will be more straightforward, but the main concept, which will be at the core of the framework and which can be confusing the first time you look at it, is the notion of generalized coordinates of motion, which appears when you relax the white noise assumption. That's something that can be confusing, especially for physicists, because when you start saying, yeah, the generalized Lagrangian plays the role of an action or whatever, they're like: no, a Lagrangian is not an action, what are you talking about, et cetera. But when you get acquainted with the whole construction, it's very elegant. That's definitely something people can start looking at prior to the livestream, yeah.

Awesome. Yeah, well, it was excellent. You brought a lot together, and there are a lot of trails leading off this trail, and the citations and previous papers that also brought things together, Lance's work and others. It's going to be awesome to see part two. So thank you, Richard.

Yeah, thank you very much, Daniel. Thank you.

All right. See ya. Bye. Bye.