All right, hello and welcome everyone. It's ActInf Lab livestream number 45.1. It's June 1st, 2022. Welcome to the ActInf Lab. We're a participatory online lab that is communicating, learning, and practicing applied active inference. You can find us at the links here on the slide. This is a recorded and archived livestream, so please provide us with feedback so we can improve our work. All backgrounds and perspectives are welcome and we'll be following good video etiquette for livestreams. Head over to activeinference.org to find out more about what we're up to at the lab and participate if you'd like to get involved. We're here in livestream number 45.1 and we're in our second discussion around this paper, The Free Energy Principle Made Simpler, But Not Too Simple, by Friston, Da Costa, Sajid, Heins, Ueltzhöffer, Pavliotis, and Parr. And the video, just like all of them, is an introduction and an invitation for those who would like to ask questions in the live chat, for those who would like to join us next week in the conversation, and just to try to learn and unpack and understand this paper, as well as seeing it as a starting point for the continued learning and applying that we're all involved in. And we will just begin with some introductions and overviews, remind ourselves where we've been in this paper and then start to think about where we're going, and we'll see who joins, we'll see who asks questions in the live chat, and I guess where we go together. So we can begin just by giving an introduction, saying hi and mentioning anything like what's something that we're excited about, what's something that we liked or remembered about the paper, and what is something that we're wondering about or would like to resolve towards the end of the discussion. So I'm Daniel, I'm a researcher in California and I'll answer the questions later. How about you, Blue? I'm Blue, I'm a researcher in New Mexico and I don't know why I'm tasked with going first.
There's a lot of things that I think I would like to have resolved or greater certainty about, including the hierarchical nature of active inference and how mathematically that works out, as well as the time dependence and time independence. And I'm curious about, I don't know, we can get more into it later. Yeah, let's just sit here. While people are filing in and getting their questions ready, let's take a look back at the big question, kind of situate ourselves in this paper and just recall also what the main sections are in the roadmap. So Brock and I worked together on 45.0 and we highlighted the big question, just one way to ask it, as: what are the foundations of the free energy principle, FEP? And what does it contribute? And then also we can ask how is active inference related to FEP? And we drew from the abstract, where they mentioned starting from a description of the world in terms of a random dynamical system and ending up with a description of self-organization as sentient behavior. Greetings, Carl. How goes it? I've just managed to get back in my house after being locked out by my son, but otherwise fine. You found an alternate policy approach? I did, and some keys hiding underneath the bucket. Would you say this paper is the key to the free energy principle, or what is the key? This is, which one's this? This is the simpler one. Simple as possible. Made simpler, but not too simple. There are simpler ones. I hope there will be simpler ones. So this is one route to the desired simplicity. All right, so what is this paper about? What did you intend to do and how did you accomplish that in this paper? It was originally conceived as a brief summary of the particular physics paper, if you remember that one that's on arXiv, a very dense, hundred-page deconstruction of physics as we know it.
So this was meant to be a technical rehearsal, focusing specifically on the free energy principle and the active inference part, and it was meant to be as simple as you could get it, but as it was being written, it became more and more complicated. So now we're writing another one that's even simpler. It was a technical rehearsal, then it was a dress rehearsal, and then before you knew it, it would be showtime. Something of that fashion, yes. Well, there's many places to jump in, but first, thanks so much for joining, and feel free to join and leave as works for you. We were interested at the outset in the layout and the roadmap of the paper. So you described how there's the introduction and then the following sections are topically arranged. So how did these seven inner sections come to be? Is that the seven islands in the island chain? Are there other topics? How did you come to this order and these seven sections? Right, are you posing the question to me? I guess Blue could go for it, but maybe you could go first. So I can't see who else is here, so I'm not quite sure who else could speak to this. So from my point of view as the author, it was exactly as he's saying: the stations en route from the basics or the basic assumptions of the free energy principle, which is basically the Langevin equation, right through to the end, which is an account of sentient behavior meant in a very deflationary way.
So it was just the steps that you would need to take to go from a Langevin description of some system through to, first of all, defining systems of interest that had steady state solutions, and then further thinking about the nature of thingness, how you would distinguish a thing from something else, and the implications of that distinction as articulated with a Markov blanket for the dynamics, and various interpretations of those dynamics. And of course the interpretation we use is that of self-evidencing, or gradient flows that look as if they are the kind of gradient flows you would see in systems representing, in a Bayesian sense, the causes of their sensory inputs on their Markov blanket. And then looking at special classes of types or kinds of particles and in particular, the limiting case of large particles, where the random fluctuations on the particular states, namely the Markov blanket and their internal states, tend to zero, essentially vanish, such that you are now describing things that have a classical aspect, in the sense that you wouldn't need a sort of microscopic or quantum description; they move and behave in accord with classical Lagrangian dynamics. And that has some special constraints on these gradient flows on the Lagrangian or the surprisal, which can be read as a gradient flow sharing the same minima as a gradient flow on variational free energy. And that takes us then to the final move, which is what that would look like in terms of planning as inference, articulating that in terms of expected free energy. And then we stopped. Awesome. Blue, anything to add, or we can continue this line? You know, I have a question that's like a burning question from reading this paper and listening to Carl's talk and then also doing the textbook reading group. I'm curious about the language, the causes of sensory inputs.
So, and I'm curious about this because you can see it in so many ways and I know that it's open to interpretation and depends upon the model of the system. But as someone from a neuroscience background, the cause of my sensory input is like a 450 nanometer light wave, or a pressure wave through air in the sense of sound. So when you think about the cause of the sensory input, if there's a red balloon and I'm seeing red, that's 700 nanometer wavelength light that I'm actually detecting, but the balloon is like missing from that. And so I just wonder why did you choose this term, or do you mean it in a literal sense or is it supposed to be abstracted? Am I supposed to come away from the cause of my sensory input? Come away from the red light and go more towards that's a red balloon? I would come away from the red light and go more towards the red balloon. Would you excuse me for a second? The son that locked me out is just knocking on the door because I stole his keys. Oh yes. I'll just see to it. Thank you. Move away from the red light. Away from the red light. Sorry about that. Yeah, no. So the notion or the use of the language of causes is used in a very broad sense, in terms of carving up all variables into causes and consequences. So very often when you talk in any modeling context, particularly in terms of the inversion of a generative model, people ask what do you mean by inversion of a generative model? Well, what I mean is a generative model is a probabilistic description of the joint occurrence of causes and consequences. And usually the consequences are observable and the causes are not. So they're sometimes referred to as hidden causes. So the generative model provides this description of the joint occurrence of, the probabilistic relationships between, causes and consequences. And the way that we, or the way that one can understand sensations or sensory samples, is as the consequences of some abstract causes.
But because the causes are hidden or unobservable, they can only be inferred. So these causes now become random variables that can only be inferred. So from the point of view of a modeler, the forward model or the generative process would be a map from cause to consequence. But we only have available, or at hand, the consequences. So we have to do the reverse mapping, which is the genesis of the notion of model inversion. We have to map from consequences back to causes. And that's basically the process of Bayesian inference. So that's where the language comes from. It's just sort of splitting the whole universe into two types of variables, causes and consequences. When you move to the free energy principle as articulated in the particular physics paper, things become a little bit more nuanced, because now you've got the consequences of external causes impressing themselves on the Markov blanket. But you now also have internal states, internal to the Markov blanket, that are now doing the Bayesian model inversion effectively. But the causes are meant as universal, and refer to a set of very, very high dimensional variables, external states out there, that conspire through whole chains of causal links to produce the sensory consequences. So I would subsume both the red balloon and the reflected and the radiant light and all of the biophysics that causes your retinal cell to fire. That whole chain of causality will entail a whole series, possibly an uncountable number, of hidden causes in generating the consequences. And you have to work back through that chain, and hence the notion of hierarchical models that explicitly try to put that chain in place.
So at one level of your hierarchical model you have a red balloon, and at another level of the model you might have the consequences of that red balloon in the outside world, which would be reflected light passing through your optical machinery and impinging upon your retina. All of these things will be implicitly modeled or coarse-grained in your generative model so that you can solve the inverse problem and reconstruct the scene out there, given this is how you observed it. Because how you observed it is specified in the generative model through a likelihood mapping. So if I knew the cause, red balloon, what would it feel like? How would I palpate it? What would I observe? And that whole likelihood model would entail all the mechanisms that you described, the sound waves and pressure waves and photons and the like. Just to conclude, another common phrase people use, certainly in machine learning, for this notion of causes that are unobservable, that are effectively random variables because they are the support of a generative model which is specified as a probability distribution, is hidden states. Another word for this is latent states. And I've been told off by some reviewers for using the term latent states, but I can't remember why. So I've reverted to using them in exactly the same sense: latent states or hidden states are those unobservable variables that are distinguished from observable states or outcomes or measurements or data. So that's where the language comes from. That's very helpful. Yeah, it really helps me because it's something that I've struggled with, but to think about a hidden cause in terms of a causal chain that leads to the observation, that's a very helpful way for me to think about it. Thank you. Good, yeah.
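The model inversion just described can be made concrete with a toy discrete generative model. This is a minimal sketch, not from the paper: the balloon-and-light variables and all the probabilities are invented for illustration, and the inversion here is exact Bayes rather than the variational scheme discussed later.

```python
# Toy "model inversion": map from an observed consequence back to its hidden
# cause with Bayes' rule. All numbers and variable names are illustrative.

# Generative model: joint distribution over hidden cause and observable
# consequence, specified as a prior p(cause) and a likelihood p(obs | cause).
prior = {"red_balloon": 0.2, "no_balloon": 0.8}           # p(cause)
likelihood = {                                             # p(obs | cause)
    "red_balloon": {"red_light": 0.9, "no_red_light": 0.1},
    "no_balloon":  {"red_light": 0.1, "no_red_light": 0.9},
}

def invert(obs):
    """Posterior p(cause | obs): prior times likelihood, over the evidence."""
    joint = {c: prior[c] * likelihood[c][obs] for c in prior}  # p(cause, obs)
    evidence = sum(joint.values())                             # p(obs), the marginal likelihood
    return {c: j / evidence for c, j in joint.items()}

posterior = invert("red_light")
print(posterior)  # belief in "red_balloon" rises from 0.2 to about 0.69
```

In a hierarchical model, the cause at one level (the balloon) would itself generate intermediate causes (reflected light, retinal activity) before the observation, but the inversion logic is the same.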
And also how hierarchical nesting of models works, which we can come back to. One model might be: the unobserved state is the temperature in the room and the observation is the number on the thermometer. But you don't really see the number, you see the light. So then you could have a two-step model, where the temperature causes the shape of the thermometer, and then the shape of the thermometer determines what number it is. And so, depending on how much data and what adequacy is needed, one could coarse-grain and just say we're going to go from the hidden state to the observed, inferred number, or one can start to parse that out if there were degrees of freedom there. Let's maybe go through. Oh, did you? I was just gonna say that's a nice example, but you know, you could have a sort of deep artifact or creature, which actually has a representation of temperature as the cause of the shape or the height of the thermometer, or you could have a very shallow artifact that just registers the height of the thermometer. And I have in mind here the difference between a thermostat and me or you. So we would represent and explain changes in the shape in terms of a change in temperature, because that's our model and we would have a deep model. The thermostat might not; it might directly act upon the world simply because the shape, you know, there is some quantitative single-valued scalar observable, is telling it to switch the radiator on. So it's a nice distinction. You know, there's no rule that says how much coarse-graining or how much depth or hierarchical structure you bring to the table in terms of trying to explain what you can observe. Great. Let's go to quite early in the discussion, in the introduction. Elsewhere, the tautological aspects and the question of falsification have been explored. So let's jump into a slightly different point, which is the utility.
So there is a rhetorical question in the paper about whether such a description of the FEP, being a framework or principle in this way, is falsifiable. And the answer was probably no, but there was a however. So where do you see the utility of the FEP and how does that come into play in the particulars? For Carl or Blue, maybe to you first. Blue goes first this time. Yes, Blue. Have you seen utility in the FEP? I like that the FEP resolves so many very difficult things to model, or maybe doesn't resolve them fully yet but has the potential to resolve them, is maybe a better way to put it, especially in terms of hierarchical modeling. Also the balance between exploration and exploitation, complexity and accuracy. So I think that it's useful to describe circumstances where there's not always a single solution, but maybe a best solution, or how can we find the best solution given what we know. Okay, what would you say, Carl? What was the function of this example with the principle of least action, which we'll also want to explore? How is there a utility in the FEP? How is there a utility in the principle of least action, and what is the relationship between the FEP and the principle of least action? I think that Blue has provided the answer, and it can be illustrated with Hamilton's principle of least action. So if I wanted to understand how a ball was thrown, or understand its dynamics and its kinetics and the physical laws by which a ball or a massive body pursued a particular trajectory, then what I could do is apply Hamilton's principle of least action to the particulars of this situation: the mass of the ball, the initial conditions, the wind resistance, and everything else that I would need to know in order to predict the behavior and understand the mechanics that underwrote what actually unfolded.
So if you dumb down Blue's aspiration to understand sentient creatures and the distinction or the relationship between exploration and exploitation, and just think about understanding the trajectory of massive bodies, what you will end up doing is appealing to a principle of least action, and all that says is: these are the lawful behaviors prescribed if you apply that principle to a particular situation. So for us as builders of sentient artifacts, or neuroscientists trying to understand sentient creatures, what it says is that you have a principle at hand that allows you to build. And if you can build an artifact that either resembles or reproduces the kind of behaviors that you're interested in studying, then you have a more formal understanding of the mechanics and how those artifacts or systems actually operate, or you can simulate, or you can build artificial intelligence systems. So it's all in the pragmatic building of something by applying the free energy principle to something, and you may be asking, well, what is that something? Well, it's a generative model. So all the hard work starts not in understanding or deploying the free energy principle. All the hard work is really in specifying, writing down, the generative model apt to describe the kind of creature you're interested in or the system you're trying to characterize and understand. I'll just close by giving a nod to Richard Feynman here, because the motivation for application of the free energy principle, and in a sense the motivation for the free energy principle in and of itself, was just to realize or provide tools for realizing the observation that was famously, or purportedly, on his blackboard at the time of his death: what I cannot create, I do not understand. So to build it is to understand it, and vice versa. Great. I'll ask one follow-up on Feynman and then we have a great question in the chat.
What is variational inference as Feynman utilized it and as it's utilized in the FEP and ActInf? Well, variational inference is generally read as a suite of data analytic tools called variational Bayes. It also used to be called ensemble learning, but I haven't heard that phrase in recent years. So what is variational Bayes? Well, it's a form of approximate Bayesian inference, which should be contrasted with exact Bayesian inference. So what's the difference between approximate and exact Bayesian inference? The key difference is that you assume a functional form for the posterior beliefs that you're trying to update from your priors to. Now, in assuming a functional form, you are going to depart from the actual true posterior, and it's that departure which renders the inference scheme an approximate Bayesian inference scheme, but what you buy for that approximation is tractability. So exact Bayesian inference cannot be realized physically, by which I mean it's mathematically intractable and also numerically impossible to do exactly, even if you have enormous sampling and you use sampling procedures like Metropolis-Hastings or Gibbs sampling, which, open brackets, are another way of approximating a posterior, just with a large number of samples and sample density. So what's the fundamental importance of the move from exact Bayesian inference to approximate Bayesian inference, namely variational inference? It's because, having assumed a functional form which makes your inference approximate, you can now analytically convert what is an impossible marginalization or integration problem into a tractable optimization problem. So just to put some flesh on that, well, let's take Richard Feynman's problem. He was trying to work out the probability distributions over all the paths a small particle could take from some initial state to some final state.
Now, given the nature of the paths he had in mind, there is a universe of an enormous, uncountable number of such paths, which meant that any analytic or numerical realization of these paths meant it was impossible to normalize the distribution by marginalizing or integrating over the distribution to work out the normalization constant, also known as a partition function, which you would need in order to specify the probability distribution. So when people talk about an intractable marginalization or integration problem, what they mean is that for these very high dimensional probabilities you cannot integrate under the unnormalized distribution to produce a probability distribution. However, if you know the functional form, then you can create what's called an evidence bound, or a variational free energy, which is always, for a physicist, greater than the quantity that you want to estimate or marginalize: the evidence that constitutes the result of that marginalization, also called the marginal likelihood. Which means that if you can reduce that bound, you're guaranteed to be minimizing a quantity that is always greater than the thing you actually want to minimize, which is usually the negative log marginal likelihood. That converts an impossible integration problem into a tractable optimization problem, and that's conceptually quite important. So what it means is that you are now converting what was, up to that point, an inference problem into an optimization scheme. So now you have a formal definition of the normative account of behavior. You can now talk about inference as an optimization procedure, where you're optimizing a well-defined objective function that is the variational free energy. Why variational?
Well, very simply, given the functional form of your approximate posterior, and given the functional form of the bound on the marginal likelihood, or the negative log marginal likelihood to be more precise, you then use the calculus of variations to demonstrate that this is indeed a bound, whereby you can perform a gradient descent or find the minimum analytically as part of that optimization scheme. So the 'variational' in variational Bayes just means you're using the calculus of variations to license the use of a particular approximation to the posterior. Welcome, Thomas. Thank you for joining. Would you like to say hi or give any overview? I can continue with the question, or please feel free to take it in any direction. Well, thank you for inviting me. I don't have any particular comments at the moment because unfortunately I missed the start of it. Apologies that I joined late. Perfect. Well, I'll go to the question that was in the chat. So this was asked by Jax in relationship to our discussion about the balloon example and the thermometer example, with hidden causes and observed consequences. Is this all in the service of homeostasis? So how can we connect what we've been talking about, with this causal Bayesian framework and the variational inference, to the idea of homeostasis and survival for cognitive entities? As per tradition, we could go to Blue, or Thomas, if you'd like to jump in on that. No, please go ahead. You have to give Thomas a few minutes to settle in. Daniel, I'm deflecting that question back to you. So you've dodged every question so far today, so we're going to let Daniel answer this one. Aha. Sure. So just to give a first thought. So we just heard about how the variational free energy calculation is converting an impossible integration problem into a tractable optimization and a gradient descent through time.
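The bound property just described can be checked numerically in a toy conjugate-Gaussian model, where the exact evidence is known in closed form. This is an illustrative sketch, not from the paper: the model (standard normal prior, unit-variance Gaussian likelihood, Gaussian approximate posterior) and the single observation are invented for the demonstration.

```python
import math

x = 1.0  # a single observation
# Generative model: z ~ N(0, 1), x | z ~ N(z, 1)
# => evidence p(x) = N(x; 0, 2), exact posterior p(z|x) = N(x/2, 1/2)

def free_energy(mu, s2):
    """Variational free energy F(q) = E_q[log q(z) - log p(x, z)] for q = N(mu, s2)."""
    neg_entropy = -0.5 * math.log(2 * math.pi * math.e * s2)          # E_q[log q]
    e_nll = 0.5 * math.log(2 * math.pi) + 0.5 * ((x - mu) ** 2 + s2)  # -E_q[log p(x|z)]
    e_nlp = 0.5 * math.log(2 * math.pi) + 0.5 * (mu ** 2 + s2)        # -E_q[log p(z)]
    return neg_entropy + e_nll + e_nlp

# Exact negative log evidence, -log N(x; 0, 2)
neg_log_evidence = 0.5 * math.log(2 * math.pi * 2.0) + x ** 2 / 4.0

print(free_energy(0.0, 1.0) >= neg_log_evidence)        # True: F bounds -log p(x)
print(abs(free_energy(x / 2, 0.5) - neg_log_evidence))  # ~0: bound is tight at the true posterior
```

Minimizing F over (mu, s2), a tractable optimization, recovers the exact posterior here; in non-conjugate models the same optimization remains tractable while exact marginalization does not.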
And so if we were trying to stay within homeostatic bounds for temperature or for blood sugar or some other vital parameter, and we found ourselves anywhere but exactly on the one target location, so we're some divergence away from a set point, the number of paths that we could take to get back to our set point, or to get to the range of comfort, just like Feynman's particles, would be infinite. Like in the blood sugar example, you could change the regulation of every single gene in every single possible combination. So the number of paths and microstates would be very large. Variational inference and planning as inference would give us a way to, in an unfolding-through-time way, functionally optimize and return to homeostasis. Rather than waiting on the sideline for this intractable integration to occur, we could start moving in a direction, more like a ball rolling to the bottom of a hill, or like a molecule descending to a Gibbs free energy attractor, but in some other space that we're doing inference in. Maybe Blue next. We'll go from least knowledge to most. Yeah, so I don't know, homeostasis, I think there's always, so we started off by saying what exists, or if you exist, what must you do, right? So the paper starts off this way: if something exists, what must it do? Rather, what must something do in order to exist? So it starts with the reverse of this, and I think homeostasis is maybe the most fundamental driving force behind life, which goes back to the Life As We Know It paper and also the particular physics paper. So what must something do in order to exist? Something biological must maintain this non-equilibrium steady-state density of homeostasis, because if you don't, then you die. So equilibrium in a living system is death. And so fundamentally, I think that it does come back to homeostasis. Yes. Thomas? Sure, I think I've got my bearings now.
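The ball-rolling-downhill picture of returning to a set point can be sketched as a gradient flow on surprisal. This is a minimal illustration, assuming a Gaussian steady-state density over temperature; the set point, precision, and step size are all invented constants, not from the paper.

```python
# Homeostasis as a gradient flow: with a Gaussian steady-state density over
# temperature, surprisal (-log density) is quadratic, and descending its
# gradient returns the state to the set point. Constants are illustrative.
set_point = 37.0   # degrees C, the mode of the steady-state density
precision = 1.0    # inverse variance of that density

def surprisal_grad(temp):
    # d/dT of 0.5 * precision * (T - set_point)^2
    return precision * (temp - set_point)

temp, dt = 39.0, 0.1   # start outside the comfortable range
for _ in range(200):
    temp -= dt * surprisal_grad(temp)  # Euler step of the gradient flow

print(round(temp, 3))  # back near 37.0
```

The point is that no integration over all possible return paths is needed; following the local gradient already moves the state back toward the high-probability region.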
So I think homeostasis is always quite an interesting comparison with, or thing to relate back to, variational inference and the free energy principle. And for me part of the reason for that is that homeostasis is really just the observation that sensed values tend to live in some particular space over time, or some particular distribution. And so when we talk about homeostatic bounds, what we're talking about really is just a description of how things tend to be. And coming back to this question about inverting the question of how things have to exist, and going in reverse and saying, well, it does exist and it exists in this form, how does it then do it? We're sort of starting, when we talk about homeostasis, from the principle that we are in this range, and traditionally you then talk about negative-feedback-type mechanisms of reflexive correction of deviations from some set point. But you could also see that as exactly the process of maintaining some steady-state density over time, of climbing probability gradients exactly as you would for variational inference. And when you look at things in that framing, particularly because in homeostasis we're dealing with sensed values, whether those be baroreceptor signals or measures of glucose and that sort of thing, we're really dealing with a probability distribution over some sort of sensory data here, predominantly interoceptive data. Which is effectively a marginal likelihood, once you then equip it with a model of how those sensed variables are generated.
And so one of the things that's useful in terms of taking it from homeostasis to variational inference, or putting it in the realm of the free energy principle, is that it then asks you to try to formulate the sort of generative model you'd need to be able to explain those data and therefore to optimize your own internal environment. And that's, I think, where a lot of the interesting neuroscience comes in, where you've tried to build the models that the brain might use and then invert as the process of performing autonomic reflexes and the like. Thank you. Carl, anything to add, or we can continue with other questions. No, just very briefly, to reinforce the last two answers. The question is an excellent question, because in a sense the free energy principle is just about describing things that have a generalized homeostasis. So it's not saying how would you implement homeostasis. It's saying that our universe seems to be populated with things that possess a homeostasis; so what behaviors, what principles must they conform to? So that's a very good question. In this notion of generalization, we just heard generalized homeostasis, and I'd like to ask how it's connected to two other generalizations: generalized synchrony and generalized coordinates of motion. And if there are any other generalizations, we can add those, but how might this notion of generalized homeostasis be related to generalized synchrony, for one entity or in communicating ensembles, and then the notion of generalized coordinates of motion? Shall I take that one? Sure. Shall I take that one first? I think there's a very close relationship between the notion of generalized synchrony and the notion of homeostasis. And by generalized, I just mean the same tendency to keep states that matter, sometimes referred to as essential variables in the cybernetics literature, within certain bounds or within viable domains.
If you wanted to describe that process in terms of dynamical systems theory, what you would be saying is that the trajectories through some state space are limited to a relatively small attracting set, and the smallness of that attracting set defines the bounds or the range that characterizes that particular system. So for you and I, we have quite narrow bounds, say, on our temperature. And that just means that our dynamics keeps us within or close to those set points. So from the point of view of the dynamicist, understanding the dynamical behavior of two systems that are coupled to each other, and these can either be a creature or a particle and its embedding niche or environment or the external states, or there could be two similar particles in dynamic interaction: if they are coupled and they are both restricted to a small set of states, perhaps relative to the total number of states or paths that they could pursue or occupy, then they're said to have a generalized synchrony. Technically, it means that I can predict how I will move given your state, and vice versa. So I think there is a very close formal relationship between the notion of generalized synchrony and homeostasis of the kind that is accountable to the coupling between one system and another system. Generalized coordinates of motion, I think, are meant in a slightly more technical sense. So in order to understand the dynamics, so to move from a description of things in terms of where they are in some state space, say your temperature, and now to understand the processes that keep you at a certain temperature, then you have to talk about the rate of change of temperature, or the paths in a state space where one dimension or coordinate of that state space is temperature. So how would you describe paths mathematically?
Well, you could just have a long list of states as an increasing function of time, or there's a more compact way of describing a path, which is effectively to use a Taylor expansion and describe it with a list of coefficients of that Taylor expansion, which turn out to be the state you're in at the moment, your velocity, your acceleration, your jerk, and all higher order temporal derivatives. And if you go to an infinite number of temporal derivatives, you can now reconstruct, using that Taylor expansion, your trajectory into the future and into the past. So generalized coordinates of motion are just the observation that, in describing, or more precisely in defining the support of, a probabilistic description of a system, if you want to describe the paths, then you just augment the state with all its higher order temporal derivatives. And that's called generalized coordinates of motion. Interestingly, this should be slightly disambiguated from the way that people use generalized coordinates in physics. That's usually read as basically thinking about a complete description of the system just in terms of the state and its first order temporal derivative. So you'll come across, for example, in mechanics, p and q, momentum and position. Or if you're in quantum physics, you'll be dealing with real and imaginary parts of your particular variable or probability distribution. The important observation here is that you've got more than one variable to describe what's going on. And this, if you like, is the simplest kind of generalized coordinates of motion. It's basically x and x prime, or the rate of change of x, x dot. But you can extend that notion ad infinitum to create a Hilbert space or an inferential space where one point in this inferential space encodes, or stands in for, mathematically represents, an entire trajectory or path.
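The idea that one point in generalized coordinates stands in for a stretch of trajectory can be sketched with a toy example. This is illustrative only: the trajectory sin(t) and the choice of six orders of motion are assumptions for the demonstration, not taken from the paper.

```python
# A point in generalized coordinates of motion (position, velocity,
# acceleration, ...) encodes a local stretch of trajectory, which can be
# rolled out with a Taylor expansion. Here the trajectory is sin(t) and the
# generalized coordinates are its temporal derivatives at t = 0.
import math

# Generalized coordinates of sin(t) at t = 0: x, x', x'', x''', ...
gen_coords = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]  # six orders of motion

def rollout(coords, t):
    """Reconstruct x(t) from generalized coordinates via a Taylor expansion."""
    return sum(c * t ** n / math.factorial(n) for n, c in enumerate(coords))

t = 0.5
print(abs(rollout(gen_coords, t) - math.sin(t)))  # tiny: good local reconstruction
```

So the six numbers in `gen_coords` play the role of a single point in the "inferential space" that represents the path around t = 0.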
And that's the formalism that the free energy principle is actually articulated in, in terms of path integral formulations. Awesome. Thomas, would you like to add anything there? It's difficult to know what to add to such a comprehensive description. I suppose one thing I'd pick up on, and perhaps this is more a question for Carl, is that you mentioned the idea that a Taylor expansion is a more compact way of describing the trajectory. I was wondering whether that's necessarily true. I appreciate it would be if the trajectory is relatively simple. But you could argue that you need a very large number of Taylor coefficients to adequately describe a more complex trajectory, complex in the sense of nonlinear, whereas it may be simpler in some circumstances just to have a series of discrete temporal points. Yep, excellent question. So it speaks to what people like me mean when we talk about compactness, where implicit in that statement is the notion of some truncation. And certainly one truncation of the kind you just mentioned would be having a discrete set of points. But if you wanted to represent the entire path, you're obliged to write down an infinite number of numbers. You can either write an infinite number of states as time moves forward or backwards, or you can write down an infinite number of Taylor coefficients or generalized coordinates of motion. So certainly generalized coordinates of motion in and of themselves don't afford any compactness. But when you actually apply these different kinds of representations practically, then there are two kinds of truncation that present themselves. If you're dealing with continuous time trajectories, so that your path is now described as a function of time, you normally have to truncate it by specifying t equals zero and some horizon in the future, the initial and the final state. So there's a choice there that you have to make in order to deal practically or numerically.
Indeed, in some instances analytically, you deal with a path by truncating its length in time. The equivalent truncation in generalized coordinates of motion is a discrete bound, an upper limit on the number of generalized coordinates or the order of the temporal derivatives. The nice thing, the compactness, comes in generalized coordinates in that there's a natural space in which you can do that truncation. That comes from the fact that, as you increase the order of the temporal derivatives or the order of the generalized coordinates of motion, the variance of the random fluctuations on the higher order motion gets large very, very quickly, so you can assume that after a certain number of orders of generalized coordinates of motion, the precision is effectively zero. So that means that you can get away with describing an infinitely long trajectory with a small number of generalized coordinates, because there's no further information in the higher orders. So it now becomes like a local description, but in that locality it becomes compact, in the same way that you would discretely truncate the length of the time series or the path in a function-of-time formulation. So practically what tends to happen, and it has to happen in the sense that random fluctuations have to be fast and therefore have quite a narrow autocorrelation function, whose shape determines how quickly the precision of higher order motion decreases, is that, generally speaking, even for quite smooth random fluctuations, you don't need generalized orders of motion beyond about four to six. So practically speaking, you now have a compact representation, but in doing that truncation, you can no longer roll out the path using the Taylor approximation to t equals infinity. You can only roll it out with confidence for a relatively short period of time.
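The rapid growth in the variance of higher orders of motion can be made concrete. For fluctuations with a Gaussian autocorrelation exp(-h^2/(2 s^2)), an assumption chosen here purely for illustration, the variance of the k-th temporal derivative of a unit-variance process is the 2k-th moment of the (Gaussian) spectral density, (2k-1)!! / s^(2k), which explodes for smoothness s < 1, so the precision of high orders collapses:

```python
def gen_coord_variance(k, s):
    """Variance of the k-th temporal derivative of a unit-variance process
    whose autocorrelation is exp(-h**2 / (2*s**2)); this is the 2k-th
    moment of the Gaussian spectral density: (2k-1)!! / s**(2k)."""
    double_factorial = 1
    for m in range(1, 2 * k, 2):
        double_factorial *= m
    return double_factorial / s ** (2 * k)

# Smoothness s = 0.5: variance per order of motion
print([gen_coord_variance(k, 0.5) for k in range(5)])
# [1.0, 4.0, 48.0, 960.0, 26880.0]
```

Precisions are the reciprocals of these variances, so by the fourth or fifth order there is effectively no information left, which is the justification for the four-to-six order truncation mentioned above.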
It's not defined precisely, because of course you're defining it probabilistically, because you've got random fluctuations in play. So it has a sort of soft locality associated with it, but only when you do the truncation. Awesome. I'll ask a follow up on the Taylor series. Especially in SPM, the Statistical Parametric Mapping software, there's discussion of the Volterra series. So what is the Volterra series? How is it different from the Taylor series? And what are they doing? Does Blue want to take this one, or Thomas? I can have a go, but I think that's a bit before my time being involved with SPM on the whole. As I understand it, it's more a kernel-based representation of dynamics, where you're looking at convolution with particular kernels over time as an expression of how dynamics might evolve. Beyond that, I think I'll probably have to refer back to Carl to answer that one in any detail. I'm asking to understand a little of the history of the movement from focusing on both the Taylor and the Volterra series to only seeing the Taylor series in the current paper we're discussing. So Thomas is absolutely right that the Volterra series is a way of describing convolution operators that are an equivalent description of your equations of motion, in terms of convolution kernels of increasingly high order. But Daniel, you're also absolutely right, implicitly, in the sense that a Volterra kernel expansion just is a functional Taylor approximation to any given dynamical system. So what's the difference between a Taylor expansion of a path, of a function, and a functional Taylor approximation? It's using exactly the same technology, but now applying the Taylor approximation to the equations of motion, the functions that determine the flow of a system. So you're applying a Taylor approximation to the flow operators or the equations of motion. But the spirit of the expansion is exactly the same as a standard Taylor expansion.
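For a linear system the Volterra expansion terminates at first order: the response is just the input convolved with a kernel, here K(t) = exp(-a t) for the flow x' = -a x + u. A sketch comparing the kernel (convolution) representation with direct integration of the differential equation; this is illustrative only, not the SPM implementation:

```python
import numpy as np

a, dt, n = 1.0, 0.001, 5000
t = np.arange(n) * dt
u = np.sin(2 * np.pi * t)        # input to the system
kernel = np.exp(-a * t) * dt     # first-order Volterra kernel of x' = -a x + u

# Response via convolution with the kernel (the Volterra picture)
x_volterra = np.convolve(u, kernel)[:n]

# Response via direct Euler integration of the differential equation
x_ode = np.zeros(n)
for i in range(n - 1):
    x_ode[i + 1] = x_ode[i] + dt * (-a * x_ode[i] + u[i])

print(np.max(np.abs(x_volterra - x_ode)))  # small discretization error
```

A nonlinear flow would need higher order kernels as well, which is where truncating the series, exactly as with generalized coordinates, buys compactness.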
I don't think we need to know much more about it, really, in the sense that the Volterra series expansion is another way of providing a compact description, compact in the sense that you truncate the Taylor expansion in the same way we were just describing truncating the order of generalized coordinates of motion, to get a compact representation and, in some instances, a compact inversion scheme. So normally, when people actually apply a Volterra series, and certainly as we apply it in a practical data analysis context, we generally don't worry about very high order terms. And in fact, all our software is written just to deal with first and second order terms. So we actually stop at the second order derivatives, or functional derivatives, which is even simpler than stopping at the fourth or the sixth order when expressing the original variables in generalized coordinates of motion. But because it's really a device that just re-represents a differential equation in terms of convolution operators and kernels, it doesn't really get you anywhere in terms of understanding the mechanics or the theory behind it. So there is nothing in the free energy principle that refers to the Volterra series expansion. But you could always derive a Volterra series expansion, and the implicit kernels to a certain order, given some differential equations at hand, and if those differential equations are doing a gradient descent on free energy, such as in a Bayesian filter, you could always articulate that Bayesian filter in terms of a convolution operator, which might be one way of understanding how Bayesian smoothing and Bayesian filtering operate. Awesome. We're deconvolving and adding action in many, many layers and adding more depth to the deconvolution and to the planning and inference. Blue? So speaking of convolution, I have a follow-up question that's maybe a little bit convoluted, or hopefully I can articulate it well.
So in the very beginning of the paper, you talk about the Fokker-Planck equation, and then you talk about the path integral formulation, and the Fokker-Planck encompasses the random fluctuations, while the path integral formulation encompasses the trajectory of a particle. And you talked about the fast path and the slow path, and the Fokker-Planck is fast, and then the path integral is slow. So I just want to try to wrap my mind around what happens. For atoms inside of my cells, the density is over states. The atoms can occupy excited states or any number of states. But the cells are really the ones doing the work. The motion is happening inside of cells. So that's the slow procedure, the path integral path. But then when you think about the cells, they're in my body, so the cells are occupying the states, and I'm doing the moving around. So then I'm on the trajectory. And similarly, I am in any number of states with all the rest of the humans on planet Earth, all in any number of places, but they're all on planet Earth. And then planet Earth is moving on a slower trajectory. So what is the math, or can you guess, or maybe foreshadow, how the math compresses or unfolds in this kind of hierarchical situation? So Thomas has to answer that one. And he has to use the word renormalization group. I thought you might say that. I was going to try and do it without, but now that you've said that. Just to start with, you mentioned this problem in the context of the Fokker-Planck equation and the path integral formulations and that distinction. Which I think is interesting when thinking about fast and slow, but it is to some extent orthogonal.
I think it's interesting when thinking about fast and slow in that you might expect that the Fokker-Planck equation, describing the rate of change or the evolution of some probability density, is going to be a slower evolution than the evolution of individual particles that might be described by the stochastic differential equation or Langevin equation of that system. There may be very fast fluctuations, but really what we're dealing with when we're dealing with the Fokker-Planck equation is some summary statistics, some slower change over time that might be reflective of a mean or a variance. That does link back into the idea of a renormalization group, which is a way of describing how things at one temporal scale can be summarized in terms of some slower changes and some faster changes. Typically, the way you might do that in a dynamical systems setting would be to look at the Jacobian, which is a linearized version of the flow equation, and look at the eigenvalues and eigenvectors of the flow. The eigenvalues tell you how quickly the system is either expanding or contracting, how quickly it's decaying or, in unstable systems, going off towards infinity. That means that if you take the set of eigenvalues, they will tell you a series of different directions in which the system is changing and the speed with which they're changing. That means that you can choose some threshold above which the eigenvalues are considered slow and below which they're considered fast. And I hope I've got that the right way around. I have, as long as they're negative eigenvalues. Those eigenvalues are each associated with an eigenvector, which is just a way of expressing some linear combination of the components of that system, or you could think of it as a direction in some high-dimensional space that describes the position of that system at any one time.
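The eigendecomposition step just described can be sketched: eigendecompose a linearized flow and split the modes by how negative their eigenvalues are, where more negative means faster decay. A toy illustration, with an arbitrary threshold:

```python
import numpy as np

# Jacobian of a linearized, stable flow dx/dt = J @ x
J = np.array([[-0.1,  0.0],
              [ 0.0, -10.0]])

eigenvalues, eigenvectors = np.linalg.eig(J)
threshold = -1.0  # illustrative cut between "slow" and "fast"
slow_modes = eigenvectors[:, eigenvalues.real > threshold]   # long-lived
fast_modes = eigenvectors[:, eigenvalues.real <= threshold]  # decay quickly

# A renormalization-style reduction keeps only the slow mode(s) as the
# summary description at the next, slower scale.
print(slow_modes.shape[1], "slow mode(s);", fast_modes.shape[1], "fast mode(s)")
```

Repeating the reduction on the retained slow modes, scale after scale, is the hierarchical separation Thomas describes next.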
By separating out the eigenvectors and the particular directions based upon whether their eigenvalues are large or small, you can separate the system into the components that move very slowly and those that decay very rapidly. And that's effectively the principle the renormalization group works on when applied to these kinds of systems, because it then says that if we have some way of describing all the things in that system, we separate them out into the fast and slow components. We can then take the slow components, find some new way of grouping them, then separate out the fast and slow components of that, and continue separating hierarchically to multiple different scales. And that can be quite interesting in a range of settings for thinking about the emergent behavior you might get at higher and higher levels, when the dynamics at the lower level might have a very simple description. And I think that's probably what you were getting at when you were talking about the very fast movement of atoms in your cells versus the slightly slower movement of yourself and the even slower movement of things that are much larger and evolve over much larger time scales. You could read each move from one time scale to the slower time scale as an application of a renormalization group operator. Is that enough? From your perspective, Carl, have I said the words renormalization group enough? Yeah, yes, that's excellent, thank you. Interesting, I just wrote down, Thomas, when you said thinking about fast and slow, that plays into this meta-Bayesian, cognitive-systems-on-cognitive-systems angle. Any entity might be thinking fast and slow, to echo the famous work, but this is thinking about the fast and the slow of a cognitive entity, and maybe that's multi-scale. So I'm gonna bring another question in from the chat and then we'll be continuing on.
What type of boundary does a Markov blanket define for a system: functional, in terms of ports, or constructive and modular, with interfaces? Or might both viewpoints be considered, depending on modeling reasons? I think for me, you may have to unpack what some of those terms mean a little more. When you're talking about ports, and whether they're constructive, what does that mean? It was a question from the chat, so I'll try to unpack it and see what you think. Perhaps the general question would be: what is the physical interpretation of the Markov blankets that are being identified? Are they causal or functional connections that occur across an interface, like a USB port, or are they the interfaces themselves? What viewpoints exist for interpreting the Markov blanket in terms of the modeling of some defined system of interest? It's an interesting question at quite a few levels, and I think you could probably unpack each of those terms in several different ways. And I guess the first thing to do is think about how we define a Markov blanket and what it actually means, because we can sit around interpreting it in various ways for different sorts of systems. But actually the key thing is that it's a probabilistic relationship. It's a relationship between, normally, three distinct sets of variables, where we're saying that two of those sets of variables are separated by the Markov blanket probabilistically. And all that means is that the two that are being separated are conditionally independent of one another once we know the thing that's doing the separating, and the thing that's doing the separating is the Markov blanket. And often that means that there are direct relationships between each of those things and the blanket, but no direct relationship across the blanket.
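In the Gaussian case this probabilistic separation has a crisp signature: conditional independence given the blanket is exactly a zero in the precision (inverse covariance) matrix between internal and external states. A toy sketch with three scalar states and illustrative numbers:

```python
import numpy as np

# Precision matrix over (external, blanket, internal). The zeros coupling
# external (index 0) and internal (index 2) say: conditioned on the blanket,
# external and internal states are independent.
precision = np.array([[2.0, 0.8, 0.0],
                      [0.8, 2.0, 0.8],
                      [0.0, 0.8, 2.0]])
covariance = np.linalg.inv(precision)

# Marginally, external and internal states are still correlated,
# because influence passes through the blanket:
print(covariance[0, 2])  # nonzero marginal covariance
```

So the blanket does not forbid correlation across it; it forbids direct coupling once the blanket states are known, which is why the internal states can still "track" the external ones.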
But then it's important to think about what sort of relationships we're talking about, whether those are dynamical relationships or causal relationships, and all the other ways you can describe the interactions between pairs of states. And there's a discussion we've had on and off for the last couple of years about what the relationship is between dynamics and probabilistic relationships, and even which probability distributions are appropriate to use to define a Markov blanket of this sort. One answer to that is that we can look at the Markov blankets associated with some steady state density, but we can also do the same thing with any probability distribution, which includes distributions over paths, and look at the independence or dependence between different paths. When formulated in terms of paths, there's a slightly clearer relationship between the dynamics of the system and the Markov blanket, because once we've said that the paths of one set of variables are independent of another given a third, that says something about the dynamical coupling and implies a certain degree of sparse coupling between those variables in a dynamical setting. And probably once you look at the dynamical setting, a lot of the physics, as you've framed it, might come in, because you can then start to interpret that in terms of physical influences and things that might cause an acceleration or a changing velocity or a change in position, and interpret it in terms of physical forces. There are of course other ways in which you can look at Markov blankets, and sometimes they're interpreted in terms of cell membranes and that sort of thing. And I wondered whether that was what was meant when ports were mentioned, because you can think of ion channels as being ports that allow interactions between two sets of states.
I think when people make those sorts of comparisons, you have to be very careful about what exactly you mean and which variables reflect different parts of that blanket. So for instance, you might talk about intracellular and extracellular ion concentrations being independent of one another conditioned upon the state of some cellular protein. So again, it's an interesting question, and there are lots of different ways you could unpack it: you could think about a range of different systems and try to interpret what the Markov blanket is there, and what the physical interpretation of that is, if it is a physical system. Ultimately, I think the physics of it is going to be articulated most clearly in terms of the dynamical equations, which sometimes bear a direct relationship to the Markov blanket, but it depends where, or in what domain, it's defined. Awesome. Equation one is where the two paths first branch, and they're complementary. Could you describe equation one, beginning wherever you see fit and reading through to the question mark at the end, and help us understand how the probability density can be expressed in two complementary ways? What is equation one showing? What does the question mark leave us with? And how do the two sides shown on the left and the right provide two complementary approaches? I feel like I've done a lot of talking, but I'm happy to take this one as well if you'd like me to. Sure. So in the first line of equation one, we're really just describing some dynamical system that has some fluctuations associated with it. So there's some degree of randomness, where that randomness may be interpretable as deterministic behavior at a much faster scale that we're effectively ignoring, through the application of the renormalization group that we were speaking about earlier. So our x in this equation is some variable that's evolving through time.
The f of x describes how, in a deterministic system, the rate of change of that variable would depend upon the variable itself, upon its position in space. And the omega term at the end tells us that there are some additional fluctuations associated with that, something that's changing in a less predictable way, at least at the scale and speed of description of interest. The second line of the equation provides a little more definition to that. So the first equality relates the current position of the system, the current x, to the probability density describing the fluctuations, saying what they're likely to be at any one point. And here we're saying that it's something that is zero mean. So if we were to take an average over time, the fluctuation would end up averaging to zero. But then we're also defining a variance associated with it, and that's this two times gamma parameter, which effectively tells us about the amplitude of those fluctuations. The implication symbol then says, okay, if we've defined the fluctuations in this way, that then tells us the form of the rate of change of x. And we know that if the average fluctuation is going to be zero, then the average rate of change of x is just going to be our f of x, the same rate of change x would have if we were dealing with a deterministic system. However, we then have this additional uncertainty around that rate of change, which has exactly the same variance as the random fluctuations. So we then finally have this third line, which says: so what is this probability distribution? How do we now account for what the distribution of x would be at some time?
And I say at some time to make an additional distinction that isn't necessarily expressed on this slide, which is that we can talk about the distribution of x evolving through time from some initial conditions, or we can talk about the distribution that it will eventually attain if we wait until we're a long way from our initial conditions, which is the steady state density: the point at which the Fokker-Planck equation here evaluates to zero. So there are several ways we can now think about the distribution of x and the density associated with it. Equation three then says, well, we can also look at it a different way and think about some distribution or density over alternative trajectories that x could take, alternative paths, which is a description of a slightly different sort of quantity but is equivalent in many ways, because they're determined by exactly the same things. Does that sort of make sense? I know I said quite a lot there. Awesome. Have I got myself out of answering the next question? Yes, you've probably earned a pass. Maybe we can even just continue walking through and understanding the formalisms. That's how we structure the point-zero streams: we go through the formalisms and the figures in order, follow along, and try to connect key points in between. So you wrote that both the Fokker-Planck and the path integral formulations inherit their functional form from assumptions about the statistics of the random fluctuations in equation one, which we just described. Then you describe the path of least action and the most likely path. So what is happening in equation four? Where is the fluctuation being reduced to zero, and what does it mean to take the most likely path, or the path of least action? Blue, do you want to do this or do you want me to do it? Definitely, by all means, Carl, why don't you go? Okay, so just to backtrack a little and reiterate some of the key points that Thomas just made.
Just notice that everything basically starts with this Langevin equation or stochastic differential equation, and then you can go in one of two ways. You can go via the density dynamics, the description of the probability density, which is a deterministic description, or you can go via the path integral formulation, and you can map gracefully between one and the other. They're complementary. You can derive one from the other, a little bit like the difference between dealing with the equations of motion and dealing with the convolution operators that we were just talking about. They're equivalent in many senses. However, the ways that they are used in the free energy principle are very distinct. So I just wanted to rehearse that, because that speaks to the particular question here: where did the random fluctuations go? Before I do that, though, just note that all the first equality in equation one is telling you is that there is some probabilistic relationship between x and x dot. That's all it's saying. It's saying that there is some coupling, normally a causal coupling, between a function of x and x dot. So if you just think about our previous conversation in terms of generalized coordinates of motion, and everything coming along in pairs, like the p's and q's in classical physics or the real and imaginary parts in quantum physics, this is why there's always a pair of things, at least two things, to worry about, because all we're saying here is that there's some relationship between where you are and how you're moving in state space, and it's a probabilistic relationship. So if you think about what that means in terms of modeling things, this is a state space model. It tells you nothing about which is the most likely state to be in. It just tells you: where am I going to go from here, and how likely is that? That's all it's telling you. That's all we know.
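The two readings of equation one can be checked against each other in the simplest case. Simulating the Langevin equation for an Ornstein-Uhlenbeck flow f(x) = -theta * x, the empirical long-run variance should match the stationary solution of the Fokker-Planck equation, a Gaussian with variance gamma/theta. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, gamma, dt, n = 1.0, 0.5, 0.01, 500_000

# Euler-Maruyama integration of the Langevin equation
#   dx/dt = f(x) + omega, with f(x) = -theta*x and fluctuation variance 2*gamma
x = np.zeros(n)
kicks = rng.normal(0.0, np.sqrt(2.0 * gamma * dt), size=n)
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + kicks[i]

# The Fokker-Planck steady state for this flow is Gaussian, variance gamma/theta
empirical_variance = x[n // 10:].var()  # discard the burn-in
print(empirical_variance, gamma / theta)  # both roughly 0.5
```

One description tracks a single noisy path; the other tracks the deterministic evolution of the density that an ensemble of such paths fills out, which is exactly the complementarity being discussed.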
So you can start to think now about the probability distributions over where I will be in the future, or you can think about probability distributions over the trajectories or paths that you take, and that's basically what Thomas was highlighting previously. In the free energy principle, the Fokker-Planck equation is actually not used very much. I'm just saying this for Blue's benefit, since she was asking about the fast and slow and how that fits into the Fokker-Planck. In fact, most of the interesting stuff comes from the path integral formulation. The Fokker-Planck does bring a lot to the table, but only when you look at the solutions of the Fokker-Planck equation that you would get under the assumption that it has a solution, and you get into things like the Helmholtz decomposition and solenoidal flow, all that interesting stuff, and the link between the steady state density over states and the dynamical coupling that underwrites or defines the original Langevin equation. But you could ignore all that if you wanted to and just talk about the probability distributions over paths. So no longer am I defining me, my homeostasis, in terms of the states that I occupy. That's a side product of the trajectories and the dynamics that I typically evince when I'm in this situation or at this point in state space. So I think the more powerful calculus, and certainly the basis of the free energy principle, is actually in the path integral formulation. You could tell a free energy principle story without even mentioning the Fokker-Planck equation if you wanted to. And really that story is a story of least action, or Hamilton's principle of least action. So what does that mean? It just means that one way of summarizing the behavior of things, the characteristic behavior of things, is in terms of the most likely way forward from here, the most likely trajectory or path from this point in state space. What is that? It is the path of least action. What is that?
It's nothing more than the most likely path. What is the most likely path? Well, it's just given by the Langevin equation, the flow operator, the f of x, iterated in time in the absence of random fluctuations, or rather under the most likely random fluctuations. So what are the most likely random fluctuations? Well, we've just said that the most likely random fluctuation is zero. So all that we're doing is taking the omega, the random fluctuations, out of the equations of motion. And that just is the path of least action, which is just the most likely path. And if you apply the calculus of variations to that notion, you get the expressions in equation four here. So if you interpret the action A as the surprisal, the negative log probability, of a particular path, then if you want to minimize the action, you're just trying to identify the path that is most likely, and that is just the evolution of the system without any random fluctuations. The first expression follows a convention, pursued in this paper and others, that the paths of least action, namely the most likely paths, are in boldface. All this says is that the most likely path, in boldface, simply is that which renders the path most likely for this system, which is another way of saying that it minimizes the action. What does that mean? It means that if you deviate from this path in any way, known as a variation, hence variational calculus, then you are going to increase the action. So you can think of a marble rolling down a long valley. The course of the marble will follow a trajectory that is always at the minimum of the height of the valley, and therefore any deviation from that path will cause you to climb the sides of the valley, and therefore the path integral, the average action, will be slightly bigger than that of the path of least action.
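The marble-in-the-valley picture can be put in numbers. Discretize the action of a path as the sum of (xdot - f(x))^2 / (4 * gamma) * dt, an Onsager-Machlup-style Lagrangian with constant and divergence terms dropped for illustration, and any variation of the deterministic path increases it. A sketch, assuming the flow f(x) = -x:

```python
import numpy as np

def action(path, dt, gamma=0.5, f=lambda x: -x):
    """Discretized action of a path: accumulated squared deviation of the
    path's velocity from the deterministic flow, weighted by precision."""
    xdot = np.diff(path) / dt
    drift = f(path[:-1])
    return float(np.sum((xdot - drift) ** 2) / (4.0 * gamma) * dt)

dt = 0.01
t = np.arange(500) * dt

# Path of least action: the flow with fluctuations set to their mode (zero),
# i.e. the solution of xdot = -x from x(0) = 1
least = np.exp(-t)
# Any variation away from it climbs the sides of the valley
varied = least + 0.1 * np.sin(np.pi * t / t[-1])

print(action(least, dt), action(varied, dt))  # near zero versus clearly larger
```

Minimizing this surprisal-like functional over paths is the variational statement that the boldface path in equation four satisfies.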
So that's basically what is meant by the second implication, which is that any variation of the path about the path of least action will cause an increase in the action. Another way of stating that is that at the path of least action, at the point of the minimum, infinitesimal changes don't change the action, because you're at the flat bottom: locally, the valley has bottomed out, and therefore the variation is zero. And another way of expressing that is where we started, which is that the path of least action follows the trajectory in the absence of any random fluctuations, which is just the final equality down there. So it's just the original Langevin equation, but without random fluctuations, or rather with the most likely random fluctuations. So I brought on, as you saw, this classical Waddington epigenetic landscape. Here's the marble rolling down a hill coming towards us. Might we interpret decisions as being bifurcations? Or, if we're talking about paths of least action in space, or paths of least action on cognitive parameters, where do decisions come into play? That's definitely one for Blue. Yep, I knew it. So can we even make decisions? I'm gonna just answer your question with a question, because, I mean, up until this point we are just a sum of all the things that have occurred up to this point. We now have a model that's based on all previous action, and when you come to this bifurcating path, you're just gonna take the one that minimizes free energy, right? Isn't that how it goes? So according to your model, and all the things that have brought an organism to that point, that's my answer. I'll stick to it. Right, I want Thomas to answer now, but he must use the words free will. I'm not going there. I don't philosophically qualify enough to use those words. I skipped right over it, but said it more or less. Can anyone address how free will is related to free energy, or how decision making plays into a path of least action?
Well, while Thomas is gathering his thoughts to slip free will, in another philosophical sense, into his response: I think that's absolutely right. The notion of a bifurcation, I think, is very prescient when you think about paths of least action on any energy landscape, in particular a free energy landscape. But at the same time, if we just pursue the path of least action, there are no decisions, or it looks as if there are no decisions from a mathematical point of view. But in virtue of the fact that there is an indeterminacy about the paths you would pursue from where you start, say the marble at the top of the hill here, that indeterminacy, I think, does call into question whether the chaotic structure that underwrites the itinerancy of the thing we are trying to model will actually induce a sensitivity to initial conditions, which means that we can never totally predict how we will behave, or the decisions we will make, given our state of mind and the state of the world at the present time. So I think there is a latitude to talk about free will, but it would be of the kind that inherits from deterministic chaos and sensitivity to initial conditions. And framing it in the context of bifurcations in deterministic dynamical systems theory is quite an important move, because of course the Langevin equation does not admit deterministic chaos, because you've got random fluctuations. So you have, or could have, stochastic chaos, but you can't have deterministic chaos. So the question now is: is there any opportunity for deterministic chaos in the context of the free energy description of sentient behavior and choice-making? And I think there is, but only in a very special class of systems where the particular states, that is, the internal states and the blanket states, don't have any random fluctuations.
And you may be asking: well, there must always be random fluctuations. Not necessarily, at a particular scale. If you take a hundred million neurons, or say even a million neurons, and you now write down your state as the average of a population or ensemble of neurons, all the random fluctuations and neural noise will be averaged away. And this is exactly the same mechanics that Thomas was talking about before in terms of the renormalization group: taking certain mixtures carefully to eliminate fast fluctuations and random fluctuations. So the argument here would be: if you want to talk about bifurcations, in the sense of transcritical bifurcations or Hopf bifurcations, as a way of describing a particular path of least action, or paths of least action from slightly different initial conditions, then you'd be perfectly licensed to do that, but you'd have to be very careful in saying that we are assuming here that the dynamics are deterministic, and therefore only apply at a particular scale, a scale where the particular states, the internal and blanket states, don't possess any random fluctuations. And that's quite interesting, because there are other things that inherit from that assumption. Again, coming back to this notion of different kinds of particles: some very, very small particles, like the single cells that Blue was talking about before, may be so small, in number and in size, that you cannot average away the random fluctuations, and they would not show any deterministic chaos. But there may be a coarser-grained description of neuronal dynamics and computational anatomy that does actually acquire a deterministic aspect, a classical aspect, where you have an n-body-like problem, which will invariably show some kind of deterministic chaos, which means that you can get back to one way of looking at something that might look like free will.
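The averaging argument, that ensemble averages over many noisy units become effectively deterministic, can be illustrated with a toy simulation. The unit model here (a shared mean plus independent Gaussian noise) is an assumption for illustration, not the paper's construction:

```python
import random
import statistics

def ensemble_average_sd(n_units, mean=1.0, noise_sd=1.0, trials=500, seed=1):
    """Standard deviation of the average of n_units noisy units,
    estimated over `trials` independent realizations."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(mean, noise_sd) for _ in range(n_units))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

# Fluctuations of the ensemble average shrink roughly as 1/sqrt(N): averaging
# over enough neurons yields a state that is deterministic for practical purposes.
for n in (1, 100, 1600):
    print(n, round(ensemble_average_sd(n), 3))
```

This is the law-of-large-numbers face of the coarse-graining move described above: at the scale of the population average, the random fluctuations have been averaged away.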
Anyway, now Thomas has had plenty of time to tell us about the dynamics of free will. So I noticed that the way you framed that was in terms of unpredictability: under the definition of free will you were using there, it's things where we wouldn't necessarily be able to predict in advance how a particular system would evolve. And I'm sure that will please some philosophers and upset many others. I don't have an alternative definition that would work better or please everybody. So I suppose we've spoken really about one side of things, which is the deterministic aspect, and clearly that does become very relevant at larger and slower scales. And I suppose it's also interesting to think about unpredictability when stochastic dynamics become much more important. So we certainly could think about systems where there are multiple plausible paths, or even a continuum of plausible paths, and the particular path they end up taking will depend upon the particular values that the fluctuations take during the course of that path. Arguably, if the fluctuations are themselves just deterministic dynamical systems occurring at a much faster scale, it's still not really unpredictability of the stochastic sort, because ultimately it's described by a deterministic dynamical system, which again brings us back to this renormalization group idea. And it depends a little bit on your interpretation of what happens at the fastest possible scales when we perform this renormalization operation: is the fastest scale something that is genuinely random, or is it a sort of recursively faster deterministic dynamics?
Either way, it may be that, if stochastic descriptions are most appropriate at some scale of description, we can develop a form of unpredictability both from things like deterministic chaos and uncertainty about initial conditions, but also potentially from the random fluctuations that happen along a particular trajectory in systems that have more randomness involved. Clearly, when we're trying to describe a real system or a real brain and the decisions people make, we don't have access to every part of that system and what every neuron in that brain is doing. So to some extent we're always dealing with a partially stochastic system, and that feeds into how we might end up modeling behavior in those kinds of systems as well. So if we were trying to fit an active inference style model to real behavior, or even simulated behavior, often what we end up doing is adding in a term to account for some uncertainty in the action selection, uncertainty that we as experimenters have about the action someone might choose, even if we think we know what the expected free energy they're going to calculate is and how they're going to evaluate the alternative paths they might choose. So that's just an acknowledgement that if we don't know everything, or if we don't have a complete description of the system, then we have to allow for some stochasticity. Of course, that stochastic element may be purely the uncertainty over the initial conditions in a system that can exhibit chaotic behavior and that bifurcates in various different ways. Awesome. We'll continue on as we approach the end of the dot one and introduce something we had talked about previously in terms of the Markov blanket, but it's the particular partition. So how have other mechanics addressed particles, how does this particle framework do something different, and what does that different framing enable? It's Blue's turn, or I could give it a shot. Why don't you go ahead, Daniel? Okay.
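The "extra term for action-selection uncertainty" is commonly implemented in active inference modeling as a softmax over negative expected free energy, with a precision (inverse temperature) parameter. A hedged sketch, using hypothetical expected free energy values:

```python
import math

def action_probabilities(expected_free_energy, gamma=1.0):
    """Softmax over negative expected free energy: lower G means more probable.
    The precision gamma plays the role of the extra term described above;
    a small gamma absorbs the experimenter's uncertainty about the choice."""
    weights = [math.exp(-gamma * g) for g in expected_free_energy]
    total = sum(weights)
    return [w / total for w in weights]

G = [1.0, 2.0, 4.0]  # hypothetical expected free energies for three policies

print([round(p, 3) for p in action_probabilities(G, gamma=4.0)])  # nearly deterministic
print([round(p, 3) for p in action_probabilities(G, gamma=0.1)])  # close to uniform
```

In the limit of large gamma the agent deterministically picks the free energy minimizing policy; as gamma shrinks, the observed behavior becomes increasingly stochastic from the modeler's point of view.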
As a non-physicist, my understanding is that particles have not been explicitly modeled this way; fields or continua have been the norm. The particulate description provided here enables a principled approach to separating figure from ground, separating the entity that's being tracked, that's doing the moving, from the entity that is not doing the moving. That would just be one thought. Which I would concur with very much. If you go back and think about the physics you did at school, usually it will be focused on the behavior of some idealized gas in what they would call a heat bath or a heat reservoir. And really what the particular partition brings to the table are deep questions about how that heat bath got there, where the heat bath now stands in for blanket states, and what's beyond the heat bath, the external states. So it is a move which I am not aware of being attempted in any other branch of physics. In quantum physics, the Markov blanket, the thing that contains the particle of interest, probabilistically speaking, is usually defined in terms of a Schrödinger potential, and the functional form of that potential leads to all the applications of quantum electrodynamics that you might encounter. And to repeat, in thermodynamics people just talk about there being some blanket states in terms of a heat bath, but they don't think about what's beyond the heat bath. In terms of classical mechanics, perhaps there is a notion of particles there which is less, if you like, hidden under the rug, but of course you don't have any explicit representation of the internal dynamics, because normally classical mechanics is just dealing with the motion of heavenly bodies or large massive objects, without really thinking about the faster dynamics that underwrite autonomous movement or animate movement, or what people in chemistry might refer to as active matter. Just a few thoughts on that.
Classical mechanics indeed does deal with diffusing particles, massive bodies, but just as you pointed out, it doesn't consider internal states. So it's like a view from the outside, and this is a bit more like a view from both sides, which enables us to have sentient or cognitive behavior on both sides of the observation interface, rather than implicitly on only one side, which is a very different situation. And then you talked about, in the thermodynamic context and the quantum context, what is beyond the boundary: in the thermodynamic case, the role of the heat bath and what's outside of it, and in the quantum case, the Schrödinger potential and what's beyond it. And that makes me think about Schrödinger's implicit question and Schrödinger's explicit question. Schrödinger's explicit question, in 1944 and in Ramstead et al. 2018, was: what is life? Schrödinger's implicit question was: what is beyond the quantum? What is outside of that? And so it's almost interesting, though appropriate, that the particular partition actually brings those two together, but I would let anyone else reflect on how they're brought together, if anyone wants to add, or we'll just continue to leave a few little pieces before closing the dot one. I was hoping Thomas and Blue would weigh in. I think that's an excellent observation, and I just wanted to reinforce it. So, what is life? It's basically coming back to our early question. It's those systems that are equipped with a homeostasis of sorts, but beyond that, these things move; they are homeostatic systems that are self-constructing and self-maintaining and that move. And this, I think, speaks to the importance of having the active states that couple back and influence the external states beyond the Markov blanket. Again, speaking to another important natural kind, the distinction between inert particles and active particles, where you've got this active, autopoietic-like aspect to the maintenance of the homeostasis.
So I think that's quite an important observation: we are talking about a bi-directional exchange across the Markov blanket. Just to reiterate your observation, what we're doing here is trying to account for the coupling between the inside and the outside of something. We're not just interested in the inside, as in classical physics, or in what it looks like from the outside in terms of thermodynamics, or what it looks like from the outside in terms of classical mechanics. We're actually trying to get to the meat of the maintenance of a Markov blanket, and its thickness, in an active context. So then, as a closing round here, I want to point to one tension that we pulled out in the dot zero. Here in footnote 21 it says a lot of interesting things, but then it says: minimization implies a teleology that goes beyond any claim of the FEP. Do particles minimize surprise or free energy? Well, minimization implies a teleology that goes beyond any claim of the FEP. Okay, very well. Later, the question is asked: is it tenable to interpret gradient flows on variational free energy as variational inference, or is this just teleological window dressing? So where is teleology? Is teleology in the FEP? Or is it window dressing? What's the window? What's the building? What's outside the building? Is minimization teleological? And is that in the FEP's claims, or what other claims are coming to the table if it's not part of the FEP in the strict sense? This is Thomas' favorite question. I didn't know that. It's my favorite question in that I've asked it before. Several times. And what's the lyrical about it? It is a tricky question in its way. I mean, to some extent, the answer to the big, bold question is both. I think it's also worth asking the question: is it useful?
So the reason we've spoken about it quite a lot in the past, and one of the reasons it's come up, is that in some of the recent formulations we've dealt with, in some of the papers we've been discussing, the free energy or the gradient flows have been on some nonequilibrium steady-state density that can be interpreted as a generative model of sorts, or as a marginal likelihood. Now, the challenge is that a marginal likelihood in itself is, I suppose, a special case of a free energy: the case when the variational density that we were speaking about earlier, this tractable part of the optimization problem, is exactly equal to the true posterior density once we've performed Bayesian inference. So the question we've often asked concerns that KL divergence, the divergence between our approximate distribution, our variational distribution, and our true exact Bayesian solution: if that is defined stipulatively as being zero, is there a meaningful role for free energy minimization as being interpretable as inference? And there are several possible answers to that. One is that when we're engaging in computational neuroscience, we're really dealing with a tractable approximation to how systems behave. So it's useful, from the perspective of a computational neuroscientist, to be able to write it down in terms of a free energy functional, but not necessarily to treat that as a literal description of what brains and sentient creatures are doing in terms of the variational free energy aspect; we could still interpret the marginal likelihood as before, or interpret gradient flows on the marginal likelihood as before. Another answer is that part of what makes inference work, particularly in a dynamical setting, is not going to the bottom of a free energy minimum, but finding a way of staying there.
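The relationship being discussed, that the marginal likelihood is the special case of variational free energy in which the KL divergence to the exact posterior vanishes, follows from the standard decomposition F = D_KL[q(s) || p(s|o)] - ln p(o). A toy discrete example (the prior and likelihood numbers are arbitrary):

```python
import math

def free_energy(q, prior, likelihood):
    """Variational free energy for a discrete latent state s and a fixed
    observation o: F = sum_s q(s) [ln q(s) - ln p(o, s)]
                     = -ln p(o) + KL(q(s) || p(s | o))."""
    joint = [p * l for p, l in zip(prior, likelihood)]  # p(o, s) = p(o|s) p(s)
    return sum(qs * (math.log(qs) - math.log(js))
               for qs, js in zip(q, joint) if qs > 0)

prior = [0.5, 0.5]       # p(s), arbitrary toy numbers
likelihood = [0.9, 0.2]  # p(o | s) for the observed o

evidence = sum(p * l for p, l in zip(prior, likelihood))           # p(o)
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]  # p(s | o)
surprise = -math.log(evidence)                                     # -ln p(o)

# When q equals the exact posterior, the KL term vanishes and F equals
# surprise, i.e. the negative log marginal likelihood:
print(abs(free_energy(posterior, prior, likelihood) - surprise) < 1e-9)  # -> True

# Any other q gives F > surprise, because the gap is a KL divergence (>= 0):
print(free_energy([0.5, 0.5], prior, likelihood) > surprise)             # -> True
```

So stipulating that the KL term is zero collapses free energy minimization onto gradient flows on the marginal likelihood, which is exactly the tension the discussion is probing.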
And so that depends upon the ability to predict, and the ability to represent some dynamics of how a system is evolving over time. So it's a question of staying at the bottom of the valley and not deviating up the sides. And we could interpret that as a form of inference as well. And here we then deal with flows that are not necessarily up or down free energy gradients or marginal likelihood gradients, but are maintaining the same level, staying at the bottom of the valley. And that's often what we refer to in terms of the solenoidal flows. And this links back to a lot of the discussion we were having about deterministic versus stochastic dynamics, or I suppose deterministic chaos, where systems can behave in a range of different ways that is very difficult to capture if you're just going up and down probability gradients. It speaks to how we go around probability gradients, how we circle around the hill or along the valley, and the potentially chaotic way that we do that. And that results in a form of prediction, and in a way of staying at the bottom of the valley and not leaving it. So we could interpret that as a form of inference as well. I think there's probably also an open question as to whether, under certain circumstances, it looks as if we are also minimizing this KL divergence part, this divergence between the approximate and exact Bayesian posteriors. And I think that's still open. I think plausibly there will be ways of writing things down that look as if they're doing that. It may be that it's simply the useful approximation that we can tractably write down and simulate and perform computational neuroscience with.
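The solenoidal idea, flowing around rather than up or down the gradients, can be sketched with the decomposition used in this literature, f = -(Gamma + Q) grad V, where Gamma is a symmetric (dissipative) part and Q an antisymmetric (solenoidal) part. The quadratic potential and the coefficients here are arbitrary illustrative choices:

```python
def V(x, y):
    """A quadratic potential (a stand-in for surprise, -ln p)."""
    return (x * x + y * y) / 2

def grad_V(x, y):
    """Gradient of V(x, y) = (x**2 + y**2) / 2."""
    return (x, y)

def flow(x, y, gamma, q):
    """f = -(Gamma + Q) grad V, with Gamma = gamma * I symmetric (dissipative,
    descends the gradient) and Q = [[0, q], [-q, 0]] antisymmetric (solenoidal,
    circulates along level sets of V without changing it)."""
    gx, gy = grad_V(x, y)
    return (-(gamma * gx + q * gy), -(-q * gx + gamma * gy))

def integrate(x, y, gamma, q, steps=10000, dt=0.001):
    """Forward-Euler integration of the flow."""
    for _ in range(steps):
        fx, fy = flow(x, y, gamma, q)
        x, y = x + fx * dt, y + fy * dt
    return x, y

# Purely solenoidal flow (gamma = 0) circles the valley: V is (nearly) conserved.
x, y = integrate(1.0, 0.0, gamma=0.0, q=1.0)
print(round(V(x, y), 1))   # -> 0.5, i.e. stays near V(1, 0)

# Adding dissipation (gamma > 0) also descends the gradient to the minimum.
x, y = integrate(1.0, 0.0, gamma=0.5, q=1.0)
print(round(V(x, y), 1))   # -> 0.0
```

The solenoidal component is orthogonal to the gradient, so it contributes motion along the level sets, circling the hill or running along the valley, without raising or lowering V; the dissipative component is what actually descends.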
A third possibility is that, if the brain is realized or has evolved to realize that things in the world behave in the way we've been describing, variational inference type approaches could plausibly be a mechanism that the brain has evolved to use to model its environment as well, which would bring back this idea of an explicit variational free energy. So I think there are several different possibilities, and there's still probably ongoing discussion about a lot of this. For me, the idea of inference and variational inference has to be one where the KL divergence is decreasing over time, where it's not something that we necessarily keep static. That might mean that variational inference is not something that's necessarily happening in the systems we've been describing so far, but whether it could be interpreted in terms of the flow of internal states toward their most likely value and other aspects of that, or whether it's simply a good approximation, I'm sure Carl will have other things to say on, because I've not given a definitive or a clear answer one way or the other here. For me, variational inference occurs when the KL divergence is shrinking. However, you could also argue that the optimization of marginal likelihood is itself a form of Bayesian inference, more akin to model selection. Certainly it feeds into the idea of active inference in terms of acting upon the world to maximize the fit between the model and the world, and that element is certainly there. Excellent. Carl and then Blue will each have a last thought, and next week at the same time we'll continue with 45.2. But for now, Carl and then Blue: last thoughts on what Thomas just added, or anything else? No, I think he's absolutely right. I mean, I think it's more interesting to ask the question, isn't it? I think it's largely from the point of view of the way the world works.
I think the FEP interpretation is largely teleological window dressing. But I think it's incredibly useful to have that cheap and cheerful and doable kind of window dressing when you actually want to build artifacts that behave as creatures, that are autopoietic, have a homeostasis, and make the right kinds of decisions. Awesome. Thomas or Carl, any last thoughts, and then Blue, and then we'll close. Just to say thank you for organizing this and inviting us to be part of it. Blue? Yeah, it was great to have you guys here, and I definitely had some questions that I've been wondering about for a long time resolved, so that was really great. Yes, this was an excellent discussion. We sometimes think about the dot zero as almost like preparing the marble, I don't know if it's blowing it out of glass or just bringing it to the edge of the bathtub, and in the dot zero we just walked through in the order as it was laid out and tried to follow the traces and understand some of the causal dynamics at play behind the writing. And then in the dot one, we're in kind of like this plane, and so there are little micro variations, some local sense to be made, but I really appreciated how we opened up a lot of threads and also covered a lot of the key aspects of the paper. And then we'll be looking forward to the dot two, when we get to climb or descend or circumambulate and continue the discussion. So thanks again to Carl and Thomas and all the other authors for this great paper, and thank you, Blue and Brock, for the dot zero. See you next week. Bye. Bye.