Hello, everyone. It is ActInf Livestream number 45.0. It's May 27th, 2022, and we're discussing the paper, The Free Energy Principle Made Simpler, but Not Too Simple. Welcome to the Active Inference Lab. We're a participatory online lab that is communicating, learning, and practicing applied active inference. You can find us at the links on this slide. This is a recorded and an archived livestream, so please provide us with feedback so we can improve our work. All backgrounds and perspectives are welcome, and we'll be following video etiquette for livestreams. If you want to learn more about the livestreams or any of the other projects to get involved in at the Active Inference Lab, head over to activeinference.org. We're in stream number 45.0. Our goal is to learn and discuss this very interesting paper, The Free Energy Principle Made Simpler, but Not Too Simple, by Karl Friston, Lancelot Da Costa, Noor Sajid, Conor Heins, Kai Ueltzhöffer, Grigorios Pavliotis, and Thomas Parr. And just like with all the .0 videos, and indeed all the videos, this is an introduction and an overview of a quite technical and lengthy-ish paper. It's not a review or a final word. This is like the opening context for some of the coming discussions we're going to have in the following weeks and beyond. And we're going to first just say hello, introduce the big question, the aims, claims, abstract, and roadmap. Then we're going to give an overview of some of the keywords that are in the paper. Then we're going to go through the sections of the paper with a focus on some of the key points, the formalisms, and the figures, especially. So it should be a great discussion. And let's get into it. We'll start with just an introduction and saying hello. So I'm Daniel. I'm a researcher in California. And I'll pass to Brock. Brock, thanks for joining and for all the contributions in this .0. Yeah, it's exciting to be here and participate in the Active Inference Lab.
And yeah, just really drawn to this topic, and this paper is a great starting point for that. It's really got a lot of detail to dig into and a lot to learn. So yeah, okay. So one of the big questions, or one way to state the big question, was: what are the foundations of the free energy principle and what does it contribute? In the paper, they write that they start from a description of the world in terms of a random dynamical system, systems changing through time, and end up with a description of self-organization as sentient, sensing, active behavior, and that's active inference. So that's the question that we're wondering about. We're all wondering about what is the basis and the essence and the implications of the free energy principle. What would you say about that? Or what were some big questions that you had coming into and out of this paper? I think my biggest question around the free energy principle is kind of how general is it? Where does it end? Because it seems so, they say in the paper also, it's pretty simple. It's almost tautological in some sense. So yeah, that's one of my questions. Where does it begin? Where does it end? Like how general is it? The other question is, yeah, how does it emerge, or what does it look like, at different scales? Awesome. So we'll be returning to questions again and again. Let's check out the aims and the claims of the paper. So again, it's The Free Energy Principle Made Simpler, but Not Too Simple. And the authors are listed here. The paper describes that it's trying to present the free energy principle as simply as possible, but without sacrificing too much technical detail. And that's sort of a pun slash self-reference, since that's what modeling in that Pareto optimal or Bayes optimal way is. And that's going to come back as a theme again and again: giving information, but not overfitting nor underfitting.
And then several claims that they make are that they're going to step through the formal arguments that lead from the description of a world as a random dynamical system to the description of self-organization in terms of active inference and self-evidencing. They're going to discuss Bayesian mechanics and how those Bayesian mechanics have the same starting point as quantum, statistical, and classical mechanics. And then they're going to differentiate this Bayesian mechanics for particular systems from some of these previously mentioned mechanics, in that careful attention is paid to the way that the internal states of something couple to its external states. Some of the aims and claims, although many more will be introduced and are discussed in the paper. Any thoughts on that? Or if you'd like to, oh yeah, go ahead. Yeah, just, well, let me, that essentially that it's, you know, it's a self-evidencing, self-describing system. Again, it just seems to fit with a lot of different domains and a lot of, like, just observed experience. So. Cool. Would you like to read the abstract? Sure. This paper provides a concise description of the free energy principle, starting from a formulation of random dynamical systems in terms of a Langevin equation and ending with a Bayesian mechanics that can be read as a physics of sentience. It rehearses the key steps using standard results from statistical physics. These steps entail, first, establishing a particular partition of states, based on conditional independencies that inherit from sparsely coupled dynamics; second, unpacking the implications of this partition in terms of Bayesian inference; and third, describing the paths of particular states with a variational principle of least action. Teleologically, the free energy principle offers a normative account of self-organization in terms of optimal Bayesian design and decision-making, in the sense of maximizing marginal likelihood or Bayesian model evidence.
In summary, starting from a description of the world in terms of random dynamical systems, we end up with a description of self-organization as sentient behavior that can be interpreted as self-evidencing, namely self-assembly, autopoiesis, or active inference. Awesome. Okay. Let's go to the roadmap. So this paper has a layout, and it'll be awesome to hear from the authors about why they've laid it out this way, as well as numerous other details. So we're looking forward to those discussions. Section one is an introduction, and they wrote at the end of the introduction: the remaining sections describe the free energy principle. Each section, that's two through eight, focuses on an equation or series of equations. The ensuing narrative is meant to be concise, taking us from the beginning to the end as succinctly as possible. To avoid disrupting the narrative, we use footnotes to address questions that are commonly asked at each step. We also use figure legends, captions, to supplement the narrative with examples from neurobiology. So there's a main narrative thread that's going to touch on sections two through eight, each of these being like a single or a cluster of formalisms. And then especially in the footnotes, which are formulated as questions, and in the figure captions, which are sometimes quite lengthy and include examples and citations and so on, more of the biological details and inspiration are added. But the mainline narrative is going to be touching on these formal areas: systems, states, and fluctuations in section two; steady states and non-equilibrium in three; particles, partitions, and things in four; self-organization to self-evidencing in five; Lagrangians, generalized states, and Bayesian filtering in six; statistical to classical particles in seven; path integrals, planning, and curious particles in eight; and section nine will be a conclusion.
So we're going to next discuss the keywords at an overview level, and just describe some of the broader topics that might lead someone to find this paper, be curious about it, or want to cite it in one of those domains. And then we're going to just jump right into the introduction. Each of the nine sections will be clearly indicated, which section we're in, and we'll be following the order of the paper, bringing in a few other resources as we saw fit in the preparation. And we're just going to focus on connecting some of the dots, connecting a few and leaving many, many for the dot one and dot two discussions, because it's again a paper with a lot of formalisms and a lot of rich topics. So we're just going to give this overview and try to front-load the contextualization and then continue to unpack and develop this and work on our shared understanding in the coming weeks. The keywords that were listed were Bayesian, non-equilibrium, self-organization, variational inference, and Markov blanket. So Brock, tell us about Bayesian. Bayesian analysis. So it's basically, you know, it's an analysis method for determining the posterior probability, which in active inference is going to be something like your beliefs after having evidence added to your prior beliefs, which are something approximately like a hypothesis. So you have the equation here, and also this word explanation that's essentially what I just said. So you're adding, you know, observation to hypothesis, and trying to approximate this P(H|E) here, the posterior probability of the hypothesis H given the evidence E. So it's pretty foundational to, yeah, a lot of statistics and physics and other things. Thank you. Yes, totally agreed.
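Just to make that posterior update concrete, here's a minimal numeric sketch of Bayes' rule, P(H|E) = P(E|H)P(H) / P(E). This is our own toy illustration, not from the paper; the two coin hypotheses and their probabilities are arbitrary choices for the example.

```python
# Minimal Bayes rule sketch: prior beliefs meet incoming evidence.
# Hypotheses (illustrative): the coin is fair, or biased toward heads (p = 0.8).
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.8}  # P(observe heads | H)

# Observe a single head (the evidence E).
# P(E) is the marginal likelihood, a.k.a. Bayesian model evidence.
evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

print(posterior)  # fair ≈ 0.385, biased ≈ 0.615
```

The posterior can then serve as the prior for the next observation, which is the continuous updating cycle described above.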
Bayesian statistics, it's a big topic, and just like you said, it's about how prior beliefs meet incoming evidence, and how that is updated to form posterior beliefs, you know, after-the-evidence beliefs versus before-the-evidence beliefs, and that can be done in this continuous cycle. And a lot of the variables that we're going to be seeing and talking about have interpretations of being one of these. For example, it might be a variable that represents a system's prior about something, or something might come in and that might be evidence. So we're going to be working within a Bayesian statistical framework rather than, for example, a frequentist p-value driven statistical framework, but we'll come to these details along the way. How about non-equilibrium? This is, I think, a much fuzzier topic. Historically, science studied equilibrium systems, and non-equilibrium systems are relatively newer objects of study. So non-equilibrium, these are things that were mentioned in the paper, like flows and transitions between finite sets of states, having an attractor in a system such that, over time, despite perturbations, there's still a set of characteristic states to which the system is attracted. So non-equilibrium being like a higher energy level state relative to its local environment, but nonetheless distinct, definite, separate, and over time, continuous. So just looking at the animation you put here, it's like the brick at some time scale is out of equilibrium. Like the brick will also dissolve. But for some model of a certain timeframe, hours, it could be seen as being spatially or just materially in an equilibrium. And then looking at these two moving entities, it's kind of like two kinds of non-equilibrium. There's like the low-bar non-equilibrium, this mechanical bird on the right side, the water dipper. And so it is returning to these characteristic states.
It's not just staying in one location, it's returning to drink the water again and again. So it's not in spatial equilibrium. But also there's like this stationarity, or there's something about it that is constrained. And then the ostensibly living bird, I guess, in the cartoon on the left, is like non-equilibrium of an even higher or more complex type, because that bird might fly away or do something else. So there's like a gradient from things that are like the ball at the bottom of the bowl, in equilibrium, and then progressively mechanical systems that are returning to attracting states. And then what does it mean for biological systems to be, or to be modeled as, returning to attracting states like sleep or a certain body temperature or certain blood glucose? Any other thoughts on that, or we can carry on? Yeah, just, yeah, this is a different, like, time scale, set of states, larger, smaller, and navigating, traversing those states, yeah, internal, right. And so I guess it just depends where you're drawing the time scale. Awesome. But they can be modeled differently like that, right? How about self-organization? I think this one, like, maybe it sounds just as complex, but I think this is really familiar for most people. They have a good idea of what this is. And these examples here are of non-equilibrium systems that have some symmetry or asymmetry of constraints that, over time, causes them to take a particular path through that, you know, set of states towards one attractor or another. So you have meiosis and mitosis here, black hole mergers, nuclear fission, nuclear fusion, very large, very small, and meso, in-between, scales here. In the paper, referencing the FEP, they said the FEP was about kind of turning on its head this traditional "what must things do in order to exist" framing of systems and physics, to "if it exists, what must it be doing". And so this is saying, you know, we now have Bayesian evidence of black hole mergers.
What must it be doing for that data, that new evidence, to exist? Like how does nuclear fusion, you know, how does that turn two atoms into one atom? What must it be doing, instead of the other way around? Awesome. Yeah, big topic, many angles on it, but sometimes those inversions, like Axel Constant's paper that we discussed in livestream 34, it's about how you got there. So conditioned on something, which we're going to return to, existing, being perceived, being modeled, what must that thing be doing in order to exist? Not what has to happen for the thing to exist. But we're going to come back to that. And then the last two, we'll just give a first coat of paint, because we're going to see them in their technical glory soon. So what is variational inference about? I'm actually less clear on this, but related to Bayesian inference, it's about approximating that posterior probability, the after-things-happen, like, to what resolution, you know, what precision can you state about that, and also trying to have a lower bound for the marginal likelihood, the evidence, of the observed data. So how likely are you to see the data that would lead to the posterior probability being what it is? That was my understanding of this. Yeah, awesome. What is the divergence? It'll be awesome to hear the authors' and others' perspectives, but places where we've seen variational Bayesian inference before. First off, it's not an innovation of active inference. Variational autoencoders and variational Bayesian methods are being used in active inference, with the partial novelty of action being considered as a parameter rather than just sensory parameters, but it's used in situations where exact Bayes is intractable and sampling-based approaches like Monte Carlo are also not tractable or not plausible. And so variational inference is about minimizing the divergence between Q and P.
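That divergence minimization can be sketched numerically. The following is our own illustration, not from the paper: it discretizes a bimodal target P on a grid and compares the reverse KL divergence, KL(Q || P), for a Gaussian Q placed on one hump versus between the humps. The hump locations, widths, and grid bounds are arbitrary choices for the demo.

```python
import math

# Grid on [-10, 10] with spacing 0.05 (illustrative resolution).
DX = 0.05
xs = [i * DX for i in range(-200, 201)]

def gauss(x, mu, sigma=1.0):
    """Normal density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p(x):
    """Bimodal target: equal mixture of humps at -3 and +3."""
    return 0.5 * gauss(x, -3.0) + 0.5 * gauss(x, 3.0)

def kl_q_p(mu):
    """Reverse KL, KL(Q || P) ≈ sum over grid of q * log(q / p) * dx."""
    return sum(gauss(x, mu) * math.log(gauss(x, mu) / p(x)) * DX for x in xs)

on_hump = kl_q_p(3.0)   # Q resting on one mode, ignoring the other
between = kl_q_p(0.0)   # Q centered where neither P hump is

print(on_hump, between)
```

Running this shows the reverse KL is much lower when Q sits on a single hump (about log 2 nats, since Q then explains half of P's mass) than when it straddles the gap, which is exactly the mode-seeking behavior discussed above.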
Mind your Q's and P's, as they say up there, and making sure that Q is from a family of distributions that is tractable and easy to optimize, even if P is something that is challenging; instead, a different distribution, for example one that has a shape looking more like a parabola, more like a bowl, can be fit with Q. And then there's still some detail, like, if P truly were bimodal, would you want the Q that identifies and rests on top of one hump and ignores the other hump? Or would you want the Q that is kind of centered at a place where neither of the P humps are, but contains some density in both? So those are some detail questions. But in general, it's about minimizing the divergence between Q and P as a means of approximating what exact Bayes would be doing in the asymptote. If the divergence were zero, you would have the perfect approximation of Q on P. And then anywhere other than the divergence being zero, there's a monotonic way for you to get towards that. And then Markov blanket, which we're going to come to in multiple formalisms, but just, what comes to mind when you hear Markov blanket, and what do you want to show here? Everything. This one is like, I feel like this is the most easy to understand one somehow. Like this picture here, you can see, is a bunch of dots that are different colors, right? And you have concentric circles here. And at this large, inside-the-large-square scale, there's this yellow line that separates all the outside dark blue from the inside orange and red. But there's also an orange inside circle. And these yellow and orange are kind of in a relationship that keeps the dark blue and the red separate, independent of each other. But you also have these smaller scales inside the red, inside the yellow, inside the orange here, right?
These other scales of the same thing, like that it is in some sense either recursive or fractal or, you know, locally having this same kind of dynamic of two layers conditioned on each other to keep an external and an internal state independent of each other. Awesome. Yes. Which is everything. I feel like it's everything, which we're going to be getting to. Watch this space. And it's an exciting topic. And there's a lot of ongoing discussion and development. Here's a recent paper by Casper Hesp. And there are many other valuable commentaries on The Emperor's New Markov Blankets by Jelle Bruineberg, which was previously on a livestream. So there are many, many technical and philosophical and implementation discussions around identifying Markov blankets, reifying them, and so on. And we're going to get there. So into the paper we go. We're in section one, introduction. They write: it is said that the free energy principle is difficult to understand. No citation. Who's saying that? No one is saying that. This is ironic on three counts. First, the FEP is so simple that it is almost tautological. Circular reasoning. Indeed, philosophical accounts compare its explanandum, what it explains, to a desert landscape in the sense of Quine, a Quinean desert landscape, as per what Clark wrote, and we'll return to that in a second. A second reason for the irony is that a tenet of the FEP is that everything must provide an accurate account of things that is as simple as possible, including itself. Finally, the FEP rests on straightforward results from statistical physics. So one is: it's not difficult to understand, it's so simple, it's circular. Two is: how can it be difficult to understand when everything is as simple as it must be, including the FEP literature? That's some high-level irony. And then the third reason is: how could it be difficult to understand when it rests on straightforward results from statistical physics?
And then I just wanted to highlight this Clark 2013 citation and the desert landscape, because that seems like an oblique reference. But here's what Clark wrote, and also here are two representations of desert landscapes, so they're not all bad, in fact. What Clark wrote was referring to radicalisms that might exist within this framework: in extending the models to include action, action-oriented predictive processing, which is very, very similar to active inference, we might simultaneously do away with the need to appeal to goals and rewards, replacing them with the more austere construct of predictions. So it's not "I want to do this" or "I'm rewarded by doing that"; I'm just making predictions and expectations about what might happen and then fulfilling them or diverging from those fulfillments. And so that is this desert landscape vision. There are neither goals nor reward signals as such. Instead, there are only learned and species-specific, so kind of developmental and evolutionary, expectations across spatial and temporal scales that are in control of perception-action loops. And so what are we explaining or explaining away? That's this desert landscape notion. And that's how the authors are beginning the paper, with a sort of preemptive, tripartite broadside, dissuading one from thinking that this is going to be difficult to understand before launching into the subsequent eight sections that might give you some evidence to the contrary. But yeah, any thoughts on that? I just thought, yeah, the desert, the Quinean desert landscape, a very apt description, and yeah, physical sort of, yep, it lines up. And yeah, like I said, it does seem tautological, like a lot of these concepts, if you think about them from, like, the statistical physics, it's like, obviously, but then there's just a lot of area there to be covered, right? Yes.
And also note that Clark in 2013 wrote: I remain unconvinced; even if the austere description is possible, this would not justify the claim that this is the better tool for understanding the cognitive economy. So it's, it is intractable. So it's like, yeah, what? Yeah. Yep. Awesome. So still in the introduction, the authors write: before starting, it might help to clarify what the free energy principle is, and why. Big question alert. Many theories in the biological sciences are answers to the question: what must things do in order to exist? And some answers, but we could think of probably many others, that people have raised. So if you asked somebody, what does something have to do in order to exist, some answers could be like, well, it has to resist dissipation, it has to replicate, or it has to reproduce or propagate some way. Like there's stuff it has to do to be there. The FEP turns this question on its head and asks: if things exist, what must they do? More formally, if we can define what it means to be something, can we identify the physics or dynamics that that thing must possess? To answer this question, the FEP calls on some mathematical truisms that follow from each other. Then they write: much like Hamilton's principle of least action, the FEP is not a falsifiable theory about the way things behave; it is a description of things that are defined in a particular way. And for more reading on this, here's a 2018 interview by Fortier and myself with Karl Friston. And here, there's some discussion around notions of falsifiability. Can the FEP be falsified? Or what is the relevance of falsification, which is a frequentist idea, in this Bayesian, meta-Bayesian, post-Bayesian, whatever epoch or mode that we're in. And just one really interesting quote from Karl: I would assert that the notion that a framework can have the attribute "falsifiable" is a category error.
The notion of falsifiability is thus a very weak notion, because it's actually about rejecting a null hypothesis in favor of an alternative hypothesis. That has some limitations, including the sort of transcendental arguments on falsification, which is just that belief is rarely scrutinized using a falsificationist framework. And then dropping from the transcendental arguments into a more history-of-statistics-and-science register: when we move from frequentism towards Bayesian statistics, where the p-value and so on, and the parametric distributions, are seen as special cases but not the whole substance of the statistics, Karl is suggesting that a better way to frame evidence-based selection of hypotheses, which is ostensibly what falsification is after as well, is in terms of how much empirical evidence is accrued by competing hypotheses. And that's quite a different model than "falsify the ones that aren't true", "let's shoot down the ones that don't fit". So this is quite an interesting area. What would you say or add to this? There's a word, verisimilitude. I think that is exactly what he's describing here. Again, just how much evidence is accrued rather than, yeah, I mean, that's in practice how it works. So I don't completely understand why the notion of falsifiability is held to be so important. But like Dirac had this notion of beauty, and he's saying, look, if the equation is really beautiful, then maybe you check your experiment first. So like you were saying, it's kind of like that's not falsifiable. But then what do you mean by not falsifiable? And that is that you're not examining that claim; there's a, you know, infinite regression there of falsifiability that makes it intractable, which is why taking the opposite approach of just stacking up the evidence is, you know, a way to make a tractable version of it, the same thing, you know, but tractable, right? So. Awesome. They write.
Again, picking up on that last quotation there, that the FEP is not a falsifiable theory about the way things behave; it's a description of things that are defined in a particular way. Let's talk about utility. Is such a description, if the FEP is indeed a description, useful in itself? The answer is probably no. Oh, well, then we can stop reading. In the sense that the principle of least action does not tell you how to throw a ball. However, the principle of least action furnishes everything we need to know to simulate the trajectory of a ball in a particular instance. In the same sense, the FEP allows one to simulate and predict the sentient behavior of a particle, person, artifact, or agent, i.e. some thing. This allows one to build sentient artifacts or use simulations as observation models of particles or of those other mentioned types of systems. These simulations rest upon specifying a generative model that is apt to describe the behavior of the particle or person at hand. At this point, committing to a specific generative model can be taken as a commitment to a specific and falsifiable theory. Later, we will see examples of these simulations. So it's like the linear regression concept is not even where you would want to focus the brunt of your falsifying effort. But if there are two different linear regression models, we might ask which one is preferable, or which one is falsified, by doing some subsequent perturbations or experiments. But the linear regression framework is a framework, hence not falsifiable in the way that people often expect. And deploying a specific generative model, analogous to a specific linear regression model, but different, that is where there's a commitment to a falsifiable theory. A specific claim, like "I think the amygdala connects to this area that way," could be tested, and that evidence could be evaluated within a falsification or Bayesian, Bayes factor, framework, without thereby falsifying the concept of generative models.
They would have crossed that bridge and now could be engaging in falsification and evidence comparison within some constrained space. Anything to add on that as we close the introduction? Yeah, just again, what is a linear model versus a generative model: the linear model will yield some approximation that's not necessarily a real, possible observation to falsify. The generative model will produce many things that you might not possibly observe, but some that you will, and then it could potentially be falsified; like, it gives specific examples. So, yes. It's almost like it takes this meso scale and separates the milk from the cream, and the FEP lies above falsification. And then once you're committed to a specific generative model, it's actually written down, not just speculated that it could be written down. Then it's like, oh, well, then evidence can be stacked up for or against it. So we've kind of separated out the framework, like language as a framework, like Dave just mentioned in the chat. How would you falsify a language? You would not, any more than you would falsify a conceptual framework. So one could not falsify a language like English, but things that are said within English might be amenable to being tested once they're actually said using that framework. Very interesting. Evidence stacked up or not, versus alternatives. Just like language or evolution. Awesome. So, recall the roadmap. We're now going to head into the next sections, each of which is going to be building on the others in this concise narrative, however also semi-standalone in their focus. And we're just going to do basically a first pass and some contextualization, because we have a lot of time in the coming weeks to discuss these topics in more detail. So we're going to go to section two: systems, states, and fluctuations. Okay. So they write: we start by describing the world with a stochastic differential equation, Pavliotis 2014.
So, this is a citation to the book Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations. And just one preliminary note: this is not the same sentence as "the world is a stochastic differential equation." We start by describing the world with a stochastic differential equation. So leave your realism at the door; we're describing models here. They write: why start here? The principal reason is that we want a description that is consistent with physics. Here are some discussion questions. What does it mean to be consistent with physics? What does it mean to be consistent with physics in theory, or in principle, or in practice? And then they write: this follows because things like the Schrödinger equation in quantum mechanics, area A, fluctuation theorems in statistical mechanics, area B, and the Lagrangian formulation of classical mechanics, area C, three different mechanics, quantum, statistical, and classical, can all be derived from this starting point. So we're going back to the last common ancestor of those three different mechanics, and that is where Bayesian mechanics is going to also be initiated from. In short, if one wants a physics of sentience, this is the place to start. Why start where? Where are we going? Why are we going? Anything you want to add here? No, I think, again, just that, yeah, for all these scales of physics, it's odd that you would be able to, you know, derive all these things from something that's not fundamental or important in some way. Awesome. They write: we are interested in systems that have characteristic states. This means the system has a pullback attractor, the set of states that the system will come to occupy from any initial state. So the rubber band is going to pull back to its resting state, if it's allowed to. And formalism one is that stochastic differential equation describing the rate of change of the states. So x are the states, tau is the time, and the dot notation is the derivative with respect to time.
So, how states are changing is equal to their flow, f of x, f for flow, plus random fluctuations, omega. So, like a signal and a noise term: there's the flow, the ocean current, and then there's the vibration that isn't the directed flow. And this separation of scales is going to come back again and again. So dot notation, again, means the derivative with respect to time, so how things are changing with respect to time, through time. This yellow means that time and causality are baked into everything that follows, in the sense that states cause their motion. This equation is itself an approximation to a simpler mapping from some variables to changes in those variables with time. This follows from a separation into states and random fluctuations implicit in formalism one, where states change slowly in relation to fast fluctuations. So if this term, the noise term, is dominant, then there will not be flow-like movement on a manifold. There's more detail, but things are changing in different ways. Different kinds of stochastic equations that are going to be presented here, and in the book and in the citations, help us see different parts of how we can use dynamical models to understand physics, and specifically the physics of particular cognitive slash sentient systems. Then in the footnote, they ask a question: why is the flow not a function of time? And that's what we're going to return to in dot one. How is time dealt with, or not dealt with, similarly or differently than in other frameworks or other physics? And then just to stay on this formalism one, and then please give any other thoughts: this last line of one, p of x equals question mark. The next step, shared by all physics, so those three areas that we described, classical, statistical, quantum, plus Bayesian, is to ask whether anything can be said about the probability density over the states, the question mark in one. So this is the common grounding.
This is the last common ancestor of the multiple physics. And now we're going to take it in a different direction. A lot can be said about this probability density, p of x, which can be expressed in two complementary ways. And this is going to introduce a very important distinction and dialectic. Here are the two complementary ways. Left side, blue: density dynamics using the Fokker-Planck equation, also known as the forward Kolmogorov equation. Here's formalism two. The Fokker-Planck equation describes the change in density due to random fluctuations and the flow of states through state space. This is like a field model. And then the second complementary way, in orange on the right side: in terms of the probability of a path through state space, using the path integral formulation, and they write, for formalism three: conversely, the path integral formulation considers the probability of a trajectory or path in terms of its action. So what are two things, or more, that you can say about the probability density? What are two complementary ways that it can be expressed? What are important similarities and differences between this Fokker-Planck, formalism-two approach and the path integral formulation of formalism three? Are there other complementary ways to say it, or other ways to say what is here? But this is just to introduce what's going to be very important and moved back and forth over the course of this paper, which is one representation that's more field-like, continuum-like, and then another representation that's more trajectory- and path-based. What do you see in there? Yeah, we discussed this really briefly, the analogy of a continuous space versus a kind of discrete or instantaneous space. Mostly, I guess, metaphor there, analogy. But when you ask this question of what are other ways to say this: the action trajectory, the path, is like an action density or an energy density.
Where is the path that the energy of the whole system is going to take? So it's like an action density gradient or something like this. Yeah, cool, real interesting. Both the Fokker-Planck and the path integral formulations, the formalism two-three dialectic, inherit their functional form from assumptions about the statistics of random fluctuations in one. So recall one, with the flow and the random fluctuation term. For example, the most likely path, or path of least action, is the path taken when the fluctuations take their most likely value of zero. So we're going to explore this more, but the path of least action is not using action in the same way that active states are. Exactly. Path of least action does not mean the laziest or the least energetically costly or the least mobile thing to do. It's not that. What it is, is what we're going to explore. What does it mean to minimize action in this framework? The motion on the path of least action is just the flow without random fluctuations. So suppose you're trying to run in a straight line forward. There are two forces: there's the flow of you running forward, and then there's your thermal vibrations. In one limit, the flow might absolutely overwhelm the fluctuations, and the path of least action would be that person running straight forward. Conversely, if that person were the size of one molecule, they would be buffeted by stochastic thermal vibration, and so they would not be following that path of least action in the same way. A loose metaphor that hopefully doesn't misrepresent, but paths of least action will figure prominently in the following sections of the paper, especially when considering systems that behave in a precise or predictable way. We will denote the most likely states and paths with a bold typeface. So it can be a little subtle sometimes, and we'll try to clarify notation and get assistance as we can.
But this is like: a bold x through t is the minimization of that unbolded function with respect to the action A of those states through time. And that is going to be related to the change in x through time being only the flow. So here it was like x dot of t equals the flow of x plus a noise term; if the noise term is zero, you can drop it. And then they write, although equivalent, the Fokker-Planck and the path integral formalisms, this dialectic that we've discussed, provide complementary perspectives on dynamics. Did we mention that they're different but complementary? The former deals with time-dependent probability densities over states. The latter, the path integral, considers time-independent densities over paths. The density over states at a particular time is the time marginal of the density over trajectories. These probabilities, Bayesian probabilities, Bayesian statistics, can be conveniently quantified in terms of their negative logarithms, or potentials, leading to surprisal and action respectively. So we'll explore the formalisms more, but just to note that the fancy I, the Fraktur I, is going to be a negative log of the joint distribution p. And then similarly A, the action, is going to be on those same variables in a slightly different way, looking like a surprisal conditioned upon x sub zero, the starting conditions. And we're going to go into more detail in the coming weeks. Any more comments on section two, or shall we continue to three? No, yeah, that's awesome. Section three, solutions, steady states and non-equilibrium. So far, in sections one and two, we have equations that describe the relationship between the dynamics of a system and probability densities over fluctuations, states, and their paths. This is sufficient to elaborate most physics. Big if true.
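As a rough numeric illustration of action as a functional of paths (a hypothetical sketch, not the paper's code): assuming Gaussian fluctuations, an Onsager-Machlup-style Lagrangian penalizes the squared mismatch between a path's velocity and the flow, so the noiseless path that just follows the flow minimizes the discretized action, while a path that wobbles off the flow accrues a large one. Constants and correction terms are dropped here for simplicity:

```python
def discrete_action(path, f, dt=0.01, gamma=0.5):
    """Discretized Onsager-Machlup-style action of a path.

    Each step contributes (dx/dt - f(x))^2 / (4*gamma) * dt: the squared
    mismatch between the path's velocity and the flow, i.e. how much
    fluctuation the path had to "use". Constant and correction terms
    are omitted in this sketch.
    """
    a = 0.0
    for x0, x1 in zip(path, path[1:]):
        v = (x1 - x0) / dt
        a += (v - f(x0)) ** 2 / (4 * gamma) * dt
    return a

flow = lambda x: -x
dt, n = 0.01, 500

# Path of least action: integrate the flow with zero fluctuations.
least = [2.0]
for _ in range(n):
    least.append(least[-1] + flow(least[-1]) * dt)

# A perturbed path that keeps jumping off the deterministic flow.
wobbly = [x + 0.1 * ((i % 2) * 2 - 1) for i, x in enumerate(least)]

print(discrete_action(least, flow, dt) < discrete_action(wobbly, flow, dt))
```

The point of the sketch is just the transcript's claim: the motion on the path of least action is the flow without random fluctuations, so its (discretized) action is essentially zero.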
Here's where, for example, we could use the Fokker-Planck or the path integral formalism, either side of that dialectic, to derive quantum mechanics, where the Fokker-Planck becomes the Schrödinger wave equation. Mechanics one, quantum. We could focus on systems that comprise statistical ensembles of similar states to derive stochastic and statistical mechanics in terms of fluctuation theorems. Statistical mechanics, mechanics number two. Finally, we could consider large systems in which the fluctuations are averaged away to derive classical mechanics, such as electromagnetism and, dot dot dot, general relativity. Classical mechanics, mechanics number three. All of these mechanics, quantum, statistical, classical, acquire boundary conditions. To give the examples: the Schrödinger potential in quantum mechanics, the heat bath or reservoir in statistical mechanics, and the classical potential for Lagrangian classical mechanics. At this point, the FEP steps back, I wonder what space it's in, and asks, where do these boundary conditions come from? Indeed, this was implicit in Schrödinger's question in the famous What is Life? (1944), where Schrödinger wrote, how can the events in space and time, which take place within the spatial boundary of a living organism, be accounted for by physics and chemistry? And this is a bit of a nod slash reference to the paper of Ramstead et al. 2018, Answering Schrödinger's question: A free-energy formulation. So this was a really impactful and relevant paper that started a discussion around the FEP and multi-scale FEP and self-organizing complex biological systems and so on. So anything to add about these mechanics and their similarities and differences? Markov blankets, we'll get there. Formalism 6 we're going to go into, hopefully in the discussions, but just to point to livestream number 32 on stochastic chaos and Markov blankets, which showed where these similar equations were represented.
We talked about how the Helmholtz decomposition could be used to take a field and decompose it into multiple parts. There's the gradient term, which is curl-free: it's like you're on some landscape and you point the most steeply angled way, going straight up the hill. That might be useful for going up or down a hill as fast as possible. Then there's the solenoidal, or curl, flow term, and that is like moving along an isocontour. Those two are part of the classical Helmholtz decomposition. And then there's this capital lambda, which is the housekeeping term, as it was referred to in 32. And we explored it a bit, but it reflects the way that that landscape is also influenced by movement. So we wondered if that was kind of like walking around on a trampoline, where you can't just snapshot the topographical map and then do your navigation on the fixed map; there's some change there. We'll come back to that later. So that's section three, on solutions, steady states, and non-equilibrium. Now we're going to really get to Markov blankets. Particles, partitions, and things, in section four. All right. So what do you see in figure one? Yeah, I mean, I see these two inner states, red and green, conditioned on each other, that keep the external and internal states on the outside separate. They still have a conditional relationship, the sensory states with the external states, and the active states with the internal states, but external and internal not directly with each other. Awesome. So they wrote in the caption that it's an influence diagram. I believe it's also fair to say it's a Bayes graph, but we're thinking of the arrows as influence and the nodes, the circles, as variables. It's a particular partition of states. Particular is a pun, because it's one specific partitioning. It's not the only partitioning, but it's a particular one that we're sharing with you.
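Circling back to the Helmholtz decomposition just mentioned, here's a toy numeric version (our own illustration, with invented coefficients): take a quadratic surprisal, split the flow into a gradient (curl-free) part that runs downhill and a solenoidal part that circulates on the isocontours, and watch the trajectory spiral in toward the mode:

```python
import math

# Quadratic surprisal J(x) = 0.5 * (x1^2 + x2^2); its gradient is x itself.
grad_J = lambda x: x

gamma = 0.2  # amplitude of the dissipative, gradient (curl-free) flow
q = 1.0      # amplitude of the solenoidal (divergence-free) flow

def flow(x):
    gx, gy = grad_J(x)
    # f = (Q - Gamma) grad J, with Gamma = gamma*I and Q = [[0, q], [-q, 0]]:
    # the gamma part descends the surprisal, the q part circles its contours.
    return (q * gy - gamma * gx, -q * gx - gamma * gy)

def step(x, dt=0.01):
    fx, fy = flow(x)
    return (x[0] + fx * dt, x[1] + fy * dt)

x = (3.0, 0.0)
for _ in range(5000):
    x = step(x)

# The gradient part dissipates surprisal while the solenoidal part
# circulates on its isocontours, so the trajectory spirals in toward
# the mode at the origin.
print(math.hypot(*x) < 0.1)
```

The housekeeping (lambda) term from the full decomposition is left out of this sketch; only the gradient-plus-solenoidal picture is shown.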
And it's partitioning that into a particle, into something that's like distinguishing figure from ground. And the four states that get partitioned are internal states and external states that are separated by a Markov blanket comprised of the sensory and active states. There's a lot that could be said, and we're going to continue to explore, but just to give a few more notes, the partitioning entails that all of these states come into existence in a model specific way at the same time. So it's not like there's some feature of the world that just is fundamentally an internal state. It's the relationship between an internal and external state conditioned on a blanket by which all of those assignments can be made. It's like an incomplete sentence to have one referenced without the other. Also, we have the names of the states, mu for internal and eta for external. And it's kind of like they almost look like flips of each other. Then s for sensory states, incoming info, a for action, active states, outgoing actions. So we have eta, external, s sensory, a active and mu internal states. And then there's a few couples of states, sets of states that we are going to care about. And the equations that describe them have super different implications. So one set of states that's interesting to compare together is the blanket states B is the set of s and a. And that's more aligned with the Pearl 1988 formulation of the Markov blanket or boundary where it was an undirected blanket. And Friston and others have developed this notion of like a two way partitioning. So the blanket is the set of both s and a. The particular states pi, not to be confused with pi, the policy selection in the POMDP, the particular states are B and mu. So the blanket and mu. So the particular states are internal, active and sensory, everything except for external. So the particle is like the cell, the boundary and the internal generative model. 
And then there's the autonomous states alpha, which is just a and mu. And one way to think about those states is: if you were to design the system, those are the states that you can control. You can't directly control the sensory input, which is the only thing that differentiates the autonomous from the particular states. You can't directly control what photons hit the retina, but you could control your interpretation and your action for where your eyes move, which might absolutely influence which photons do hit the retina. But the autonomous states are the ones over which there's a degree of agency, in a way that is slightly different from just the consideration of the blankets and the internal states. Let's continue to talk about some of the formalism. The conditional independencies, which are going to be, let's pull back one more. In associating some of these equations of motion with a unique non-equilibrium steady-state, or NESS, density, we have a somewhat special setup in which the influences entailed by the equations of motion place constraints on the conditional independencies of the NESS density. These conditional independencies can be used to identify a particular partition of states into external, sensory, active, and internal states, as shown in the figure. This is an important move, in what space, because it separates the states of a particle, internal states and their sensory and active states, the particular states, from the remaining, i.e. external, states. To do this, we have to establish how the causal dynamics in one underwrite conditional independencies. This can be done simply by using the curvature or second derivative of surprise, as follows. So how is the second derivative of surprise related to sparse causal dynamics? In your own words, what is the formal description of a Markov blanket? What's this sort of minimal Markov blanket? This is a slide from livestream 26 on Bayesian mechanics. This is like an undirected blanket.
We don't have s and a separated, just b intermediating and partitioning, making mu and eta conditionally independent. And what about the Friston-blankets type, with sense and action separated? What about nested Markov blankets, and so on and so on? But just to look at seven: we basically have x sub u and x sub v conditionally independent, conditioned upon b, the blanket, if and only if the curvature, the mixed second partial derivative of the surprisal with respect to x sub u and x sub v, equals zero. What does it mean? How is the second derivative of surprise related to causal dynamics and the partitioning? But continuing with the sparse coupling notion. Sparse coupling means that any two states are conditionally independent if one state does not influence the other. Tautology number five. This is an important observation, namely that sparse coupling implies a non-equilibrium steady state density with conditional independencies. In turn, this means any dynamical influence graph with absent edges admits a Markov blanket. So if your variables are all connected to each other, it's a fully connected social network, a fully connected statistical model. Then, I guess, you could argue it has a blanket with respect to some other things you didn't model, or you could get into that whole question. But within that model, there are no blanket partitions to make. If there are any absent edges, then there are some states which, upon knowing them, make some other states conditionally independent. They give some more technical details, and then they raise a question in the footnote: why does the particular partition comprise four sets of states? We'll talk about that in dot one. Continuing on sparse coupling and formalism nine. We can define sparse coupling as the solution to this equation in which all the terms are identically zero. So what terms are zero? What is the reading of this equation?
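One way to see the "second derivative of surprise" condition concretely, in the Gaussian case (a hypothetical sketch with invented numbers, not taken from the paper): the surprisal of a zero-mean Gaussian is quadratic, so its Hessian is the precision matrix, and a zero entry there is exactly a conditional independence statement:

```python
# Surprisal of a zero-mean Gaussian is quadratic, J(x) = 0.5 x^T P x,
# so its Hessian (second derivative) is the precision matrix P.
# A zero in P[i][j] is exactly the statement that x_i and x_j are
# conditionally independent given the remaining states: here eta
# (external) and mu (internal) are independent given the blanket b.
P = [
    [2.0, -0.8, 0.0],   # eta: couples to b, not to mu
    [-0.8, 3.0, -0.5],  # b:   couples to both
    [0.0, -0.5, 2.5],   # mu:  couples to b, not to eta
]

# The conditional precision of (eta, mu) given b is the corresponding
# 2x2 sub-block of P (a standard Gaussian identity), so the conditional
# covariance is that sub-block's inverse:
a, c, d = P[0][0], P[0][2], P[2][2]
det = a * d - c * c
cond_cov_eta_mu = -c / det  # off-diagonal of the 2x2 inverse

print(cond_cov_eta_mu == 0.0)
```

Because the mixed curvature entry is zero, the conditional covariance between eta and mu given the blanket is exactly zero: blanket in, independence out.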
Meaning, under what situations are they zero? Are they being measured and estimated to be zero? Are they being dictated to be zero? So what terms are zero, and what does that mean in nine? Sparse coupling means the Jacobian coupling states u and v is zero. So either side of the blanket is zero. Look in seven: u and v, they're on either side of the blanket, conditioned upon that blanket existing. I.e., there's an absence of direct coupling from one of those to the other. This definition includes solenoidal coupling, with u that depends on v. Because H and gamma are positive definite, sparse coupling requires associated elements of the solenoidal operator and the Hessian to vanish at every point in state space, which in turn implies conditional independence. So what is being shown here? But just to give one more visual way of looking at it, this was figure eight from livestream 32. We had basically six variables that were in two clusters of three, and they were like two communicating entities, coupled by this one-to-four connection. So up here on the top left, there's strong coupling within nodes one, two, and three, and then there's strong coupling within this clique of four, five, and six. And then there was a sparse coupling between the systems, with the one-four connection. And there was also somewhat of a sparse coupling within each clique, because they're not all block squares. So we'll return to this, and it's again on the theme of how sparse coupling is related to the particular partition and a Bayesian mechanics for particular systems. They give some more formalisms in 11 about blankets and couplings and flow. We'll go into it later. And 13 is very informative. The normal form means that particular partitions can be defined in terms of sparse couplings. Perhaps the simplest definition that guarantees a Markov blanket is as follows: external states only influence sensory states, and internal states only influence active states.
This means that sensory states are not influenced by internal states, and active states are not influenced by external states. Here's formalism 13. So recall: eta external, s sensory, a active, mu internal. f is the flow and omega is going to be the noise, and a subscript on something here is tagging it to be about that variable. So we have the rates of change through time. Here's like a vector, or a tuple, describing the four states: how external, sense, action, and internal states are changing as a function of time. This, just like formula one, is going to be unpacked; there we had x dot of t equals flow plus noise, and here we're doing four of those in parallel. So it's like: external states changing through time is a flow on external states as a function of external, sense, and action states, plus the noise of external states as a function of time. So sense and external states are functions of external, sense, and action states, plus accompanying noise terms, whereas action and internal states are functions of sense, action, and internal states. And so external and sensory states are not being caused by internal states. That's the only missing guy here, mu. It almost looks like NSA, but it's eta, s, a. Conversely, for the autonomous states, the ones that we have more agency over, the odd one out, the missing one, you know, the dog that's not barking, is eta. So these autonomous states are flows on sense, action, blanket, and internal states. The bottom two rows are the autonomous states, and those are defined as flows of particular states, functions of the particular variables, whereas the external and sense states are driven by external-facing variables. How do you read that? Under the sparse coupling, it's simple to show that not only are internal and external states conditionally independent, but their paths are conditionally independent, given that path integral formulation. Okay, any thoughts on these last few?
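Here's a hypothetical linear toy of formalism 13's sparse coupling (our own flows and coefficients, not the paper's): the external and sensory flows never take mu as an argument, the active and internal flows never take eta, and yet the four states still settle near a steady state through the blanket:

```python
import random
import math

rng = random.Random(1)

def d_eta(eta, s, a): return -eta + 0.5 * a  # external: perturbed by action
def d_s(eta, s, a):   return -s + eta        # sensory:  driven by external
def d_a(s, a, mu):    return -a + mu         # active:   driven by internal
def d_mu(s, a, mu):   return -mu + s         # internal: driven by sensory

# Note what is *absent*: d_eta and d_s never see mu, and d_a, d_mu
# never see eta. Internal and external states only talk to each other
# through the blanket states (s, a).

dt, sigma = 0.01, 0.05
eta, s, a, mu = 1.0, 0.0, 0.0, 0.0
for _ in range(20000):
    n = lambda: sigma * math.sqrt(dt) * rng.gauss(0, 1)
    eta, s, a, mu = (
        eta + d_eta(eta, s, a) * dt + n(),
        s + d_s(eta, s, a) * dt + n(),
        a + d_a(s, a, mu) * dt + n(),
        mu + d_mu(s, a, mu) * dt + n(),
    )
print(all(abs(v) < 1.0 for v in (eta, s, a, mu)))
```

The function signatures themselves encode the sparse coupling: the Jacobian entries coupling eta to mu are identically zero by construction.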
Yeah, again, I just wonder whether the math is landing for people. This separation, with the internal flows being conditioned on the autonomous states, is also partially tautological, or it just makes sense in the simple way the model was presented a couple slides back. I'm not sure if this is the right place to bring it up, but a question that I have frequently about this is: where do they come from? Where do things come from? Well, they come from the separation and conditioning of the particle from the background, and the blanket forming; blankets form, et cetera. Or, what's the difference between the internal states and the external? Like you said, if there were this giant social graph where everything is connected, all connected, then there are no Markov blankets. Or take two cells in the body: how do they know that they're not each other? They both have their own Markov blankets, and yet, to each of them, the other is just an external state, indistinguishable from the rest of the stuff out there. And over time that dissipates, right? So I guess I wonder about the over-time dot notation here: the dynamics of how something becomes an internal state or an external state, how that exactly happens. Because when you look at a lot of examples, like a negentropic state where everything is good for us and everything we can observe, it's like galaxies or something. Oh, they're so far away; there's a lot of intergalactic space that's not affecting us; it's very external. But rewind a few billion years: lots of interaction, lots of Markov blankets.
So, it's like, yeah, where does that begin and end? Yeah. Markov blankets, where do they begin and end? Yeah. Formalisms 14 and 15 are further details on the entropy of internal and external paths and the conditional dependencies of paths. That's going to conclude section four, Particles, Partitions, and Things, which was about the Markov blanket partitioning of particles, particular states, and so on. We'll move pretty quickly through the following sections. Section five, from self-organization to self-evidencing. Equipped with the particular partition, we can now talk about things in terms of their internal states and Markov boundary, namely autonomous states, and we can talk about particles, particular states. The next step is to characterize the flow of the autonomous states in relation to external states. In other words, considering the nature of the coupling between the outside and inside of the particle across its Markov blanket, it is at this point that we move towards a Bayesian mechanics that is the special provenance of systems with particular partitions. The existence of a particular partition means that, given sensory states, one can define the conditional density over external states as being parameterized by the most likely internal state, and we had several discussions in livestream 26 on Bayesian mechanics with Lancelot Da Costa. This is where variational inference is going to come into play. We will call this a variational density, parameterized by the internal mode. Q, the distribution we control, of internal states, mu, about external states, eta, is defined as p of external states conditioned on sense. Equation 16 means that for every sensory state, there's a conditional density over external states and a corresponding internal mode with the smallest surprisal. This mode specifies the variational density, where by definition the KL divergence between the variational density and the conditional density over external states is zero.
More formalisms, more details. Inducing the variational density is an important move. It means that for every sensory state there's a corresponding active mode and an internal mode. Mode does not mean style here; mode means the most common value in a statistical distribution. The active and internal modes constitute active and internal manifolds. We will see later that these manifolds play the role of center manifolds, namely manifolds that contain paths that do not diverge or converge exponentially (the operative exponents being zero). The internal manifold is also a statistical manifold, because it is equipped with a metric and an implicit information geometry. This is because movement on the internal statistical manifold changes the variational density. This is moving towards that dual information geometry perspective. What is equation 17 showing? We're looking at the flows, because if we're interested in those paths of least action, we can discount the fluctuation term. The expectation of the fluctuations is zero, so we're able to drop those subscripted omegas that we saw in 13. So now we're going to be looking at the flows on different states. As for the top two, the external and the sense states, those are going to be about surprisal, the Fraktur I, the fancy I. For action and internal states, the autonomous states, that is free energy. Minimizing surprisal about external and sensory states; minimizing free energy about autonomous states. What is that F? The free energy in question is an upper bound on the surprisal of particular states. So in variational Bayesian methods, the ELBO and free energy are used to bound the surprisal. Here we're bounding surprisal on a particular partitioning, including action, to model the action-perception loop. So we've seen variational free energy several times. What do these different rearrangements mean? What other rearrangements or restatements of F are possible? They give a few thoughts here.
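A tiny worked example of that bound (toy numbers of our own, not from the paper): for a discrete joint density over an external state eta and a sensory state s, the free energy F of any variational density q upper-bounds the surprisal, negative log p of s, with equality exactly when q is the posterior:

```python
import math

# A toy joint density over a binary external state eta and a binary
# sensory state s (hypothetical numbers, chosen only for illustration).
p_joint = {  # p(eta, s)
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.15, (1, 1): 0.45,
}

def free_energy(q, s):
    """F = E_q[ln q(eta) - ln p(eta, s)] for an observed sensory state s."""
    return sum(q[e] * (math.log(q[e]) - math.log(p_joint[(e, s)]))
               for e in (0, 1) if q[e] > 0)

s = 1
p_s = p_joint[(0, s)] + p_joint[(1, s)]
surprisal = -math.log(p_s)  # -ln p(s)

# Exact posterior p(eta | s): choosing q = posterior makes the bound tight.
posterior = {e: p_joint[(e, s)] / p_s for e in (0, 1)}

F_posterior = free_energy(posterior, s)
F_other = free_energy({0: 0.5, 1: 0.5}, s)

print(abs(F_posterior - surprisal) < 1e-9)  # tight at the posterior
print(F_other > surprisal)                  # any other q sits above
```

This is the same ELBO logic mentioned in the transcript, just with the external state playing the role of the latent variable.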
Anything you want to add on variational free energy? Knowing that we'll have a lot of time to go into it; that's what formalisms 17 and 18 bring us to. Yeah, back to the path and the goal. I'm wondering, my question is in relation to action and energy, in the way this is being stated. What is the action version of this? What is the easy-to-miss "oh, it's just energy" thing in the way of framing this here? Because they're not the same thing. My question is just differentiating that, or better defining which parts are action and which parts are the energy components of that. That's what I would be interested in when rearranging or thinking about these equations. Cool. They write in footnote 19: is variational free energy the same kind of free energy found in thermodynamics, like Gibbs free energy, perhaps? The answer is no. This entropy is distinct from the thermodynamic entropy of internal states. We're going to return to this theme. Are we talking about thermal free energy? Or are we seeing equations that look like descriptors of thermal entropy, but we're applying them in an informational setting? We'll return to it. As they had brought up before, one can formulate this along the lines of a center manifold theorem, where we have a fast flow towards a center manifold and a slow flow on the manifold. Going fast towards the valley, the ravine, and then going slow on the river. This decomposition can be derived simply using a Taylor series expansion around the time-varying autonomous mode. A Taylor series expansion evaluates a function at a given point, usually zero, and then uses its first derivative, its second derivative, and higher and higher derivatives to approximate the function at increasing distances from the reference point where it was actually evaluated.
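A quick sketch of the Taylor expansion idea just described (illustrative only), approximating e to the x around zero and watching the error shrink as more derivative terms are included:

```python
import math

def taylor_exp(x, n_terms):
    """Approximate e^x by its Taylor series around 0: sum of x^k / k!.

    For e^x every derivative at 0 equals 1, so each term is x^k / k!.
    """
    return sum(x ** k / math.factorial(k) for k in range(n_terms))

x = 1.0
errors = [abs(taylor_exp(x, n) - math.exp(x)) for n in (2, 4, 8)]
# Adding higher-order derivative terms shrinks the error at a fixed
# distance from the expansion point.
print(errors[0] > errors[1] > errors[2])
```

The same mechanism underlies expanding the free energy around the time-varying autonomous mode: low-order terms dominate close to the reference point.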
So how is the Taylor series expansion, or other power series expansions, related to time-varying autonomous modes? What about that flow on the center manifold? We know from 17 that the flow of the autonomous mode can be expressed in terms of free energy gradients; those are the two F's on the autonomous states. So this expression, 20, unpacks the manifold flow in terms of accuracy and complexity parts of free energy, where the accuracy part depends on the sensory states and the complexity part is a function of, and only of, autonomous states. The manifold will look as if it is trying to maximize the accuracy of its predictions while complying with prior Bayesian beliefs. Bayesian statistics: these are all Bayesian variables. Here, predictions are read as expected sensory states under posterior Bayesian beliefs about their causes, namely the variational density over external states. The real temperature out there in the room is the hidden, unobserved external state; the thermometer reading is the sensory state; and then the internal state is some sort of cognitive mirror or representation, a model of temperature conditioned upon the thermometer reading, with some generalized synchrony across that thermometer blanket. In footnote 21, they write: question, do particles minimize surprise or free energy? And we'll go into it, because they say minimization implies a teleology that goes beyond any claim of the FEP. Whence teleology? I thought we were talking about teleology earlier. So how deep in the desert are we? What level of mirage are we engaging in? But we'll get there. And that takes us to figure two, which is going to show two components of autonomous flow conditioned upon sensory states. So I'll move us along. We can talk more about the technical details, like how it relies on the Taylor expansion, but here are the autonomous states, and the black line potentially can be read as that manifold attractor. And so there's a movement directly towards the attractor.
And then there's a movement that is perhaps orthogonalized, keeping in mind the difference between the gradient and the solenoidal flow. And together through time; so this is not just like a hill. Previously, in 26, we saw the hill, and the particle makes its way to the top of the hill as a function of gradient ascent and solenoidal coupling flow. Here we're not just ascending a hill; we're tracking this moving point through time, and we're converging in this spiral way. And then we can maybe unpack that in relationship to manifold flow, and how, even if this black line is the manifold, the entity stays on the manifold. So we'll talk more about figure two. Now they're going to summarize this section. A particular partition of a non-equilibrium steady state (NESS) density implies autonomous dynamics that can be interpreted as performing inference of a particular kind. It's not doing every kind; it's doing a specific kind. And it's a particle. There's the fast flow towards the center manifold, and the slow flow on the manifold. The manifold flow can be interpreted as Bayesian belief updating, and posterior Bayesian beliefs are encoded by points on the internal states' statistical manifold. In other words, for every point on the statistical manifold, that synchronization manifold, there's a corresponding variational density or Bayesian belief over external states. These are internal states about external states, conditioned on the particular partitioning. And that can now be expressed as the variational principle of least action. This is the basis of the FEP, put simply, but not too simply, I guess. It means that the internal states of a particular partition can be cast as encoding conditional or posterior Bayesian beliefs about external states. This licenses a poetic description of self-organization as self-evidencing.
We'll unpack it later; just wanted to mention it so we can continue. In figure three, all the different ways in which this self-evidencing is connected to various theories, for example value, surprise, entropy, and model evidence. And the schematic is illustrating how minimizing variational free energy. Wait, but I thought that minimization implies a teleology that goes beyond any claim of the FEP? But minimizing variational free energy relates to normative theories of optimal behavior, like value maximization and pragmatics, reinforcement and reward learning (Pavlov), surprise, novelty, info gain, infomax. Why is the free energy principle here? Isn't it like all of them? Entropy and model evidence as well. And then, just to conclude the section, they ask: is it tenable to interpret gradient flows on variational free energy landscapes as variational inference? Or is this just teleological window dressing? The next section addresses this question through the lens of Bayesian filtering. In brief, we will see that autonomous paths of least action implied by our particular partition are the paths of least action of a Bayesian filter. This takes us beyond as-if arguments by establishing a formal connection between particular dynamics and variational inference. So, to discuss: what are particular dynamics? What is variational inference? What is the relationship between particular dynamics and variational inference? What would it mean for it to be as-if, and what would it mean for it to be something different than as-if? Okay, continuing on. Six, Lagrangians, generalized states, and Bayesian filtering. Now, say we wanted to emulate or simulate active inference. What if that were the case? We could find the stationary solution to the Fokker-Planck equation and the accompanying Helmholtz decomposition. We could then solve number 21 for the paths of least action that characterize the expected behavior of this kind of particle.
However, there is a simpler way to recover the paths of least action: finding a path that minimizes the Lagrangian at every point in time, noting from equation 3 that the path integral of the Lagrangian is the action. First, they reintroduce generalized coordinates of motion. We talked about that in number 26 as well, but generalized coordinates of motion are the position, velocity, acceleration, and higher and higher derivatives of the coordinates of location. So it's taking, say, the x, y coordinates and adding their first derivative, their second derivative, their third derivative, and just having all of that in one vector. That's what's shown here: x through time, then x prime, the first derivative or flow, then the second derivative, and so on. That connects to integrator chains, PID control, and so on. In generalized coordinates of motion, state, velocity, acceleration, and so on are treated as separate generalized states that are coupled through the Jacobian. So the first and second orders are related to each other, and the second and third, and the third and fourth; there's a sparse coupling in that vector. This allows us to relax certain assumptions and gives a quadratic form. We'll unpack this more, but there's a sparsity and a capacity to model movement in generalized coordinates of motion. Now, maybe three or maybe six orders are sufficient; this is in principle an infinite-dimensional framework, but it may be the case that you can go super far with just a few. Formalisms 23 and 24 describe that M can be read as a mass matrix. It would be interesting to know what is meant there, and there's a suggestion that precise particles with low-amplitude random fluctuations behave like massive bodies. That's like the baseball on the parabola: the air fluctuations and the thermal vibrations are not dominating its trajectory, whereas the one-molecule baseball is getting tossed and turned.
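To make that sparse coupling concrete, here is a tiny sketch (our own illustration, not the paper's code) of a generalized state vector and the shift operator that couples each order of motion to the next; truncating at four orders is an illustrative assumption:

```python
import numpy as np

# Generalized state: position and its first few derivatives in one vector.
n_orders = 4
D = np.eye(n_orders, k=1)   # shift operator: superdiagonal maps each order to the next

x_tilde = np.array([2.0, 1.0, 0.5, 0.0])  # hypothetical [x, x', x'', x''']
# D @ x_tilde is the generalized motion: the velocity of position is x',
# the velocity of velocity is x'', and so on, with zero at the top order.
print(D @ x_tilde)   # [1.  0.5 0.  0. ]
```

The sparsity is exactly the chain described above: each order talks only to its neighbor, which is what makes the generalized framework tractable even though it is infinite-dimensional in principle.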
In equations 25 and 26, they continue describing, using that Helmholtz decomposition, the divergence-free flow and the curl-free flow in terms of a gradient descent on the Lagrangian. That'll be helpful to learn what they mean. And then, crucially, when that is minimized, the mode of the path becomes the path of the mode. That would be something useful to understand: what is the difference between tracking the mean and tracking the mode? For Gaussian distributions, like the Laplace approximation or any other second-order curvature method, what does that mean? And then, just to close this section, the generalized free energy is easy to evaluate, thankfully. Given a generative model in the form of a state space model, here it is: F of sensory, active, and internal states is going to be some terms that we'll learn about. And finally, one can simulate active inference by replacing the generalized flow of autonomous states with a generalized Bayesian filter. So here we have 27. Very similar, except notice that there are a lot more arrows, and in the bottom two rows, instead of 13, we now have the D on a and the D on mu, and some triangles have been introduced. We'll unpack more what that means. Any thoughts or comments, or shall we continue to seven? So, yeah, awesome. Seven, from statistical to classical particles. So far, we have a Bayesian mechanics that would be apt to describe a particle or person with a pullback attractor. But what is the difference between a particle and a person? This speaks to distinct classes of things to which the FEP could apply, molecular versus biological. Here, we associate biotic self-organization with the precise and predictable dynamics of large particles. So, as we described earlier, in classical mechanics something bigger is more resistant to thermal fluctuation, and we can think about that analogy to mass in the statistical setting. Where are the priors massive?
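Stepping back to equation 27 for a moment: the idea of replacing the autonomous flow with a gradient flow on free energy can be sketched in a deliberately tiny scalar example (our own construction, not the paper's scheme; the model, precisions, and step size are illustrative assumptions, and the D operator drops out because there is only one order of motion):

```python
def grad_F(mu, y, prior=0.0, pi_y=1.0, pi_x=1.0):
    # Free energy gradient for a toy Gaussian model: y = mu + noise, mu ~ N(prior, 1/pi_x).
    # Two precision-weighted prediction errors: sensory and prior.
    return -pi_y * (y - mu) + pi_x * (mu - prior)

y, mu, dt = 1.0, 0.0, 0.1
for _ in range(200):
    mu += dt * (-grad_F(mu, y))   # mu_dot = -dF/dmu (plus D @ mu in the generalized case)

# mu settles at the precision-weighted compromise between data and prior.
print(round(mu, 3))   # 0.5
```

With equal precisions, the internal state converges halfway between the prior (0.0) and the datum (1.0); changing pi_y or pi_x shifts that compromise, which is the precision-weighting story in miniature.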
And they're just on their own inertia, not being buffeted by stochasticity, versus where is it a lightweight something that is getting buffeted? So here's going to be a distinction between statistical and classical mechanics in the setting of the particular partition. It is often said that the FEP explains why biological systems resist the second law, the tendency towards disorder and dissipation. However, this is disingenuous on two counts. So, is Friston 2013 disingenuous, or is that paper combating the disingenuity? First, the second law applies only to closed systems, while the free energy principle describes open systems in which internal states are exposed to, and in exchange with, external states through blanket states. There's a lot to unpack there: what exactly is the exchange? Is it an informational exchange, or are nutrients crossing and becoming incorporated into the internal states? And second, there is nothing so far to suggest that the entropy of particular states or of paths is small. So this is a design language for particular states that might be totally buffeted by stochasticity, or totally on the least-action railroad, and everything in between. Nothing has been said to identify one or the other, so it covers everything: high and low entropy densities. Picture two particles in two different rooms. One particle travels equally to all parts of the room; the other particle stays in one part of the room. So one of them has a very ordered distribution over space, and the other is very disordered, like an equilibrium gas flowing throughout the room. Both of them are particles. So what distinguishes between high and low entropy systems, e.g. between candle flames and concierges respectively? Equation 28 is going to describe that, and we'll come back to it.
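The two-rooms picture can be sketched numerically (our own toy, not from the paper): the same dissipative flow driven by small versus large random fluctuations, i.e. a precise versus a generic particle:

```python
import numpy as np

def simulate(sigma, steps=20000, dt=0.01, seed=0):
    # Ornstein-Uhlenbeck particle: pulled toward 0, buffeted by noise of amplitude sigma.
    rng = np.random.default_rng(seed)
    x, xs = 0.0, np.empty(steps)
    for i in range(steps):
        x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        xs[i] = x
    return xs

precise = simulate(sigma=0.1)   # stays near one part of the "room"
generic = simulate(sigma=2.0)   # wanders over the whole "room"
# The empirical spread, and hence the entropy of the stationary density,
# is far larger for the generic particle.
```

Both are particles with the same pull toward an attracting point; only the relative weight of fluctuations versus flow separates the low-entropy occupant of one corner from the gas-like wanderer.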
This suggests that precise particles such as you and me respond to environmental flows and fluctuations in a precise and predictable fashion. Well, I'm an unpredictable guy. Yes, our regime of attention is almost tuned to maximally confusing or uninformative things, and we can have the cognitive experience of being confused, but someone's blood sugar through time has more predictability than zero. So it's on that continuum from totally buffeted to totally on the railroad tracks, and it's more like the railroad tracks if you want to survive or design higher-reliability systems. And then they introduce figures four, five, and six. Four is about the difference between generic and precise particles, using an information diagram. For precise particles, there's no uncertainty about autonomous states given sensory states: knowledge about action, knowledge about internal states, like metacognition. Is the behavior of precise particles sufficient for sentient behavior? Perhaps. Figure five shows the implicit computational architecture used in simulations of sentient behavior. And six reproduces an example from the active inference literature with action and action observation. So figure four has to do with information gain and sharing for generic and precise particles. Do you want to add anything? Otherwise, we'll just take one look at each of four, five, and six. Okay. Figure five, the Bayesian mechanics graphic, summarizes belief updating implicit in gradient flows on variational free energy. And six, sentient behavior and action observation: this is somebody involved in pointing or tracing or handwriting, and also looking visually. In summary, precise particles immersed in an imprecise world respond almost deterministically to external fluctuations. Why might this behavior be characteristically biological? Precise particles may be the kind of particles that show life-like or biotic behavior. So, let's think of those two particles in the two rooms.
One of them, 24 hours a day, no matter the temperature or the light-dark cycle, is always equally covering all parts of the room. The other particle, you find out that when it's light it stays in this part of the room, and when it's dark it stays in this other part of the room. Which one of those particles, without knowing anything more, seems more biological? The one that's diffusing like a gas molecule, or the one that has this orderly distribution, especially one conditioned upon salient external factors? So, the distinction between those imprecise and precise particles, getting buffeted by thermal vibrations versus being classical, like the Keplerian planets going around in orbits totally unbuffeted by thermal vibrations, rests on the relative contribution of dissipative and conservative flow to their path through state space. One might associate precise particles with living systems with characteristic biorhythms, and then they bring up many, many nested biorhythms: rapid oscillations multiple times per second in neural systems, heartbeat, respiratory and circadian rhythms, seasonal cycles, life cycles, even pulling out to the evolutionary scale. Turning this on its head, one can argue that living systems are a certain kind of particle that, in virtue of being precise, evinces conservative dynamics, biorhythms, and time irreversibility. So, wow, how are these all related to each other? Where does the time irreversibility connect? And their summary: the emerging picture is that biotic systems feature solenoidal flow in virtue of being sufficiently large to average away random fluctuations when coarse-graining their dynamics. Figure six explores that more. Okay, final technical section, eight, path integrals, planning, and curious particles. The previous section was focused on linking dynamics, changes through time, to densities over generalized states.
Internal states can be construed as parameterizing Bayesian beliefs about external states; internal states are about external states in this model. In what follows, we move from densities over states to densities over paths, to characterize the behavior of particles in terms of their trajectories. So, formalism 29: we're interested in characterizing autonomous responses to initial particular states. Autonomous means the active and internal states, the ones that we control. Recall that when random fluctuations on the motion of particular states vanish, there's no uncertainty about autonomous paths given external and sensory paths: knowing exactly what to do and exactly what to think. And there's no uncertainty about sensory paths given external and autonomous paths: knowing exactly what will be sensed given the external world and the autonomous paths, which is what one does and thinks. If we interpret entropies as the limiting density of discrete points (figure four), then the uncertainties about particular, autonomous, and sensory paths given external paths become interchangeable. Formalism 30. Formalism 31, expected free energy. The autonomous path with the least expected free energy is the most likely path taken by the autonomous states. Expected free energy: how can we read it? What does it mean? Why does it matter that we're focusing on autonomous states and autonomous paths? And how is it similar to or different from other representations that we've seen, and that we will see, of expected free energy, free energy of the expected future, and so on? Expected free energy can be regarded as a fairly universal objective function for selecting paths of least action. Planning, planning as Bayesian inference, equation 33. Expected free energy, expected action, Bayes optimal decisions, utility, pragmatic value, expectations of information gain, optimal info gain, Bayes optimal experimental design, optimal hypothesizing.
So not just falsification, not to beat a dead horse, but rather than constraining and then executing, keeping this infinite-game optimal design perspective open, so that we can always be aware of the balance between optimal decision making and optimal experimenting. When simulating with active inference and the planning afforded by the path integral formulation, one usually works with discrete state spaces and belief updating over discrete epochs of time. That's where the partially observable Markov decision process, POMDP, comes into play. Plausible policies can then be scored, evaluated based upon their expected free energy, and the next action is selected from the most likely policy. Policies are sequences of actions that can be taken. In summary, we now have at hand a way of identifying the most likely autonomous trajectory from any initial particular state, which can be used to simulate the sentient behavior of the precise particles that we've associated with biotic systems. Expected free energy absorbs two aspects of Bayes optimality into the same objective functional, a function of functions. The information gain term is optimal Bayesian design; the decision-theoretic term is minimizing a cost function under a decision or choice and uncertainty. As with the interpretation of variational free energy, this dual-aspect functionality of expected free energy is an interpretation of a single existential imperative: to possess a Markov blanket and implicit thingness. Teleologically, it's worth reflecting upon the differences between the generative models that underwrite variational and expected free energy. For VFE, the generative model is a joint density over external and particular states, supplied by (or supplying) the non-equilibrium steady-state density. For the path integral formulation, the generative model is a joint distribution over paths. Because paths entail consequences, the generative model acquires a temporal depth. They then have several figures that continue on this theme.
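As a concrete sketch of scoring policies by expected free energy in the discrete POMDP setting (our own toy example; the likelihood A, the preferences C, and the predicted state distributions are made-up assumptions), using the standard risk-plus-ambiguity decomposition:

```python
import numpy as np

A = np.array([[0.9, 0.1],      # likelihood P(o|s): rows are outcomes, columns states
              [0.1, 0.9]])
C = np.array([0.75, 0.25])     # preferred distribution over outcomes

def expected_free_energy(q_s):
    # G(pi) = KL[Q(o|pi) || C]  (risk)  +  E_{Q(s|pi)}[ H[P(o|s)] ]  (ambiguity)
    q_o = A @ q_s                                      # predicted outcomes under the policy
    risk = float(np.sum(q_o * np.log(q_o / C)))        # divergence from preferences
    ambiguity = float(q_s @ (-np.sum(A * np.log(A), axis=0)))
    return risk + ambiguity

# Predicted state distributions under two hypothetical one-step policies:
G = {"stay": expected_free_energy(np.array([0.5, 0.5])),
     "seek": expected_free_energy(np.array([0.9, 0.1]))}
best = min(G, key=G.get)   # "seek": its predicted outcomes better match the preferences
```

Here both policies share the same ambiguity, so the risk term decides; a policy whose predicted outcomes sit closer to C gets lower G and is the more plausible one to enact.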
Figure seven relates to the Da Costa et al. 2020 discrete state-space synthesis paper, showing special cases of variational and expected free energy: when there's no preference, there's optimal information gain; when there's no ambiguity, you get a purely risk-sensitive policy; when there's no intrinsic value, you get expected utility theory. And when there's no ambiguity or preferences, some recent work by Ramstead et al. has started to soft reboot the FEP in terms of a constrained maximum entropy principle, which we're looking forward to learning more about. Figure eight is about Bayesian mechanics and active inference, so we can contrast that with figure five. Figure nine is a simulation in which saccades are occurring, and we can explore that more. To the conclusion: there are many points of contact between what we described and other theories. Please read the papers. They discuss special cases, the mechanics of synchronization, the separation of timescales, and applications. And then, in closing, they write about how these developments speak to the shift in focus from the foundational issues addressed in this article to their applications: learning and applying active inference and the FEP. It is quite possible that the foundational aspects of the free energy principle may also shift as simpler interpretations and perspectives reveal themselves. That brings us to the end of the dot zero. So Brock, what would you like to add or say? I would like to add that, for the next dot zero, I will be preparing my answer to this part first; I ran out of time. I don't know, just writing down a few things and thinking about interactability and uncertainty, and possible informational, information-entropy kinds of questions that you could ask around that. What makes those kinds of ultimate Markov blankets of interactability?
How action-intensive is it for information to cross, or for the external states to indirectly influence the internal states? And another question: how hard do you have to push on the blanket? What kind of push, and with what? So yes, they're conditionally independent, but they also influence each other through the blanket. So how hard do you have to push? I guess what I'm saying is about scale: we're using the particle-people and the baseball analogy, right? And we talked about that yesterday, or the day before, about a large matrix, a large space of states, of variables, of dimensions, of what you mean by a Markov blanket in a particular partition, and the scale of that Markov blanket relative to the environment, right? So how hard do you have to push to have a big impact? The baseball is flying through the air, but the particle is spreading out and being pushed around. And I was like, are we little molecules trying to push the baseball, or are we the designated hitter who's way overpowered and able to put it wherever we want? Or are we just doing the tiniest buffeting? Maybe this is a design language that helps us frame both of those settings and recognize when we are able to guide the path of least action, and where, no matter how hard we yell, maybe we're not influencing it at all, or only stochastically: we can't concentrate our action into one path, right? So yeah, I don't know, there are a lot of wonderings and questions that are probably not very well informed; time to just read the papers and continue reducing uncertainty. Yeah. Awesome work with this dot zero, and thanks a ton for being a part of it. Thanks everybody for watching and for all the great comments in the chat. And we'll see you in the dot one and the dot two. Bye. Awesome. Thanks again. See you.