Hello. Welcome everyone. It is ActInf Livestream number 45.2, June 8th, 2022. Welcome, Carl. And we are having our third discussion in livestream 45. So welcome to the ActInf Lab, everyone. We're a participatory online lab that is communicating, learning and practicing applied active inference. This is a recorded and archived live stream, so please provide us with feedback so we can improve our work. All backgrounds and perspectives are welcome, and we'll be following video etiquette for live streams. Head over to ActiveInference.org to learn more about the live streams and other projects that are happening in the ActInf Lab. We're in ActInf Livestream number 45.2, the third discussion of the paper "The Free Energy Principle Made Simpler, But Not Too Simple". The first author has joined, so thank you very much for that. And we're going to be discussing the paper and related topics. If you're watching live, feel free to write questions and comments in the live chat, and we'll try to address them if they're within the regime of attention. Otherwise, we'll just briefly say hello and then dive in to any number of places. So just to begin, we can say hi and mention something we're excited about, something we liked or remembered, or something that we might want to resolve by the end of the discussion. I'm Daniel. I'm a researcher in California, and there was one question submitted by Brock that I would very much like to get into, and I'd also like to understand the role of mass and inertia for some of these statistical distributions. Blue? Hi, I'm Blue. I'm a researcher in New Mexico, and I have, you know, probably 100 million questions, but one that's been bugging me for a long time is about internal states and sensory states and the influence between them, whether it's reciprocal or not. Okay. And Carl, anything you'd like to begin with or lead off? Well, I'm Carl, and I worked on the free energy principle in this particular paper, with a bit of a focus on curiosity.
I'm curious to know what you're curious about. So I was just trying to remember Blue's question there and how I would provide a clear answer to it; that's what I find fun, answering the questions. Let's start with a question about curiosity, and then Blue, you can ask the question you had. Where's curiosity in these curious particles? Technically, curiosity is an expression of the imperative to select those courses of action or policies that yield the greatest information gain. And it is a property of certain particles, not all particles, but certain particles. Interestingly, the kinds of particles that conform, or look as if they are behaving in a way to maximize their information gain or minimize their uncertainty about states that are external to the Markov blanket, are those that show very precise dynamics, usually associated with larger particles, things like you and me. So if you work through the physics and the dynamics of particles that preserve their Markov blanket, and make the further assumption that you're looking at a large, classical-like particle that conforms to classical mechanics, then you can demonstrate that the active states, the way that it acts upon the environment, are the product of beliefs about action that render the most likely actions those that maximize information gain. So that's a technical answer. For me, mathematically, it's all fairly straightforward, but I imagine it might be a bit more mysterious if you don't know the underlying maths. Okay, great start. So, Blue, to your question. So, this is something that I've read many times before, but since I have the opportunity to interrogate you about it now, I think it's a good chance. Here it says particular partitions can be defined in terms of sparse coupling. Perhaps the simplest definition that guarantees a Markov blanket is as follows: external states only influence sensory states, and internal states only influence active states.
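The sparse coupling definition just quoted can be made concrete with a small sketch. This is only an illustration, not anything from the paper: the state labels, the boolean influence matrix, and the helper function are all invented here to encode the constraint that external states only influence sensory states and internal states only influence active states.

```python
import numpy as np

# States of the particular partition: eta (external), s (sensory),
# a (active), mu (internal). influences[i, j] = True means that
# state i appears in the flow (drift) of state j.
ETA, S, A, MU = 0, 1, 2, 3
influences = np.zeros((4, 4), dtype=bool)

# The allowed couplings in the sparse structure quoted above:
influences[ETA, S] = True   # external -> sensory
influences[S, MU] = True    # sensory -> internal
influences[MU, A] = True    # internal -> active
influences[A, ETA] = True   # active -> external
# Every subset may also have intrinsic (self) dynamics.
np.fill_diagonal(influences, True)

def has_markov_blanket(M):
    """The defining sparsity: external states never influence internal
    or active states directly, and internal states never influence
    sensory or external states directly."""
    return not (M[ETA, MU] or M[ETA, A] or M[MU, S] or M[MU, ETA])

print(has_markov_blanket(influences))  # True

# 'Poking a hole' in the blanket: a direct internal -> sensory edge
# destroys the conditional independence structure.
broken = influences.copy()
broken[MU, S] = True
print(has_markov_blanket(broken))  # False
```

The point of the sketch is that the blanket is a statement about which edges are absent, which is why a direct internal-to-sensory influence, discussed next, would break it.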
And I find this troubling, and I've found it troubling often in the past, because I feel like internal states can influence sensory states, especially in the circumstance where someone takes LSD. Your sensory states are then totally influenced by your internal state of what's going on. Is it like poking holes in the Markov blanket, or what is happening when the internal states, I feel, have a great ability to influence the senses? Right. I like the notion of poking holes in your Markov blanket. That's something you wouldn't really want to do too much if you were that Markov blanket, because that would basically destroy the Markov blanket. And interestingly, I conceive of much of neuroscience as trying to sort of peer through the Markov blanket, and sometimes you do poke holes in Markov blankets. In invasive electrophysiology, for example, you actually get inside and look at the internal states. But in answer to your question, which I think is quite an astute question, those kinds of effects would be mediated technically by what we call active states. So it is certainly possible that you can alter the gain or the sensitivity of your sensory states and your receptor organs, but that effect is mediated by an active state. So if you look again at that influence diagram with the little arrows in it, you will see that there is a path by which the internal states can have an effect, and sometimes a profound effect, on the responsiveness of the sensory states per se; but usually what that translates into is the sensitivity or the responsiveness of sensory states to external states. So in the instance of psychedelics, for example, if you construe the drug as acting at the level of the retina, it might be activating certain active states that are mediating attention-like phenomena at a very, very low level. That is probably not the best explanation for psychedelics, though.
I would contend that anything that affects your perception is really something that can be understood as a different kind of belief updating or perceptual synthesis given the same sensory information. So I'll come back to you and ask: what did you mean by "I know that my internal states can affect my sensory states"? Do you really mean your sensory states? Do you actually mean the rate at which your photoreceptors, or any of your sensory receptors, are actually firing? Or did you really mean the sense-making that ensues from the same sensory input, due to some internal dynamics, we sometimes call it internal action, that change the belief updating and change the kind of explanation you bring to the table for this particular pattern of sensory input? Yes, and so I think it's a little bit of both. I think of, for example, how MDMA affects the way that your eyes move in your head; it literally affects the eye movements. So I guess that's involuntary action. So I guess then it does impact the action, even though the action is involuntary. I guess that makes sense. I'd just endorse that. That makes perfect sense. And notice what you've done: you're talking exclusively about action, palpating the world. And that obviously underwrites the sense-making. But of course, that's not internal states affecting the sensory input; it's affecting the way that we palpate the world and the sensory input that we get. And in a sense, that's one way in which we express our curious behavior: we actually palpate the world, volitionally or subpersonally, with our eyes, with the sensory receptors in our skin, and even the interoceptive receptors that mediate those gut feelings. These can all be deployed in particular ways, but these are reflections of acting on the world in order to produce different kinds of sensations.
So that answer, I think, highlights exactly that sparse coupling that underwrites the statistical separation of internal from external states. Give me another example. I think the ones that are counterintuitive, or seem like violations, are the most useful to drill down on. So I think of extreme circumstances also, not just psychedelics, but, you know, when you're absolutely parched or absolutely famished. The food smells so much better when you're absolutely starving. So that's another example of how my internal states, I feel, affect your sensory perception. That's a great example as well. That example makes a number of excellent points. Certainly the salience of certain sensory cues, in different states of being that you yourself infer, will be very context dependent. So there's your nice example of smell: the food smells nice when you're hungry. But to, if you like, articulate that, to realize that, and also to make plans in terms of what you're going to do about being hungry, you have to have an internal representation not just of niceness but also of being hungry; you have to recognize that you are hungry. So I think that example, almost paradoxically, speaks to extremely high-level internal states trying to make sense of all of these interoceptive cues that are coming in. So that's a nice example of interoceptive inference. In the same way, we try to make sense of the patterns of visual input in terms of what caused them. What's the best explanation of this particular pattern? Is it a face? Is it a bird? At very low levels, is it an edge? All those rules in principle should also apply to the pattern of inputs you get from your autonomic nervous system, your sympathetic and parasympathetic nervous system, and all the receptors that quite literally report your gut feelings. And you have to make sense of that.
And some creatures like you and me can make sense of it by bringing these very high-level hypotheses like "I am in a state of hunger" or "I am hungry". But this is just another percept, you know, and it has the same status as the percept of a face. A face is a sufficient explanation of what I'm currently seeing, and it enables me to make all sorts of predictions about what's going to happen in terms of looking at that face and hearing the sounds emanating from that face. Knowing that I am hungry also has enormous benefit, in the sense that it now constrains, provides, or conditions certain prior beliefs about what I'm likely to do next, including these mental actions that enable me to attend to certain cues as opposed to other cues. And I think the smelling-nice example is part and parcel of that: you are actually attending to olfactory input. But in order to do that, you've got to recognize that the most likely thing I am going to do, as a hungry creature who knows that she's hungry, is attend to my olfactory input. So the mechanism of attention, interestingly, gets into your example as well. And I think that's a nice observation, in the sense that many people think that attention is a kind of mental action, an internal action. That's basically appealing to this notion of Markov blankets within Markov blankets within the brain, with certain high levels acting upon lower levels just to set the gain; if you're telling a predictive coding story, the precision of certain signals; in a folk-psychology rhetoric, this would be the salience of certain modalities of input, the ones that hold precise information for you at that point in time, that resolve uncertainty about the state that you're in. Hungry, happy, fearful, whatever: all of these things have to be inferred and recognized by the belief updating and the dynamics of the internal states.
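The idea that attention sets the gain, or precision, of a sensory channel can be shown in a minimal Gaussian sketch. This is a toy assumption-laden example, not a model from the paper: one hidden cause, one observation, and the posterior mean computed as a precision-weighted average, with "attending" modeled simply as raising the sensory precision.

```python
def posterior_mean(prior_mu, prior_pi, obs, obs_pi):
    """Precision-weighted belief update for a single Gaussian cause:
    the posterior mean is the precision-weighted average of the
    prior expectation and the sensory sample."""
    return (prior_pi * prior_mu + obs_pi * obs) / (prior_pi + obs_pi)

prior_mu, prior_pi = 0.0, 1.0   # prior belief about the hidden cause
obs = 2.0                       # identical sensory input in both cases

# Low sensory precision (unattended channel): the belief barely moves.
inattentive = posterior_mean(prior_mu, prior_pi, obs, obs_pi=0.25)

# High sensory precision (attended channel): the very same input now
# dominates the update -- a 'mental action' that only sets the gain.
attentive = posterior_mean(prior_mu, prior_pi, obs, obs_pi=4.0)

print(round(inattentive, 2), round(attentive, 2))  # 0.4 1.6
```

Nothing about the input changed between the two cases; only the precision assigned to it did, which is the sense in which attention is an internal action on belief updating rather than on the sensory states themselves.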
First I'll welcome Thomas, greetings, and then talk a little bit about action and how action integrates these complex nested chains of causal narratives. So we talked about different perturbations, whether it's a pharmacological intervention, something optogenetic, or the like. There could be modification at the level of the retina, and that's directly modifying the sensory states; there we don't need to appeal to any internal states changing to change perception, like you're seeing something different when there's different light on the retina. Then, as was mentioned, there might be modification of active states, of oculomotor behavior. And then Carl, you just brought in the idea of attention as a mental action. So, looking back at the partitioning in this figure, the particular states are the internal states mu plus the blanket. That's the whole particle as it moves around; that's the entity as distinguished from the environment. But there's special attention paid to the autonomous states, which are the active states and the internal states. And it's almost like these are the ones where control can be applied, either in terms of direct oculomotor action or the possibility of attention as mental action. And so the sensory states cannot be directly controlled, but in this strange-loop way, they are what uncertainty is being reduced about, being the outcomes of the observations. And so we can distinguish these different subsets of the partition: the particular states being the entity, the particle; the autonomous states being the ones that have special control degrees of freedom; and the blanket states being the ones that intermediate, or make the internal and external states conditionally independent. Each of these subsets, and maybe even other subsets, has special roles in understanding different informally described cognitive phenomena like curiosity, or drive and motivation, and so on. And then there was also the example with the food: I am hungry.
And then I thought, well, what if someone held the belief "I am fasting", and that food smells good, but it would not be pleasant if my fast were ruined by this sensory outcome. So there are even degrees of cognitive action beyond impulsively following drives that are of one sensory mode. Thomas, would you like to say hi or remark on anything you're seeing here or anywhere else? Or there are some questions in the chat. So as with last time, I apologize, I'm a bit late joining; things ran over a little at the hospital. There are lots of things on the slide that I'm certainly interested in: thinking about the roles of attention, and you mentioned the word salience as well. And I guess the thing I might pick up on, in relation to the idea that attention can be a form of action, is that we can think about things that are salient as being things that are worthy of attending to, worthy of performing that action that then obtains those sorts of data. Which I find quite a useful way of thinking about the relationship between those two terms, and a way of understanding what we mean when we talk about salience or attention and those sorts of things. And for me, this is one of the big motivations for using computational neuroscience methods, for using mathematical modeling as a way of trying to describe what's going on: a lot of these terms are used interchangeably, and with very different meanings, by different authors across the cognitive sciences. And it's quite useful to be able to pin that down to something mathematical and unambiguous that says what I'm talking about is this. You may use the same word for something different, but at least we can be clear on what is meant by it here. So for me, it's one of the things that got me interested in active inference in the first place: this idea of being able to formalize things in psychology in a very precise way.
That makes me think of pinning insects: sometimes by pinning just a few parts, the whole pose is stabilized, and other times you pin this bug and somebody else says, well, I have this one that's still floating around; it could be pinned down too. It's about trying to walk this path between being clear about how the terms are used, but not partitioning ourselves off and saying, well, we're ignoring the casual use of the English word surprise or belief or curiosity or attention or salience, and we're just going to map that string onto this other formalism. So how do we navigate the way that the formalisms sometimes do have words that tag them, like risk or ambiguity, and how do we stay clear when the term itself may not have sharp distinctions? Maybe a lot is cut out or is not fully encompassed by that definition, and there are issues with people then taking an unrelated usage or definition of that word and projecting it back onto the formalism. Yes. And as you say, I think it aids with translation to quite a large degree as well. It's not just shutting yourself off with one specific definition that's different to everybody else's; it's a way of saying, how can we actually relate all these things together and see if there are multiple different terms in use that apply to the same underlying construct. Blue, would you like to ask anything, or I can go to a question in the chat? Go ahead, let's go to the chat. Okay, I'm going to ask one shorter question while preparing another longer one. So Ali asks: how can seemingly irrational, non-optimal behavior in some organisms be accounted for in active inference? Different types of non-optimal or seemingly irrational behavior, whether altruism or making mistakes: how can these behaviors be understood within a behavioral optimality framework? Did you have any thoughts on that one first? I'm thinking about balls rolling to the bottom of bowls.
One could say, I don't like where this ball rolled, or I wish things had been different so it wouldn't have rolled that way. But given where the ball starts and the structure of the landscape, it just plays out a certain way. So if somebody has a very strong aversion to risk, then even with totally accurate information they could still make a decision whose outcomes are post hoc perceived as suboptimal, due to extreme attention being paid to the possibility of a downside. Say there's a 1% chance you lose $1, but a 99% chance that you win $100. If you're paying a lot of attention to that 1% chance of loss, maybe that causes you not to engage with an offer that, under another set of priors or another allocation of attention, would be quite appealing. So given how things are set up, they play out a certain way, and we don't need to take that too far down the determinism path. But given a set of priors, Bayes-optimal or variational-Bayes-utilizing organisms, or those we model as such, may arrive at outcomes that don't always result in them achieving even their specified goals. I think about it in terms of the old expression: if you want to go fast, go alone; if you want to go far, go together. It's cooperation, really, in the effort of achieving any goal. I think about altruistic behavior in terms of cooperation. Well, it's not always really seen that way, but I think overall it is. I mean, while giving $10 to a homeless person in your community might be depriving you of $10, it's also giving that person the opportunity to have a meal or whatever. It's just in our nature to be cooperative, whether it's going to directly benefit us or not, because it's only through things like cooperation that we can build rocket ships and be sitting here today discussing this paper. We wouldn't have affordances like the internet if we didn't instinctively cooperate, regardless of whether it directly benefits us.
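The 1%/99% gamble can be put into numbers with a toy sketch. Everything here is illustrative: the function name and the `loss_weight` parameter, which crudely stands in for "extreme attention paid to the downside" by inflating the felt probability of loss, are invented for this example.

```python
def subjective_value(p_loss, loss, win, loss_weight=1.0):
    """Expected value of a two-outcome gamble, with a hypothetical
    attentional weight that inflates the felt probability of loss."""
    w = min(p_loss * loss_weight, 1.0)
    return -w * loss + (1.0 - w) * win

# A 1% chance of losing $1 against a 99% chance of winning $100.
print(subjective_value(0.01, loss=1, win=100))  # ~98.99: take the offer

# Overweighting the downside flips the very same offer to a
# negative subjective value, so the agent declines it.
print(subjective_value(0.01, loss=1, win=100, loss_weight=99.5))
```

Under the plain expectation the gamble is clearly worth taking; under the distorted weighting it is "optimal" to refuse, which is the point being made: the behavior is consistent given the priors that generated it.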
Great point about the scale mattering. Zoom in: why is that liver cell giving up all its glucose? Isn't glucose something important, of pragmatic value, for that cell? Both can be true: it can be costly locally and be playing a service at a higher level. So, Thomas or Carl? I was just going to pick up on your analogy, or metaphor, with the balls rolling into the different bowls. I thought that was quite a nice way of putting it, in the sense that the optimum trajectory of a ball rolling into a bowl, for any given bowl, is effectively going to be the most likely path it takes. Now, of course, you could swap that bowl for a bowl of a completely different shape and you would get a completely different trajectory. So what might seem optimal as a trajectory for one bowl might seem completely suboptimal for another. And there's a question that comes up here a lot in terms of thinking about psychopathology or disease processes, where people say, well, how is it that you can describe that as engaging in optimal behavior, because clearly what's happening is so far from the optimum. And I suppose the answer is that you're referencing optimality with reference to the wrong bowl, and you need to think about what the objective function is. Whenever we're talking about optimality, we need to say what it is we're measuring optimality in relation to. In active inference, clearly, the objective function is always going to be a free energy functional, but the specific priors that go into making up that free energy functional will have a big impact on the shape of the free energy landscape, and therefore on where we're going to end up moving, the kinds of trajectories we're going to have. So it turns the question around: rather than asking whether something is optimal, given that it is optimal, what is the appropriate objective function, defined in terms of the appropriate priors? Thank you. Carl?
Yes, just to reinforce all of those answers and provide a succinct way of responding to this important question: what is the relationship between active inference and bounded or optimal behavior? One way to answer that is just to query the nature of optimal behavior and bounded rationality as they're usually defined in behavioral economics, and sometimes in behavioral psychology. And one theorem which brings those kinds of formulations into question is called the complete class theorem. What that says is that for any pair of loss functions and behaviors, there exists a set of priors that renders that behavior Bayes optimal. To translate that into the current conversation: active inference can describe any behavior in terms of Bayes optimality, given any loss function, if you allow for different priors. So what that tells you is that the notion of suboptimality is itself a little bit suspect. All you're saying is that there are unique priors that explain this behavior. And what you normally end up realizing is that Bayes optimality, in the general sense of active inference, is not a way of prescribing optimal behavior; it's a way of describing it. And that description requires an identification or specification of the priors in the generative model, which is effectively what Thomas has just said. And indeed, practical applications of active inference, say in the context of computational psychiatry, are just that: they are using active inference in order to summarize, understand, or characterize a particular person in terms of the prior beliefs they are bringing to the table, given their behavior and given their sensory states, and you're usually using some kind of economic game. So I think the key thing here is that the principles that underwrite active inference are there to describe, not to prescribe.
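The gist of the complete class theorem just described can be seen in a one-liner of a sketch. This is an invented toy, not the theorem's proof: given any observed choice distribution, we recover log prior preferences under which a softmax (Bayes-optimal-style) agent reproduces exactly that behavior, so "suboptimal" behavior is simply behavior under different priors.

```python
import numpy as np

def priors_rendering_optimal(choice_probs):
    """Given any observed choice distribution, return (log) prior
    preferences under which a softmax agent reproduces exactly that
    behavior: the behavior fixes the priors, not the optimality."""
    return np.log(np.asarray(choice_probs))

observed = np.array([0.7, 0.2, 0.1])  # seemingly 'irrational' choices
C = priors_rendering_optimal(observed)

# A softmax agent with these priors reproduces the behavior exactly.
recovered = np.exp(C) / np.exp(C).sum()
print(np.allclose(recovered, observed))  # True
```

The inversion is trivial by construction, and that triviality is the point: any behavior can be cast as optimal under some priors, which is why active inference describes rather than prescribes.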
Just in the same sense that Hamilton's principle of least action will describe how a ball, or a massive body, travels through the air, or the ball in the bowl that you were just discussing, it does not prescribe optimal behavior until you specify the shape of the particular priors at hand, which will by definition be different for every particle, every creature, and every person. So there is no optimality in the sense that there is a right way to behave; it's always conditioned on the prior beliefs of the person that you're talking to. There's a further twist here, which moves on from expected utility theory and, I'd also suggest, bounded rationality: the objective function that underwrites choice behavior and planning has two aspects to it. It's got the expected information gain and the expected cost, or the negative expected utility. But the expected information gain depends upon what you currently believe, which means that the optimal way to behave depends upon what you believe in this moment. So there is no, if you like, invariant optimal response, because it's always conditioned upon what you know at the moment. So there may be times when you behave in a way that seems to ignore your prior preferences in order to resolve some uncertainty, and there will be other times when that uncertainty will be less potent, if you like, in determining the best next thing, or the most likely thing, you're going to do. So I think active inference gracefully subsumes bounded and optimal behavior, but as special cases that arise when you can ignore the way that we handle uncertainty. Awesome. And it makes me think that even in the fully observable chessboard situation, there still might not be a single best move; in situations with less observability and more openness in their affordances, that's even more the case. So it's great, Carl, how you mentioned description rather than prescription, and that's followed nicely by a question from the chat.
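The two-part objective just described, expected cost plus expected information gain, can be sketched for a tiny discrete model. This is a minimal one-step illustration in the style of discrete active inference, with invented numbers: `qs` is the predicted state distribution under a policy, `A` the likelihood mapping, and `log_C` the log prior preferences over outcomes.

```python
import numpy as np

def expected_free_energy(qs, A, log_C):
    """One-step expected free energy in its usual decomposition:
    expected cost (risk of predicted outcomes relative to prior
    preferences log_C) minus expected information gain about states."""
    qo = A @ qs                                   # predicted outcomes
    risk = qo @ (np.log(qo + 1e-16) - log_C)      # expected cost
    # Expected information gain: mutual information between states
    # and outcomes under the predictive distribution.
    joint = A * qs                                # P(o, s), columns = s
    info_gain = (joint * (np.log(joint + 1e-16)
                          - np.log(qo + 1e-16)[:, None]
                          - np.log(qs + 1e-16)[None, :])).sum()
    return risk - info_gain

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                        # fairly precise mapping
log_C = np.log(np.array([0.5, 0.5]))              # indifferent preferences

# The same policy is more attractive (lower G) when current beliefs
# are uncertain, because the observation then resolves more
# uncertainty: the value of information depends on what you believe now.
print(expected_free_energy(np.array([0.5, 0.5]), A, log_C) <
      expected_free_energy(np.array([0.99, 0.01]), A, log_C))  # True
```

This is the formal sense in which there is no invariant optimal response: the information-gain term is a function of the current belief, so the same action scores differently at different moments.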
So Roland Rodriguez asked: Friston has said the free energy principle describes particular existence. Maxwell Ramstead says storms do not follow the FEP while stones do. I deduce then that for the FEP the stone exists, but the storm does not. Is that right? Or, otherwise, can anything be said to exist that is not able to be described by the FEP? So if the FEP is about describing rather than prescribing, what can be described with the FEP? That's a difficult question. I'm tempted to get Thomas to answer that one; I'll give him a starter while he's gathering his thoughts. So, yeah, I suspect Ramstead talks about storms versus stones because, interestingly, storms are somewhat difficult to describe: defining the Markov blanket of a storm is not quite so easy as defining the Markov blanket of a stone. So I'm going to assume that we're talking about stones as opposed to storms. And so the answer would be that the FEP applies to anything where a thing is defined, stipulatively, in terms of possessing a Markov blanket that enables you to distinguish the states that belong to the thing from any other states. So if something exists, then the FEP applies. That's one thing that can be said very, very clearly. The next question is, well, are there different kinds of things? And the answer would be yes, absolutely. And it depends very much upon the causal diagrams we're looking at, the influence diagrams we were looking at before, in terms of the sparsity of what influences what. So stones may be construed as Markov blankets that don't have any overt active states. They could be construed as inert things, where inert things could be described as Markov blankets, or as possessing a particular partition, which is another way of saying the stone possesses a Markov blanket and you can distinguish it from the milieu in which it is found.
But that particular kind of particle or Markov blanket is inert in the sense that it doesn't move around very much; in fact, it doesn't move around at all in an autonomous way. So that immediately distinguishes between two kinds of particles: with and without active states. You can go on defining different sparsity structures and different kinds of things until we get back to the... Daniel, you mentioned strange loops before, and there is certainly an interesting strange loop that emerges when one's active states, one's actions, are hidden from the internal states. And what does that mean? Well, it means that now my active states become causes of my sensations, but I don't know that. Therefore, because I can read my internal dynamics as belief updating about the causes of my sensations, I can now be read as forming beliefs about the causes of sensations that include my own actions. And at this point you have a mathematical image of planning, and you move from things like thermostats and viruses through to people, and anything that has a sufficiently deep or expressive generative model to cover its own action. So now you get into the world of kinds of things that look as if they engage in planning, and sometimes that's framed in terms of planning as inference. And that would be an apt way to describe a lot of the biological self-organization we are interested in, but not all of it. You could contend that the virus doesn't plan; you could even contend that, at a different scale, evolution doesn't plan. So not everything plans, but there will be certain kinds of things, Markov blankets of particular partitions, that look as if they do plan. Thomas, your turn. I don't know quite what I'm supposed to add to that. I suppose an alternative perspective on the question, or more accurately an alternative question, is: which systems are useful to try and describe with the free energy principle?
And I think, given what Carl's described in terms of some of the special cases: is it useful to describe the stone in terms of the free energy principle? No, certainly not. You add in the additional baggage and things, but it's not really doing any inference of an interesting sort. And perhaps the best way of answering that question is to look at the sorts of things that the free energy principle has been applied to and where it has found some sort of purchase. Clearly, looking at the cognitive neurosciences, that's where it's had a particular impact. And I would imagine that part of the reason is that what's so interesting about brains and behaviour is that what goes on inside a brain is, almost by definition, only interesting because it relates to what's going on outside of that brain. The whole business of doing neuroscience is about relating internal and external states, and the way in which we relate to one another. So it's only once you start looking at systems like that that you really need to appeal to this kind of formalism. Great. So just a few notes on this. Carl, you mentioned different kinds of things. What taxonomies or classification schemes of things exist, along what dimensions do things vary, and how does the FEP bring cohesion, or highlight tensions, in the ways that kinds of things have been identified, like mental versus physical constructs? A few dimensions were mentioned: different extents of seemingly internal cognitive states, and then the difference in the extent of autonomous action. And I think what the question about the storm and the stone, which was clarified a bit in the chat, was really about is resistance to dissipation, persistence. So where do dissipative, persistent, regenerative, propagating, cascading, failing, or growing systems fit? How do we think of those as different types of things?
Technically, it all boils down to the functional form of the dynamics, from the point of view of the free energy principle. I should say that this is a view which is just emerging over the past few years, and indeed over the past few weeks; this is not accepted wisdom, nor is there any consensus about it. But if you want to start from first principles, then you start where the free energy principle starts, which is with the dynamics, and the dynamics are defined by a stochastic differential equation, sometimes called a random dynamical system in physics. This is just the Langevin equation. And so we can consider any kind of system, and implicitly, if there is a thing in the system, then there has to be a particular partition. That's something which is at the heart of the free energy principle: it defines thingness in terms of the ability to separate states of the thing from other states. So by definition we're talking about Markov blankets. And as soon as you write down a particular partition that implies or requires a Markov blanket, then you have the opportunity to look at the functional form of the influences of different states on other states. And you can now, if you like, knock out further causal influences to create a taxonomy of increasing sparsity of systems, and ask what sorts of behaviors these more and more sparse systems would demonstrate. And one nice example that we've just been discussing: if we cut the connection, in the functional form of the coupling among the subsets of the particular partition, between the active states and the internal states, so that the active states, my action, now become hidden from the internal states because they've been uncoupled, then action becomes a hidden cause of my sensations, and I get a very different kind of interpretation of the internal states in terms of belief updating.
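The starting point just described, a Langevin equation whose drift has the sparsity of a particular partition, can be sketched numerically. This is a toy linear system with made-up coefficients, integrated by Euler-Maruyama; the only substantive content is the pattern of zeros in the Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear Langevin dynamics dx = J x dt + sigma dW for the particular
# partition x = (eta, s, a, mu). The drift Jacobian J is sparse:
# J[i, j] != 0 only where state j is allowed to influence state i.
J = np.array([
    #  eta    s     a    mu
    [-1.0,  0.0,  0.5,  0.0],   # external: driven by active states
    [ 0.5, -1.0,  0.0,  0.0],   # sensory:  driven by external states
    [ 0.0,  0.0, -1.0,  0.5],   # active:   driven by internal states
    [ 0.0,  0.5,  0.0, -1.0],   # internal: driven by sensory states
])

def simulate(J, steps=10_000, dt=1e-2, sigma=0.1):
    """Euler-Maruyama integration of the Langevin equation."""
    x = np.zeros(4)
    path = np.empty((steps, 4))
    for t in range(steps):
        x = x + J @ x * dt + sigma * np.sqrt(dt) * rng.standard_normal(4)
        path[t] = x
    return path

path = simulate(J)
# Internal and external states still become correlated over time, but
# only via the blanket: the direct eta-mu coupling entries are zero.
print(J[3, 0] == 0.0 and J[0, 3] == 0.0)  # True
```

Knocking out further entries of J, for instance the internal-to-active coupling, is exactly the kind of move described above: each extra zero carves out a sparser kind of thing with a different repertoire of behavior.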
So that's one nice example of what would emerge just by thinking about things in terms of their dynamical architecture, namely the functional form of the dynamical coupling between different subsets of things. You also introduced another very important distinction. So that would be, if you like, one dimension along which you would be able to elaborate different kinds of things, different classes of things, different categories of things of increasing sparsity, and probably, with increasing sparsity, increasing sophistication: the sparser you get, the closer you will get to things like us, basically, things that show autonomous behaviour of an intentional kind that has this planning aspect. But there's another crucial distinction which you referred to, which is the distinction between particular states that are dissipative versus conservative. Now in this instance, for all things that exist for a given period of time, the Markov blanket has to be there, because that definitively specifies the existence of the thing; that's an existential imperative. But within that non-dissipating Markov blanket you can have particular states, namely autonomous (internal plus active) and sensory states, that belong to big things. And that's important, because with big things all the random fluctuations that do the thermodynamic dissipation are averaged away, and they start to behave like classical objects. So you have this distinction between small things, whose internal states and sensory states and active states could show lots of random fluctuations, possibly even down to the quantum level, versus big things like you and me that actually obey almost Lagrangian mechanics, classical mechanics: Newton's laws of motion, for example, or electrodynamics in terms of the behaviour of neural populations and the like. So these classical dynamics are subject to exactly the same rules and principles that underwrite the free energy principle.
But in this special case, all of the internal, active and sensory dynamics are paths of least action, because you've averaged away, because you've coarse-grained to an extent that there are no more random fluctuations. And that creates a very different kind of behaviour that again moves us into the realm of creatures that plan, basically, simply because they have this precise kind of dynamics. So in summary, from the point of view of just the maths, I think the taxonomy you're looking for is not actually out there yet, but could be in the fullness of time. And the reason I don't think it's quite out there is that if you look at physics up until now, no one's really got beyond just saying there exists a heat bath or a potential or a heat reservoir. So they haven't really thought about a partition into a heat bath, the thing that's contained by the heat bath, and the thing that contains the heat bath, which is the external states. So there isn't really a mechanics that talks about thingness in this explicit way, which is one reason I suspect there isn't actually, at this stage, a physics of different kinds of things. But if there is one to come, my guess is it would be articulated in terms of the sparse coupling amongst subsets of a particular partition, and whether or not certain subsets or their dynamics are subject to non-trivial random fluctuations and uncertainty. Awesome. Well, clearly we all expect and prefer such a taxonomy, so people can probably guess how that will influence our future policy selection. It also takes us to a question that Brock had raised in the week between the dot one and the dot two. So Brock asked: I'm interested in how Markov blankets interact and affect one another, specifically when their scales differ by many orders of magnitude. For example, a baseball versus the air molecules, DNA in a cell nucleus versus the environment, alpha particles versus DNA.
It sometimes seems the larger scale overwhelms the smaller scale, and vice versa, conditioned on some particular time scale. I'm wondering, if we're considering the Markov blanket and its Markov chain into the past and future, whether that relative scale is taken into account. Some things have shorter or longer Markov chains, which could possibly explain this seeming flip-flopping of the direction of effects at their relative mass scales. Or is there some generalized synthesis of action slash free energy that considers both of these and others to explain the direction of effects? So how do dynamically, potentially dynamically changing boundaries and multi-scale systems come together to help us understand cascading top-down or bottom-up effects? I'm tempted to say you should answer that from the point of view of evolution. Okay, here'd be one thought. A precursor, non-FEP perspective on this is synergetics, the Hermann Haken version, the 1983 classic citation, on multi-scale systems like the ripple and the wave: where is it that changing the ripple influences the wave, and vice versa? As well as the synergetics of Bucky Fuller, which is a more geometric perspective. So people have been thinking about what kinds of top-down and bottom-up effects cause cascading changes in multi-scale systems. And then, to bring it to evolution, like you suggested: the structure of the system. So the system, as received by the current moment, sets up certain affordances for very similar stimuli to propagate through a system in a nonlinear way. So for example, a pheromone of a given shape might bind to a receptor with high affinity, which opens local channels, which makes that neuron fire, and that amplifies. And so that causes a change at the organismal behavior level because of the shape of that perturbation; but then consider just adding one methyl group, or modifying the chemical very slightly.
A very similar stimulus, or a photon of a wavelength outside the receptive range of a given visual receptor, doesn't induce that kind of cascading change. So perhaps it could just be said that the structure of the system, and maybe even the sparsity of the connections amongst subunits, predisposes it in a given moment to have what appears to be almost exquisite sensitivity for certain kinds of changes, while very similar changes next door may have very little effect. And potentially, at a higher level, evolution through intra- and intergenerational time scales gets to a multi-level position where the kinds of stimuli that one wouldn't want to have propagate through the system don't, and the kinds of stimuli that one would want to have propagate through the system do, because that multi-scale system is the most responsive and adaptive over multiple time scales. Otherwise, there might be something highly sensitive at one scale, but it's in a blind alley at another time scale, or vice versa. Yes, to Blue. I instantly thought of evolution as soon as I heard that question. And really thinking about a situation where the particle of larger mass is overwhelmed by a particle of smaller mass, I thought about the sperm cell. So upon fertilization, I think that the particle of smaller mass completely overwhelms the particle of larger mass, although the blanket of the sperm cell, I think, is then subsumed into the mother and the baby. But I was reminded of the paper by Matt Sims that we discussed, about counting biological minds, and the interaction between Vibrio fischeri and the squid, when the Vibrio fischeri go in and out of the squid in what seems like a very magical daily cycle, and compose the light organ, or cause the squid to light up, and enable or facilitate its hunting behavior, because then it doesn't cast a shadow over the organisms that it's hunting.
And in this situation, it's interesting: it's these nonlinear effects. It's not one little bacterium that can make a squid light up; it's got to be a sufficient concentration of bacteria that causes the lighting up of the squid. It's a compounded effect. So here the small Markov blankets of each individual bacterial particle really maybe overwhelm the larger system. And I don't know, I feel that way also about humans and the microbiome, and what we've learned over the last 10 or 20 years about the microbiome and its importance in human behavior, the gut-brain axis, overall human functionality and toxicity, and so forth. So I think that all these small Markov blankets really have a very important effect, and a nonlinear one, but it does seem to take quite a mass of them to make an impact. Can I just come in, just to pick up on this synchronization of large ensembles of similar kinds of Markov blankets? I think it's a very important notion. It does strike me that there are sort of three fundamental issues buried in this question. There are ensembles of Markov blankets; then there's the link between different scales, so Markov blankets of Markov blankets; and then, how does this manifest in terms of intuitive concepts like mass? So just to rehearse what's just been said, distinguishing between ensembles of Markov blankets at the same scale: you now have a mathematical image of multicellular organization, through to eco-niche construction, through to the cultural aspects of evolutionary psychology, in terms of lots of people interacting together, and this has many, many fascinating aspects. The one heuristic, if you like, that I always keep referring to when trying to understand what a free-energy-like principle would bring you is that the variational free energy is an extensive property.
It means that the free energy of an ensemble, of a collection of multiple Markov blankets all talking to each other, or coupled to each other by their blanket states, is the sum of each individual free energy. And if you just simplify the notion of free energy as predictability, what you're saying is that the likely state of an ensemble is one in which they've jointly minimized their free energy by making everything mutually predictable. And this is normally manifest as generalized synchronization, which is exactly the sort of behavior of bugs that light up together, or birds that flock together. So you see this, and you could actually argue that things like language are just one expression of a way to ensure that we're all mutually predictable; and ensuring that we are jointly mutually predictable means that jointly we can minimize our free energy. So there are lots of really interesting directions that you can take that notion of what would happen if you put lots of Markov blankets together. The second thing, which I think speaks more to the evolutionary question, is the link between blankets of blankets at different scales. And, you know, we've just heard in the previous answers that this normally entails not only a coarse-graining, moving, if you like, up in terms of the spatial scale, but, absolutely crucially, a temporal scale as well. And I think that's absolutely crucial: the fact that the Markov blanket at a lower scale is likely to last for a shorter amount of time than the Markov blanket at the bigger scale. So it's perfectly permissible to apply the free energy principle to Markov blankets that exist for a short period of time from the point of view of a larger scale, and to allow for a metamorphosis or phase transition, and a merging of Markov blankets, as with the sperm and the egg cell. From the point of view of the scale above, this is perfectly permissible.
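The idea that coupled ensembles settle into mutual predictability can be illustrated with a standard Kuramoto model. This is my own illustrative choice, not something from the paper: weakly different oscillators coupled through a shared mean field, where the order parameter r moving toward 1 is a simple stand-in for generalized synchronization.

```python
import numpy as np

rng = np.random.default_rng(1)

# N oscillators with similar natural frequencies, coupled through the
# mean field (a crude "blanket" shared by all of them). Coupling K
# drives the phases together, raising the order parameter r.
N = 50
omega = 1.0 + 0.05 * rng.standard_normal(N)  # natural frequencies
theta = 2 * np.pi * rng.random(N)            # random initial phases
K, dt = 2.0, 0.01

def order_parameter(theta):
    """r = 1 means perfect phase synchrony; r near 0 means incoherence."""
    return float(np.abs(np.exp(1j * theta).mean()))

r0 = order_parameter(theta)
for _ in range(5000):
    mean_field = np.exp(1j * theta).mean()
    r, psi = np.abs(mean_field), np.angle(mean_field)
    # Standard mean-field Kuramoto update for each oscillator.
    theta = theta + dt * (omega + K * r * np.sin(psi - theta))
r1 = order_parameter(theta)
```

Starting from incoherent phases (r0 small), the coupled ensemble ends up nearly phase-locked (r1 close to 1): each oscillator has become predictable from every other.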
So the free energy principle only applies at the time scale that is associated with the level in question. But now the game is to understand how the Markov blanket of Markov blankets at the scale above couples to the scale below, and vice versa. And this brings us to all sorts of delicate issues about, you know, how you would read evolution and epigenetics as prescribing short-lived Markov blankets in phenotypic time, for example. It also, again, speaks to this fundamental role of the separation of time scales that Daniel was talking about in relation to synergetics, you know, fast and slow modes. So I think that thinking about things from the point of view of a separation of temporal, or spatiotemporal, scales begs some deep questions about top-down and bottom-up causation that can be articulated, if you like, in terms of the endurance of Markov blankets at any particular scale. Finally, the mass thing is interesting. It relates very much to scale. So mass at a quantum level is usually read as the inverse amplitude of random fluctuations. So if you're looking for the intuitive notion of mass as dictating the behavior of a very small particle at the quantum scale, then the quantity that underwrites it is the amplitude of the random fluctuations. So as things get hotter, with more and more random fluctuations, they tend to look as if they have smaller mass. Whereas as you get bigger and bigger and bigger, and you move into the world of Lagrangian mechanics and classical mechanics, and the random fluctuations disappear, then you're in a world where mass takes on a very different meaning. And it effectively can be regarded as an active state, in the sense that my mass attracts you. So I act on you through my mass attracting you. And then you're in the world of general relativity and n-body problems.
And I mention n-body problems because the only thing that makes them interesting is exactly the nonlinearities that Blue was talking about. So, you know, if you didn't have those nonlinearities and the solenoidal flows that underwrite the Lagrangian mechanics, the orbiting of the heavenly bodies, or anything that oscillates in a classical sense, then you would not have the itinerant dynamics that are characteristic of nonequilibrium systems that conform to classical mechanics. So just the notion of mass, I think, is itself really very interesting, and takes on very, very different meanings mathematically, depending upon whether you're working at a very, very small scale or a very large scale. This is kind of a callback to our previous discussion. We talked about three generalizations: generalized synchrony, generalized coordinates of motion, and generalized homeostasis. And I'm actually seeing these reflected in your response and in your trilemma, Karl. The ensembles of Markov blankets at the same scale, of similar or different kinds of things, is generalized synchronization. And again, that doesn't mean lockstep; it could be turn-taking as a generalized mode to allow for this mutual predictability. That's generalized synchronization. And then the second point had to do with the relationship between scales, whether that's accomplished with a renormalization or with some other framework from synergetics. And that made me think about the generalized coordinates in space and time, kind of like a GPS coordinate. Not exactly, but like scales of ten in space. And so there are some coordinate systems that help us navigate multiscale spatial and temporal structure, with this notion that things that are things for shorter amounts of time tend to be smaller. And even for things that have large spatial extent and are short-lived, the smaller parts of them are even more short-lived.
So it's kind of true within and between different kinds of things. And then this final one, I feel, should connect to generalized homeostasis. I was thinking about the orbiting of the planets as a generalized homeostasis in some way: the nonlinearity is such that it isn't just collisions and repulsions, but actually there is an attracting set. And there are biorhythms that are predicated upon the generalized synchronization and the generalized coordinates somehow. And that allows for multi-scale generalized homeostasis, which is to say persistent multi-scale things. Let's go to a slightly different and technical question. This is from ML Don, who wrote: say in our generative model we have n possible states s. Given observation o, we would like to estimate P(s | o) using variational inference. What if we needed an (n+1)-th state variable in order to describe the true posterior better, similar to thinking out of the box by adding dimensions? So partially observable states would mean two things. One, I don't know how many states I need, so I need to be flexible in thinking out of the box for better estimation of P(s | o). And two, given observation o, I am not certain what state I am in. Could you please discuss the idea of flexibly growing and pruning the states s while estimating the true posterior? So this is a question that's come up occasionally over the last few years. And I remember Ryan Smith did some work on exactly this question, specifically on how you might grow the states in your model. So I think it's still an open question as to what the optimal way of adding in new states or new hypotheses is. But I think the way of evaluating whether those additional hypotheses are useful is much better established. So the answer to the second part, how do you evaluate whether it's useful to have an additional state or hypothesis as part of your model or not, is simply to perform a Bayesian model comparison.
So you calculate the marginal likelihood, or a free energy approximation to it, when the state is in play, and another one when it's not. It's interesting to note that the question highlights both growing and, I think, pruning as well. And that's interesting: it's much easier to prune states away than it is to work out how to add in new ones. And we can do that very efficiently using methods like Bayesian model reduction, which effectively say: let's run the model, or fit our model, when we've got all of our hypotheses in play, and then test post hoc alternative models in which we've set the prior probability of certain states to zero, or to nearly zero, to ask how much better our model would be without them. And it's better in the sense that we've managed to reduce the complexity of the model: we're simplifying things to comply with Occam's razor while maintaining a high degree of accuracy in how we predict the data. These are the accuracy and complexity terms that make up a marginal likelihood or free energy. So if we had a simple way of adding additional states, then it's just a matter of calculating whether, with that additional state, the marginal likelihood is improved or not. An alternative way is to say: well, no, actually, let's just take the pruning perspective. Let's start from a very large number of states and an over-parameterized generative model of our world, with many hypotheses that we don't think are ever really going to be there, and then simply prune them away in relation to the data we have until we've got the simplest accurate explanation for those data. It's very interesting: after a proposal is on the table, whether we're in the Bayesian or the frequentist world, evaluating alternatives can be done with a defined approach.
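As a concrete, if simplistic, sketch of that comparison (my own toy example, not the machinery used in the paper): for categorical data under a symmetric Dirichlet prior, the log marginal likelihood has a closed form (the Dirichlet-multinomial), so we can score a model with and without a never-occupied extra state and watch the complexity penalty decide.

```python
import math
from collections import Counter

def log_evidence(counts, n_states, alpha=1.0):
    """Exact log marginal likelihood of categorical data under a
    symmetric Dirichlet(alpha) prior over n_states hypothesised
    states (the Dirichlet-multinomial evidence)."""
    N = sum(counts.values())
    lp = math.lgamma(n_states * alpha) - math.lgamma(n_states * alpha + N)
    for k in range(n_states):
        c = counts.get(k, 0)
        lp += math.lgamma(alpha + c) - math.lgamma(alpha)
    return lp

# Observations that in fact only ever visit states 0 and 1.
data = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
counts = Counter(data)

lp2 = log_evidence(counts, n_states=2)  # model without the extra state
lp3 = log_evidence(counts, n_states=3)  # model with an unused extra state
# Occam's razor: the unused state adds complexity without accuracy,
# so the two-state model has the higher evidence (lp2 > lp3).
```

The same logic runs in reverse for pruning: start with the three-state model, set the prior mass on the unused state to (nearly) zero, and the evidence improves.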
But it is how you get the proposal onto the table, what generates new alternatives, because the space of model modifications may be pragmatically, or even in principle, open-ended, especially if we allow for very complex structural maneuvers like the nesting of models; it is basically open-ended in terms of which proposals are possible. However, those may be modeled as the actions of some model builder, and then they have certain affordances: I can add nodes, remove nodes, add a connection, remove a connection, or perform these structural modifications. And then it could be something like a genetic algorithm or some other policy selection approach that results in the space being heuristically or adaptively explored in terms of model structures. But it's never going to be as defined a question as comparing alternatives, because that has a very specific kind of plug-and-chug basis, unlike the creativity and the open-endedness associated with making model changes. Yes, and the point you make about finding a set of operations to build a model is an interesting one as well. I had some discussions with Giovanni Pezzulo not long ago, thinking about exactly what sort of operations those might be, whether it be duplicating parts of a model, adding in higher-up levels or taking them away, adding in states, taking them away, and how that might manifest in terms of the natural selection of models and the development of cognitive architectures over evolutionary time. Cool. We did discuss at least one paper of Pezzulo et al. that was talking about a simple homeostatic architecture and how that could be turned into a more anticipatory structure, and so on. Let's turn back to the roadmap. Daniel, before we go on, let me just mention three things which people might have heard about, which are related to what you've just been discussing.
The first thing is structure learning and radical constructivism, so the work of people like Josh Tenenbaum; How to Grow a Mind was, I think, the title of one of his popular papers. So this is exactly what you've been discussing: what are the rules and principles for growing models and learning the structure of a model that could include an (n+1)-th state? So that's a very interesting literature. The other, probably more technical, take on this is called nonparametric Bayes, where there are priors used to decide whether to include state n+1 or not, given you're currently using just states one to n. And they rest upon stick-breaking processes like the Chinese restaurant process. So there is quite a literature out there on structuring the growth of a model, usually under the rubric of nonparametric Bayes. And then finally, in economics, there is an ideology that rests upon the distinction between having uncertainty about states of the world and having uncertainty about states that have never been encountered or represented before. That's sometimes referred to as radical uncertainty, which is meant to distinguish between just being uncertain about states of affairs and not even knowing that those states could ever happen. Interestingly, radical uncertainty is usually used as a motivation for not committing to a probabilistic or Bayesian description of things. So you move into heuristics of the Gerd Gigerenzer sort, and narrative explanations for things. But I think if you cast radical uncertainty as exactly the kind of problem that you'd be trying to think about principled solutions for, you can probably bring that kind of economic perspective back into formal maths. Just one final point, relating to the previous question about ensembles and the conclusion that we all try to make our worlds as predictable as possible.
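The Chinese restaurant process mentioned here can be sampled in a few lines. This is the standard construction (the variable names are mine): customer i joins an existing table k with probability proportional to its occupancy n_k, and opens a new table, i.e. posits a brand-new state, with probability alpha / (i + alpha).

```python
import random

random.seed(0)

def crp(n_customers, alpha=1.0):
    """Sample table assignments from a Chinese restaurant process."""
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # assignments[i] = table chosen by customer i
    for i in range(n_customers):
        if random.random() < alpha / (i + alpha):
            tables.append(1)  # open a new table: a new latent state
            assignments.append(len(tables) - 1)
        else:
            # Choose an existing table proportionally to its occupancy.
            r = random.uniform(0, i)
            acc, chosen = 0.0, None
            for k, n_k in enumerate(tables):
                acc += n_k
                if r < acc:
                    chosen = k
                    break
            if chosen is None:  # numerical guard if r lands on the boundary
                chosen = len(tables) - 1
            tables[chosen] += 1
            assignments.append(chosen)
    return tables, assignments

tables, assignments = crp(100, alpha=2.0)
```

The number of occupied tables grows roughly like alpha times log(n), so the model can always add state n+1 but does so ever more reluctantly, which is exactly the rich-get-richer prior over "how many states do I need".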
That sort of suggests you don't need to grow models very much. If it is the case that we construct our own universes in a way that makes them predictable, then we actually don't need very complex models. So I wonder whether, in fact, the shrinking of Bayesian model reduction is perfectly fit for purpose, and it's certainly consistent with me, in the sense that I was born with too many neurons and too many connections, and as I get older I just lose stuff. But at least I become more fit for purpose at modeling my little world, which you all carefully keep very predictable for me. And thank you. I'll connect that to one common casual framework. In a recent paper with Scott David and RJ Cordes, we connected Rumsfeld's quadrants: the known known, the known unknown, the unknown known, and the unknown unknown. And interestingly, in that famous quote by Rumsfeld, if people look at it, he only mentions three of the quadrants; the unknown known is actually a hidden quadrant. And we, in a preliminary way, asked: what do these different situations mean in terms of knowledge about particular states? So the known known includes the metacognitive aspect of knowing not only all the partitioned states, but also the policy. In the unknown known, the partitions are known, but what is unknown is actually the policy: you've got the atom-level description of the car, but you still might not know what policy to take in terms of using the car. The known unknown is basically bounded rationality, bounded at the perimeter of the particulate entity, because the internal and blanket states are known, as well as potentially even what to do, but the external states have high uncertainty. And then the unknown unknown, which is that radical uncertainty, might include not knowing what's out there or how to pursue it. And there might be different mappings too.
But that was just coming to mind with this question about uncertainty about states of the world versus uncertainty about states that haven't been observed before. Let's return to the roadmap on our road trip. Is there a section that somebody feels plays well here, or a section that we haven't been able to cover, any of the figures or formalisms? So I don't think we've covered any of the figures except for figure one. Maybe we want to look into figure two, on autonomous flow and the manifold. Great. So yes, we did spend a lot of time talking about the particular partition in this discussion, and in many other discussions, and it's really important: it's the particular physics. But it would be awesome to just spend a few minutes on each of the subsequent figures. What is being shown? What do the axes or formalisms represent? And what are the implications of the figure? So here we have figure two. Karl or Thomas, what is being shown in figure two? I suppose there are a couple of things being shown in figure two. I think the graphic on the left, with the spiral, is quite a nice way of demonstrating this idea of there being some solenoidal element to the flow, which is causing it to have this circular structure. Then, as a consequence of the dissipative part of the flow, this gamma term (so the distinction is between the Q being the solenoidal part and the gamma being the dissipative part of the flow), the amplitude of the spiral reduces over time and you end up with this settling to some point. So it's just setting out the two different directions, really, that things can evolve in. And I'm trying to remember exactly what the equations were in the paper that led to the expression in terms of the alpha and the bold alpha; Karl can remember that better. Karl, was this a least-action path that we were talking about here, in terms of the bold alpha? Is this the settling to least action?
Yeah, and it's an analysis which people might come across when they look at things like the center manifold theorem, or indeed the slaving principle that Daniel was just talking about from synergetics. Here you've got these sort of fast fluctuations that are drawn to a center manifold. In our case, the center manifold, or the manifold that draws trajectories to it, is the set of paths of least action that constitute, in the joint space of the external and internal (or autonomous) states, this synchronization manifold. But in this instance, not synchronization between people, but synchronization between the inside and the outside. Great. And how about the right side? The dark line on the right-hand side was meant to be the synchronization manifold, to which all trajectories are drawn, and upon which they largely lie. Any perturbations collapse back to, or are drawn back to, that manifold, in the spirit of the left-hand side of the diagram. So using words like manifold is to emphasize that this is very much related to the center manifold theorem. It also speaks to one way of describing generalized synchronization of chaos, or generalized synchrony, which is collapse to, or constrained flows on, a synchronization manifold. So this is just one way of graphically illustrating that. Awesome. Any other thoughts on figure two? Where's the housekeeping term, the lambda? Because there's the Q and the gamma, but as we discussed in livestream 32, there's another term there too. So I think that the Q and the gamma here are sort of implicitly being treated as constants, so that they're not varying over state space in this particular graphic. And so the housekeeping term will only appear when there's some variation depending upon where you are in state space. In principle, you could put it back in by allowing those slightly more complicated trajectories with more interesting sorts of nodal flows.
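A minimal numerical sketch of the left panel of figure two, with toy parameters of my own choosing: a flow of the form (Q - Gamma) applied to the gradient of a quadratic surprisal. The antisymmetric Q circulates the state along level sets while Gamma dissipates, so the trajectory is an inward spiral, which is the picture being described.

```python
import numpy as np

# Toy flow: f(x) = (Q - Gamma) @ grad I(x), with surprisal
# I(x) = |x|^2 / 2, so grad I(x) = x. Parameters are illustrative.
Gamma = 0.1 * np.eye(2)                   # dissipative (gradient) part
Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])               # solenoidal (antisymmetric) part

def flow(x):
    grad = x                              # gradient of |x|^2 / 2
    return (Q - Gamma) @ grad

x = np.array([1.0, 0.0])
dt = 0.001
radii = [float(np.linalg.norm(x))]
for _ in range(20000):
    x = x + dt * flow(x)                  # forward Euler step
    radii.append(float(np.linalg.norm(x)))
# Q alone conserves |x| (it only rotates the state); Gamma shrinks it,
# so the radius decays while the state rotates: the spiral in the figure.
```

Setting Gamma to zero leaves a pure circulation on a level set of the surprisal, which is the conservative, solenoidal limit; setting Q to zero gives a straight gradient descent to the fixed point with no spiral.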
I think normally we tend to just assume that the space is defined such that the gamma term can be treated as effectively a diagonal, or even an identity, matrix, or at least a block-diagonal type matrix. That gets a little bit more complicated when working in generalized coordinates of motion, because of the structure of the blocks and how to account for the correlations between different orders of motion, but it doesn't necessarily have to vary depending on where you are in space. Awesome. You should feel licensed to ignore the housekeeping term in most of our papers. It's too complicated. Okay, on to three. So figure three was labeled Markov blankets and self-evidencing. What is being shown in three? I'll do this briefly, then one of you can take over. This slide is usually used to illustrate that the free energy formalism, or understanding dynamics as gradient flows on an energy functional, is unremarkable in the sense that it has a lot of construct validity in relation to existing and legacy formulations of behavior, usually of a normative kind; and by normative I just mean that there's some objective function that is being extremized. So it's just meant to illustrate that this physics-like formulation, in terms of dynamics and gradient flows on this free energy functional, is in a position to formalize lots of existing theories and constructs, such as reinforcement learning and optimal control theory, provided you now read the objective function as a log preference, or the log of your prior preferred states. And it shows that the different perspectives on normative accounts of behavior are all formally very, very related, in the sense, for example, that the negative of value so defined becomes self-information.
If you're not an economist but an information theorist, you would read the same quantity basically as self-information, or surprisal, and you wouldn't talk about expected utility theory; you'd talk about maximum efficiency in an information-theoretic sense. If you were somebody like Hermann Haken, thinking about synergetics, you'd be taking the average of the self-information or surprise, and then these dynamics would be basically the thing you aspire to understand and describe from the point of view of synergetics; if you're a physiologist, homeostasis. And then of course, if you're a statistician, or somebody like us who's interested in pure probability theory, then you can interpret these gradient flows as articulating, or at least describing, things like the Bayesian brain hypothesis. So it was just an attempt to show that there is one simple dynamic under the hood of all of these well-accepted, well-proven accounts of normative behavior. Why is the free energy principle in the blue cell? I think just because it could be described as a principle articulated in terms of information theory, where information theory is just probability theory with some geometry. Okay, and while we're on this question of unification, what does the cover of the active inference textbook mean? I'm curious to hear what you think. Prediction dictates that Blue go first. Isn't it like the one ring to bind them, the one ring to find them, and the one ring to rule them all? No, really, it's like the Lord of the Rings, except instead of the inscription there's math. Anyway, there's the funny take. Here's what I see, before we hear the other two thirds of the answer. Yes, there's the Lord of the Rings reference, and also it's a blanket, and it's a particle, and it's the equations that are about partitioning internal from external states. And all that with a minimal amount of text.
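The equivalences just listed can be restated in two lines of arithmetic. This is my own toy restatement, not a figure from the paper: if value is defined as the log-probability of states, then negative value is self-information (surprisal), and maximizing one is minimizing the other.

```python
import math

# Toy numbers of my own: two states with prior probabilities.
p = {"preferred": 0.9, "surprising": 0.1}
value = {s: math.log(q) for s, q in p.items()}       # V(x) = log p(x)
surprisal = {s: -math.log(q) for s, q in p.items()}  # I(x) = -log p(x)

# The two objectives pick out the same state.
best_by_value = max(value, key=value.get)
best_by_surprisal = min(surprisal, key=surprisal.get)
```

So "maximize expected utility", "minimize surprisal", and "flow toward preferred (high prior probability) states" are three readings of the same quantity, up to a sign.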
And the minimal number of threes: it presents mind, brain, and behavior; Parr, Pezzulo, and Friston; and in there is the particular partition. I'm in two minds as to whether we tell them. I thought it's more interesting having these different interpretations. Yeah. Okay. I won't push it. Let's go to figure four. Well, just to say that the Markov blanket and the three threes, that's very clever. Also, you know, we didn't realize how clever. I think it was Thomas and Giovanni who came up with this. But completing the ring with an extra equals sign was quite an insight for me at some point. Can you remember, Thomas, was that you or Bob who suddenly spotted the idea of making everything equal, with a circular causality? Yeah. And I think it was actually Giovanni who raised this equals sign. But now we have three other ways of selling this very clever, clever little icon. Yes, I wish we'd thought of it. Great. Okay. Figure four is describing generic and precise particles. So what is being shown here? I'll take this one because I did this one. So this is exactly trying to depict the distinction we were talking about before, when we asked about a taxonomy of different kinds of things. I was just saying it's useful to make a distinction between very small quantum-scale things and large classical-scale things, where the particles are sufficiently large that the random fluctuations go away and every particular path is a path of least action. And in that specific case, certain sources of uncertainty disappear, which manifests as a collapse of certain uncertainties that can be captured using what's called an information diagram. So this is an information diagram, and you can basically understand the areas as uncertainty or entropy.
And the left-hand set of panels shows the general case, suitable for very small particles, showing the shared and unique information between a partition into external, sensory, and, for simplicity, autonomous states. And the areas of intersection can be read as mutual information. So what you can see on the right-hand side is that there is uncertainty still in the game even when I know everything else: there is a unique uncertainty associated with every one of the states. But if I now move to precise particles, or particles that behave as if they were more classical in nature, a lot of uncertainty disappears, in the sense that the only source of random fluctuations is on the paths or the dynamics of the external states. Which means that if I knew the external and autonomous states, then I would be able to predict precisely the path of least action, the path that a sensory state would take in the absence of any random fluctuations. So the unique uncertainty associated with the sensory states s disappears. Similarly, if I knew the external states and the sensory states, then there is no uncertainty about the path of autonomous states. So the unique pink part associated with alpha, the autonomous states, disappears. And that has the interesting implication that the uncertainty about both the sensory and autonomous states has to be the same, if you knew the external states, here denoted by eta, among other things. So, for example, there is no uncertainty about active states given sensory states, which means that the active states, on the right-hand side, have to lie within the sensory states. And you may be asking, well, why is this useful? Well, it becomes incredibly useful when you apply those constraints, things like the uncertainty about both autonomous and sensory states given external states having to be the same, for example.
And that the uncertainty about external states given sensory states is the same as the uncertainty about external states given both sensory and autonomous states, for example. All of these little constraints, which ensue from the move from small to big particles, go into an expression for the probability distribution over autonomous paths. And it is this set of constraints that leads to a functional form for the expected free energy, which can be decomposed into what we were talking about before: expected information gain and expected cost. In the absence of those constraints, the expected information gain would disappear; the terms would cancel out in the general case. So this is, if you like, a visualizable way of looking at the information decomposition among the particular partition, in the context of the general case and in the context of precise particles, to visualize the constraints that lead to curious behavior. If you remember, it is expected information gain that describes apparently curious behavior, because actions become more likely if they have a high expected information gain, or resolve uncertainty, or minimize entropy of the conditional sort. So this is, if you like, a visual aide-mémoire, or an unpacking, of what can be quite dense and unintuitive expressions written out mathematically in terms of information theory and expected Lagrangians, or surprisals, and mutual information. I say that for Thomas's benefit, because we've both been working on trying to find the simplest way of expressing these important constraints intuitively, just using equations, and it's not easy. And the reason it's not easy is that in these information diagrams, when there is no uncertainty, when the entropy reaches its lower bound, the area disappears, at which point, for discrete systems, the entropy is zero.
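As a rough numerical illustration of the discrete case (a toy construction, not taken from the paper), one can check that when the sensory state is a deterministic function of the external and autonomous states, as for a precise particle, the unique sensory uncertainty H(s | η, α) collapses to zero:

```python
# Toy check: a deterministic sensory mapping makes the conditional
# entropy H(s | eta, alpha) vanish, i.e. the "unique sensory" area of
# the information diagram shrinks to zero. States and mapping are
# hypothetical, chosen only for illustration.
import itertools
import math
import random

random.seed(0)

ETA = [0, 1]      # external states
ALPHA = [0, 1]    # autonomous states

def sensory(eta, alpha):
    # deterministic sensory mapping: no random fluctuations on s
    return (eta + alpha) % 2

# Joint distribution p(eta, s, alpha) with random weights over
# (eta, alpha) and the deterministic sensory map above.
joint = {}
for eta, alpha in itertools.product(ETA, ALPHA):
    joint[(eta, sensory(eta, alpha), alpha)] = random.random()
Z = sum(joint.values())
joint = {k: v / Z for k, v in joint.items()}

def H(p):
    # Shannon entropy (bits) of a distribution given as {outcome: prob}
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Marginal p(eta, alpha), summing out the sensory state
marg_ea = {}
for (eta, s, alpha), v in joint.items():
    marg_ea[(eta, alpha)] = marg_ea.get((eta, alpha), 0.0) + v

# Chain rule: H(s | eta, alpha) = H(eta, s, alpha) - H(eta, alpha)
cond = H(joint) - H(marg_ea)
print(f"H(s | eta, alpha) = {cond:.6f}")  # ~0: the area vanishes
```

The same construction with a noisy sensory map would leave `cond` strictly positive, which is the generic, small-particle case on the left of the diagram.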
So when there's no uncertainty, the area is equal to zero, and H, the uncertainty, the entropy, is equal to zero. That's how it should be. The problem is that when Shannon wrote down entropy for continuous systems, that no longer holds, and you get negative areas. There is no lower bound on a continuous differential entropy. So you really have to go around the houses to recover an area that disappears, when in fact it's lower bounded by minus infinity. That particular bit was just for Thomas, because I've just emailed him today about that particular issue. And Jaynes saw this problem with Shannon's differential entropy decades and decades ago and tried to repair it using something called the limiting density of discrete points. So all of that worry and all those delicate issues just disappear if you stick with these nice information diagrams, where you know that if the area isn't there, it has shrunk to zero. Awesome. It's like you could be 110% sure about something. Let's go to figure five. What is happening in figure five? Well, I think figure five is showing more or less what we've seen in the previous figures, but now interpreted in the context of neuroscience. We've got the same action-perception graphics we were talking about in figure two, or the same equations, where the free energy has been interpreted in terms of accuracy and complexity components. So that brings us back to what we talked about in terms of the interpretation of free energy as a means of estimating or approximating the marginal likelihood, the Bayesian model evidence: models should be as simple as possible while accurately accounting for some data. And I think that's here really to emphasize that interpretation of self-evidencing that we've spoken about, the idea that by minimizing your free energy, you are implicitly maximizing the evidence for some internal model.
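To make the point about the missing lower bound concrete, here is a small sketch, assuming a Gaussian density: the differential entropy h = ½ ln(2πeσ²) dips below zero once σ is small enough, so a vanishing "area" no longer means zero uncertainty.

```python
# Differential entropy of a Gaussian, h = 0.5 * ln(2*pi*e*sigma^2).
# Unlike discrete entropy, this has no lower bound: as sigma -> 0,
# h -> -infinity, which is the problem discussed above.
import math

def gaussian_differential_entropy(sigma):
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

for sigma in (1.0, 0.1, 0.01):
    h = gaussian_differential_entropy(sigma)
    print(f"sigma={sigma:>5}: h = {h:+.3f} nats")
```

A discrete entropy computed over any finite set of outcomes would instead bottom out at exactly zero, which is why the information-diagram picture is only safe in the discrete case.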
And by putting this explicitly in terms of the brain, we're now talking about the kind of model that the brain might implicitly hold about the outside world, and the kind of models that might be required for sentient-appearing behavior. And again, a nice ring to suggest a Markov blanket, put in explicitly. Many questions for another time about the order of the action-perception loop, whether perception precedes action in principle or in different computational implementations, and how different aspects of the timing of the execution of different stages might or might not converge upon similar outcomes; just noting it. In figure six, we have sentient behavior and action observation. So what is being shown with the brain and with these three graphics beneath? So this is a nice example of what's actually a slightly older version of active inference now, and a slightly different implementation from what we might use. But I think it's a nice example in that it highlights the transition from behavior in terms of continuous differential equations through to something that's a bit more sequential, that behaves in the way we might actually think about plans: one move, then the next move, then the thing after that, with an implicit discretization in time. And so the plot shown in the upper left, where we've got a series of overlapping lines with the red line in bold (exactly that one there), is showing what is more or less a predator-prey-like relationship that creates these sorts of sequences. So you can imagine that you have an alternation between a predator and some sort of prey, where, as the prey population increases in size, the predator population will also increase in size behind it, as the predators have more to feed on, which will in turn cause the prey population to drop, followed again by the predator population dropping as food resources become more scarce.
And this is known, as you've written here, as winnerless competition, or a Lotka-Volterra-type dynamic. And in the generalization of this, you can have many such sequences where one thing follows the other, and you get a sequential set of peaks. Now, if we interpret each of those lines as representing the probability, or the log probability, or some mapping to the probability, of one of several alternative discrete, categorical states. So here imagine we're talking about blue, green, or red. Then we end up generating a sequence of blues, greens, and reds, depending upon the underlying dynamics. So we have a continuous system that now has the feel of something sequential. And what's happened in the graphic on top of the brain here is to associate each of those discrete categories with different points in space. So we're now saying that if we're looking at the blue point, then that's going to be associated with this upper-left location in space (or, sorry, there's a light blue as well; bad choice of color on my part). And each of those sequences is then associated with a generative model that says: I expect that when the blue state or the red state is present, my hand will be drawn towards one of those locations. And I predict it's as if there is some sort of spring attached from my hand to where that location is. Which means that as I infer this sequence, I'm sequentially drawn to, or I believe I'm being drawn to, each of these locations in turn. Now, if my free energy, or my marginal likelihood, says that the most likely states are the ones that would occur if my arm really were being drawn in this direction, the actions that then minimize the free energy are those that fulfill those predictions.
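The predator-prey dynamics described above can be sketched numerically in a few lines; this is a minimal Euler integration of the classic Lotka-Volterra equations, with illustrative parameter values that are not taken from the paper:

```python
# Lotka-Volterra predator-prey dynamics, integrated with a simple
# Euler step. Parameters and initial conditions are illustrative.
a, b, c, d = 1.0, 0.5, 0.5, 2.0   # growth/interaction rates (assumed)
prey, pred = 2.0, 1.0             # initial populations
dt, steps = 0.001, 20000

trajectory = []
for _ in range(steps):
    dprey = (a - b * pred) * prey   # prey grows, is eaten by predators
    dpred = (c * prey - d) * pred   # predators grow by eating, decay otherwise
    prey += dprey * dt
    pred += dpred * dt
    trajectory.append((prey, pred))

# The two populations oscillate, with the predator peak trailing the
# prey peak: the sequential, "winnerless" structure in the figure.
```

Generalizing to more than two competing populations, with suitable asymmetric couplings, gives the longer sequences of peaks that get read out as categorical states in the handwriting example.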
So all I have to do is predict the kind of proprioceptive or visual data I might get based on where my hand is at any one point in time, or predict the data I would obtain if my hand really were being drawn in this direction. And the resulting action fulfills exactly that and creates this sequence of movements. So what we get out is a relatively complex motor trajectory that has an almost autonomous feel to it, simply by predicting that I'm going to be drawn to each of these points in sequence. And this has been used, in this particular example, as a way of simulating handwriting, or something that appears to be like handwriting, simply by moving the piece of paper underneath where my hand is moving, so that I end up with this sort of repeated pattern. And you could imagine that in real writing we may implicitly have a series of points that we would associate with different bits of the letters, which would effectively draw our hand around those letters as we write them. And categorical shifts at the letter scale, at the word scale, and so on could be modeled in a hierarchical way. Absolutely. And that's sort of exactly the point, in terms of this move from the formulation of active inference that used to be used, through to explicitly writing down a series of discrete categorical states, and, as you say, putting that together in a hierarchical model where you can now mix predictions of a discrete, categorical sort with short trajectories in a continuous state space. Awesome. We'll complete our lightning tour of the last three figures. So here's figure seven. Just for reference, we looked at figure three, we saw some faces and names, and now we're seeing figure seven. What is being shown here? So here I think we're dealing with a limitation of what we saw on the previous slide, where we were thinking about these fairly complex trajectories that had an almost autonomous feel to them.
It's still only a relatively simple sort of behavior; implicitly, it's just dealing with the replication of some potentially quite complex, or even arbitrary, pattern. But to become really autonomous, to behave in an intelligent way, what we really need is to be able to select between alternative sequences and alternative patterns. And what this slide is showing us is the idea that the way we appear to evaluate different trajectories, or the probability of pursuing those trajectories, is in terms of this expected free energy functional, which can be decomposed in a similar way to the variational free energy functional. So we can see, in blue, the free energy functional being decomposed into our complexity and accuracy terms, and various other decompositions. But in the upper panel, we're now looking at the decomposition of the expected free energy into analogous terms, and we've spoken about some of those already. So here the intrinsic value is equivalent to the information gain we were speaking about before, whereas the extrinsic value is more like our expected utility, or the degree to which we prefer a particular outcome. And just as on the previous slide, each of the faces relates to different ways this can be interpreted under specific circumstances where particular parts of the expression are irrelevant or don't matter. So in the absence of ambiguity, we end up with behaviors that are entirely risk-driven, and we might get back to things like KL-control-type formulations. Any notes, or shall we go to eight? Here we are in eight. I thought for a second that we'd seen this before, but it's a subtly different reworking of one of the previous figures, isn't it? Here's five, with the expected free energy. Yeah, correct: F in five, and then just an a path; and now, in eight, we have a G at the top.
I suppose this is the move we were talking about before, where the active states are effectively hidden from the internal states and become something that is itself inferred. So you can see the absence of any arrow connecting a to the variational free energy box, which is implicitly saying that the action itself is something we have to draw inferences about, and the expected free energy is the way in which we can do that. It's inference about action, about the causes of action. So many ways have been highlighted for how inference and action weave together, and different model architectures that might be more didactic, more illustrative, or more pragmatic for a given context, or for a given type of thing that has a certain sparsity to its coupling. And to close on figure nine, and then we can have last thoughts: what is shown in figure nine? Well, figure nine is a nice example of the application of this information-seeking aspect of the expected free energy to a task in which there is uncertainty about the orientation of the face presented to the active inference scheme. And the alternative trajectories, the alternative paths of autonomous states that can be selected here, are the alternative eye movements, or saccades, that can be performed. So evaluating the probability of each saccade depends upon evaluating the potential information gain associated with it, which is effectively the information gain I get about the eventual orientation of the face. In practice, if I remember rightly, this formulation focused very much on the ambiguity term, and I think it has subsequently been updated in various ways. But again, it focused on the continuous schemes and provides a nice early example of how information gain is worked into the active-inference-type formulations we tend to use practically. Awesome.
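A hedged sketch of the saccade-selection idea, with toy numbers that are not from the paper: each candidate saccade is scored by its expected information gain, the mutual information between the hidden state (face orientation) and the observation that saccade would yield. A saccade to an informative location (say, the eyes) carries more expected information gain than one to an ambiguous location.

```python
# Scoring candidate saccades by expected information gain:
# I(state; obs) = H(obs) - H(obs | state). Hidden state is the face
# orientation (upright vs inverted); likelihoods are illustrative.
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_info_gain(prior, likelihood):
    # likelihood[s][o] = p(obs = o | state = s) for this saccade
    n_obs = len(likelihood[0])
    p_obs = [sum(prior[s] * likelihood[s][o] for s in range(len(prior)))
             for o in range(n_obs)]
    h_obs = entropy(p_obs)
    h_obs_given_state = sum(prior[s] * entropy(likelihood[s])
                            for s in range(len(prior)))
    return h_obs - h_obs_given_state

prior = [0.5, 0.5]  # upright vs inverted, equally likely a priori

# Saccade to an informative location: observation discriminates well.
informative = [[0.9, 0.1],
               [0.1, 0.9]]
# Saccade to an ambiguous location: observation barely discriminates.
ambiguous = [[0.55, 0.45],
             [0.45, 0.55]]

for name, lik in [("informative", informative), ("ambiguous", ambiguous)]:
    print(f"{name}: expected info gain = {expected_info_gain(prior, lik):.3f} nats")
```

Under active inference, the saccade with the higher expected information gain (lower ambiguity) gets the higher posterior probability of being selected, which is the apparently curious behavior described earlier.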
In our final few minutes, we'll look at the questions that we look at at the end of every live stream. It would be awesome to hear where things go from here, from this recent preprint, from the textbook. What are some of the areas that any of you are interested in, or that are now possible to head in? I'll go first, then I don't have to go last again. Difficult question to answer. There are so many ways you can take this. I think it's gone beyond the capacity of any single group or person to really pursue all the directions. So nearly every field that we've mentioned is probably the focus of at least one of several groups around the world: small, embryonic groups, but certainly working earnestly on the different levels of application. Technically, though, Thomas and I know that's not quite true. One thing that Thomas and I have been doing is trying to write the next simpler paper. The Free Energy Principle Made Simpler wasn't that simple in the end. So the idea is now to work towards a four-page paper that has everything you need to know to understand the free energy principle. The most recent draft is down to about eight pages. And what it does is drill down on the path integral formulation, and in particular the simplifications afforded by not worrying about non-equilibrium steady-state densities, just looking at the densities over paths, taking away all the unnecessary equations and drilling down on what matters. And to me, what really matters is in the last sentence here, and implicit in the last figure that you were discussing, which is curiosity. I think that's the key thing that distinguishes current AI from AGI and, you know, true intelligence and understanding: resolving uncertainty about your sense-making understanding of a world, and understanding the mathematical imperatives that describe that kind of behavior. So this is our end where I started, which is all about curiosity. Excellent. Thank you.
Thomas? As Carl says, it is a really difficult series of questions to answer. I think there are always more interesting directions to go in than there is time to pursue them, which is why, again, as Carl says, it's important that there are so many groups out there who are taking an interest in focusing in on these issues. Certainly the idea of making it simpler, I think, is an important direction to go in. For me, one of the interesting things is also thinking about application in the setting of disease processes: trying to understand how different disease processes can be framed in terms of alternative priors, as a way of, I hope, at some point providing useful clinical characterizations of different kinds of pathology, different kinds of trajectories, and different kinds of steady states, and understanding how different therapeutic options might change the path through some space of prior beliefs at various different temporal and spatial scales, as we've spoken about. So, for me, that's one of the areas I'm particularly interested in seeing how things develop. Thank you. Blue? As I was saying the other day, I'm really starting to think of active inference in the way that, in every situation and every circumstance I find myself in, I find myself wondering how I can formulate it as an active inference model. I did the same thing when I was in art school and learning how to draw in a very realistic way. I was learning to draw people and objects in the way that I saw them. And now I'm looking at situations I find myself in, and where before I would be in conversation with someone, looking at their ear lobe, wondering how I could best represent that on a piece of paper, now I find myself wondering how this would fit into the active inference framework. So I think the horizon is very broad and there are many avenues where we can tread.
The art and science of curiosity. So that's an excellent conclusion. Carl and Thomas, thank you for joining these discussions. And everyone who is listening, thank you. We hope to see you in the ActInf Lab or in any other fractal aspect of what we're all doing. So thanks again, everybody. Thanks, Blue. And see you in the next discussion. Bye.