All right, it's January 28th, 2022, and we're here in ActInf Lab Livestream number 37.0, discussing the paper "Free Energy: A User's Guide." Welcome to the ActInf Lab. We are a participatory online lab that is communicating, learning, and practicing applied active inference. You can find us at the links here. This is a recorded and archived livestream, so please provide us with feedback so we can improve our work. All backgrounds and perspectives are welcome here, and we'll be following good etiquette on livestream. Go to ActiveInference.org if you wanna learn more about anything happening in the ActInf Lab. All right, so today in 37.0, we are going to learn and discuss and introduce and provide some background for this paper, "Free Energy: A User's Guide" by Stephen Francis Mann, Ross Pain, and Michael Kirchhoff. And just like all the dot-zeros and discussions, we're just gonna introduce some ideas, give a little tip of the iceberg, and kind of warm up to talk about this paper in the coming two weeks. And so today we'll be going through a lot of the background and some of the key formalisms in the paper. In the coming two weeks we'll be discussing the paper, so let us know if you wanna join. So first, there's this comment in red. Is this yours, Dean? Yeah. Do you wanna go for it here? Pardon me? You wanna describe it here or in your intro? Yeah, in the intro. All right, well, we'll each just go around and say hello, and then I think we can just say one reason why we're excited to discuss it. I'm Daniel, I'm in California, and was just enticed by the title, short and direct. And I think it has some great solidifications and perspectives and implications on multiple levels. So how about Stephen? Hello, I'm Stephen, I'm based in Toronto. Yeah, I was interested by another sort of way to summarize the free energy principle and active inference type approaches in relation to the philosophy of biology and cognitive science.
So I thought that was interesting to see how that pitch was being made. How about Dean? Hi, I'm Dean, I'm in Calgary, and anytime I see the word guide, I'm always curious because of my background in setting up programming where people were going into novel situations. So the authors kind of had me at that word, and then I wanted to sort of learn more of what they had to say. What are some key features of guides, or things that guides should keep in mind, before we head off on this journey? Yeah, well, for me, I've sort of adopted the term wayfinding kind of as a blanket term, not just a map, but as a way of maybe describing what that path could turn out to be without necessarily constraining people. So I'm sure there are other descriptions and definitions of the word guide, but that's the one I tend to gravitate towards. All right, Stephen? Yeah, and there's also what kind of guide we end up inhabiting. So, you know, this biological grounding, it feels like there's a desire to get into our feet and into the soil somehow, as well as the cognitive science. So I feel it's a little bit like I've been stretched up and pushed down into the ground. So there's something there. Okay, so one big question, probably not the only big question that could be raised, but just the question that drew us to this paper in some way, is: how can we learn and apply active inference? And then, Dean, do you wanna talk about what you wrote here? Well, I'd just like to kind of read it out, if you don't mind. Yes. Okay. So, assuming that making moves, which we could describe as behavior now, has effects on what happens next, and that those moves are at least loosely based or inferred from evidence, and the authors use the terms probabilities, predictions, and fitness, so that's in the paper, then how might we begin to look at free energy applied, in calculating out our best guess, measurable as or of what we think or believe will happen next? A step on that road to prediction matter expertise.
Now, prediction matter expertise is a term I coined and brought up with Dr. Friston. But I think, if we're guiding, we might think about that in the background. The other part was they talk about models, and in our last livestream set, that's all we were doing was talking about modeling ourselves. So it seems like a natural extension now to talk about models: models as inference and action models, as selection models, as extenders, i.e. tools, and then how does a curated model rundown help us get outside the guide when using free energy in ways beyond guide explication and interpretation? Do we need a user's guide for the user's guide? Which was something that I used to ask all the time of the kids that were participating in the program I was setting up, because they were going out with their background, and they had sponsors who had a different background, and we often wondered whether or not they would have to give a user's guide to somebody who was necessarily more experienced with the field but not necessarily experienced with bringing a young person along. So that's why I think that when we're talking about how can we learn and apply active inference in this guide theme, those are things we might wanna keep sort of at least partially in our thinking. Thanks a lot for sharing that, Dean. A lot to say, but that's what the dot-one and the dot-two and beyond are for. So the paper is "Free Energy: A User's Guide" by Stephen Francis Mann, Ross Pain, and Michael Kirchhoff. It's from the last days of 2021, I believe, and it is on the PhilSci Archive. Okay, so here are some of the aims and the claims of the paper. Over the past fifteen years, a novel explanatory framework spearheaded by Karl Friston has inspired both excitement and confusion in philosophy of biology and cognitive science. Active inference, whose most famous tenet is the free energy principle, purports to unify explanations in biology and cognitive science under a single class of mathematical models.
So not math under biology, but biology under math. There are broadly three reasons why the active inference framework is difficult to understand, in the authors' words. First, the mathematics are unfamiliar to many philosophers, and even to biologists and cognitive scientists and mathematicians and all other kinds of people. Second, the framework was developed rapidly by a small but dedicated group of researchers, limiting its accessibility while expanding its scope. Hashtag recent history. Third, the framework makes claims across both mathematical and empirical domains, and the dialectical relationships between these are unclear. So what is the math-bio dialectic? Who's nested within whom? Yes, Stephen. Yeah, I think the way these three points are made is useful for seeing the rationale for this paper. I think there's a desire to come at these questions as if coming from people less steeped in all the math. Because previous papers might have come out which had an awful lot of particular information that comes from this dedicated group of researchers. I think by trying to speak about it from the perspective of people arriving, it gives those core researchers, which I know for sure Michael Kirchhoff has been part of, a chance to clarify the sort of framings or misconceptions that might end up being placed in other papers by philosophers, and I think it has become a little bit of a vicious cycle of distraction at times. And I think they may be trying to reclaim that ground. So that's just my own strategic guess, but it looks likely that might be why this paper's been put together. Well, great and helpful feedback, thank you. Then the last parts are here: we attempt to redress the situation by targeting each source of potential confusion. So they're gonna identify three frequently asked questions, frequently raised concerns, underpinning frictions, and then redress them.
And their aim is, overall, we aim to increase philosophical understanding of active inference so that it may be more readily evaluated, and maybe we can even add applied and developed. Okay, so would either of you like to just read the abstract? I'll go for it. Okay. Over the past fifteen years, an ambitious explanatory framework has been proposed to unify explanations across biology and cognitive science. Active inference, whose most famous tenet is the free energy principle, has inspired excitement and confusion in equal measure. I guess it has. Here we lay the groundwork for proper critical analysis of active inference in three ways. First, we give simplified versions of its core mathematical models. Second, we outline the historical development of active inference and its relationship to other theoretical approaches. And third, we describe three different kinds of claim, labeled mathematical, empirical, and general, routinely made by proponents of the framework, and suggest dialectical links between them. Overall, we aim to increase philosophical understanding of active inference so that it may be more readily evaluated. Cool, so a nice and funny abstract, we all laughed. So the roadmap: how did they structure this guide? Is it meant to be read linearly, or do you flip to a different page? I don't know, but we have added the page numbers on this slide; it's on the bottom right here, but it'll be on the bottom left, and that's what page we are on in the guide, as if it were a paper in our hands. And it starts with an overview, and then there's a discussion, first on the inference of perception, or observations and hidden states, and then action comes into the loop. Section three gives a brief history of the free energy principle, and section four discusses the aforementioned dialectic of free energy and its rhetorical ecosystem, we could say.
That's the part we're not going to focus on in this dot-zero, but hopefully we're gonna be able to go into each of these really important troikas in the dot-one and dot-two. Then there are some concluding remarks. Okay, so first, how is the word model being used here? It's really good that the authors included this warning, because we hear about modeling all the time, not just the photography kind, the mathematical kind. So they write: let us begin with a warning. The word model takes on two distinct senses throughout our discussion. The sense more familiar to philosophers is what we call a scientific model, a representation of some possible or actual system which a scientist uses to reason about or discover features of that system and related systems. By contrast, in the active inference literature a narrower sense is typically meant, what we will call a generative model. That's what we're gonna be specifically talking about. This is a mathematical object with applications in statistics and various sciences. Our simplified models of the free energy principle are scientific models. So models is a big category, but we're gonna be talking about kind of a narrow sense of statistical models, like linear regression models, but instead of a linear regression it's a different kind of model. Not at the exact same elevation as the linear model in the taxonomy, but in that category. And then note further that some scholars opt for a deflationary stance on generative models, using them only to describe the dynamics of agents. So descriptive modeling. It is an open question whether this kind of model building precludes any form of scientific realism about the relation between the model and the target system. These issues are discussed in section four and have also come up several times, including most recently with Majid Beni, about model-based science and what it means to use and make models and maps and territories and all of that. So any thoughts on models?
Because that's gonna be one of the key terms going forward. What would either of you say would be the best way, in layman's terms, to show the difference between what they mean when they talk about a scientific model and what they talk about in terms of a generative model? What sort of real, physical, material things could we use to make clear the difference? Because in the last stream we did a good job of taking columns and showing how they were discrete. How would you guys describe this? Yes, Stephen. I think one of the ways this distinction comes about is the difference between most systems-type approaches and nonlinear dynamical systems. Generative models tend to be coming from a sort of swarming. It's like a swarm of bees creating and evolving, as compared to something which has been given a perspective and drawn out somehow in our environment, as in a scientific model. Now, the fact that generative models are being applied in the kind of scientific way, inverted commas, or are they being used in a more instrumental-slash-philosophical way and can never be fully science, is another question. But I think the difference there between this kind of biological generative sort of process and something which can be given a perspective on and drawn out as a defined type of model is where there's a kind of clear difference. I don't know how others see that, but I think that might also be useful. All right, Dean, here my answer would be two that are on the extreme ends. A scientific model would be like the standard model in physics, or a working model of renal function, like kidney function. So it's very transdisciplinary and includes a huge number of domains and kind of like logical motifs, and has axioms and a specific history and software toolkits and all of that. And then we're gonna be talking about a kind of model that's basically a parametric statistical model.
So like a Gaussian model takes two parameters, the mean and the variance, and that can be written analytically, with equations, or can be done computationally, but it's only ever gonna take those two parameters, and it only ever outputs a certain family of distributions. And so it's like parameters in, distributions out, and the parameters can reflect the connectedness of the graphical model too, so you get parameters in, which can include model structure, and then some type of output, but it's closed-ended within that paradigm, versus the model of renal function, which is this open-ended question. And I think you hit on something really critical there: it's distributions out, right? So something falls out when we're talking about a generative model, whereas when we're talking about a scientific model, we tend to keep the focus stabilized, right? Like this part of the process is fundamentally different. And so again, when we're talking, in sort of layman's terms, about what the difference would be, I think you hit the key word, out, which is something that's in and stays in versus something that may start in one place and fall out in a completely different place. Is that an expectation we should have as a difference between these two? Yeah, just one thought, then Stephen: both modes are so important. It's like yin and yang, or left and right hand, or convex and concave. And our discussions do move the regime of attention from, like, the generative model to the bigger question, and in some ways it maps onto instrumentalism and realism too, but we really need to know when the pendulum has swung one way and when it's swung the other way. And so we can keep an open mind, and they're all happening all the time, that's both eyes open like you're always saying, Dean, but we also can't go too hard on just one side. So, Stephen.
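Daniel's "parameters in, distributions out" point can be sketched minimally in Python. This is illustrative code, not from the paper; the function names are mine, and the Gaussian family is just the example named above:

```python
import math

def gaussian_pdf(mean, variance):
    """Return the density function of the Gaussian parameterized by (mean, variance).

    Two parameters in, one member of a single fixed family of distributions out;
    no parameter setting can take us outside that family.
    """
    def pdf(x):
        return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)
    return pdf

# Parameters in ...
density = gaussian_pdf(0.0, 1.0)
# ... a distribution out; the standard normal peaks at 1/sqrt(2*pi) ~= 0.3989.
print(round(density(0.0), 4))
```

The closed-endedness is the point: however the two numbers are chosen, the output is always a Gaussian, unlike the open-ended "model of renal function" sense of model.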
I think, generally in science, you try to have reproducibility, and you have tolerances, and you say how much accuracy and what the tolerances are on something, but there's a general recapitulation of similar types of results again and again and again, and the same sort of thing comes up. It has been there in psychiatry and psychology, and that's where the idea of syndromes came in, being classified as opposed to symptoms, which must be much more fluctuating, more variable. So you could run a generative model and it can come up with different results with the same input, and now there will be some kind of characteristic patterns that generative models do reveal, but there is a point where it's quite different, and I suppose that difference is almost a paradigm-based difference, but maybe there is some point where they blur, I don't know. Yes, Dean, and then we'll go on to the paper. Yeah, so at the end of the last livestream, one of the comments that one of our colleagues made blew me away, which was that we have to be able to hold two things up at once, and I think that's what both of you are saying right now. I think whether we're looking at a starling or a murmuration, whether we're looking at an ant or a colony, we can keep something in-bounds or we can have something fall out from that, because it's basic research, like we weren't expecting something repeatable. That's how I would distinguish it, but again, let's get on to the paper; I think it's important that we get that established first if we're talking about guides. That was the big epistemic disclaimer, and that's what they put in their paper, so we allocate our attention as well. Okay, so on to the details of the paper. So in two-two they introduce this simple model of inference with W and X.
So this is figure one. There's W for world, that's the hidden world state, and then there's X, the observed data, like X marks the data. And you can think about, like, where do you estimate the sun is in the sky? That's actually unobserved, but then there's gonna be all kinds of observed data coming in, or we'll talk more about the coming in versus the generative model later, but the observed data are what are being modeled as being observed in that sort of sensory way. So figure one is the basic model of inference, the inference problem addressed by the active inference framework. What other kinds of inference problems exist? What problems do or don't have this form? What happens if there's no line between W and X? So of course it can get more elaborate, but within this exact format, like, W might output X and Y, and that'd be like getting two numbers back in the computer program, or generating two numbers in the computer program rather than just returning one, or you can imagine X is a vector, but we're within that specific model framework. We're not in the realism of agent environments, we're just in the statistical framework of a hidden latent variable W and an observable variable X. The unobservable state is assumed to cause the observable data, and so this is a little bit of a connection with dynamic causal modeling, because statistically we call that causing, just like the proportion of variance explained by a PCA. It's given very active terms, but it's actually not causal in the mechanistic sense, and that has of course led to no end of confusion, with statistical conclusions being interpreted as mechanistic conclusions about the world. And so, is it cause in the sense that we mean it, or is it associated in a causal model? And that's kind of the causal in dynamic causal modeling, that's functional connectivity. So that's the realm that we're talking about, not the world-environment interaction. Stephen.
Yeah, this is actually interesting. They've gone back to a pre-Markov-blanket kind of formalism, and it raises that question of what it means to infer, and it can be that we infer things because that's what we get data on. So it asks, well, what is it that we can even use to understand, and what are the assumptions about what's actually important to read? And I think this leaves both of those things open, and that actually is a bit of a bridge between the scientific and the generative processes. Cool. So if we're modeling, like, height is not measured but we observed weight, and we're gonna do some regression of the unobserved on the observed, that is the type of model that we're gonna be talking about, loosely. And someone could say, oh, but this other factor influences it too. Yes, that's the realism claim, that this actually matters in the real world, and it very well may, but we're talking about the modeled relationships. Stephen. Yeah, I think this is also really helpful for being able to look at paradigms, because there's a difference between, say, the paradigm of psychology from the perspective of classification and problem identification and treatment, and coaching psychology, which is not so much about trying to diagnose what's there but about what sort of actions someone's trying to take and how you might help them with those actions to reach outcomes, which may vary moment to moment. And I think, you know, in some ways this is maybe the level you have to go back to, because otherwise the noise just swamps the discussion. It goes deep on both ends; that action orientation, the pragmatic turn, ecological psychology, is gonna connect deep, and this is going to be deep but in a sense narrow, because we're just talking about statistics here. So take statistics courses and learn it, because this is kind of where it comes from. Okay, on to the example of active catference.
That's not actually what they called it, but that's the cat example that we're gonna be discussing. We're going to imagine that you have a cat that spends its time in either the kitchen or the bedroom. When it's in the kitchen, it often meows for food. When it's in the bedroom, it often purrs loudly. Suppose you tally the proportions of the times your cat is in each place and making each noise. The results might look something like this, okay. But the cat also goes somewhere else? Okay, not in the model. Or this other factor, or this other sound? Okay, change the model, add another column. It's totally fine. It's a fork on the GitHub, or the discussion, but that's the difference between kind of modifying the model structurally versus using it parametrically within the narrowest sense of model, and then keeping the discussion alive about that broader sense, like what Stephen is talking about, like a model of psychology or a model of complex adaptive systems. So one thing to note is that this idea of setting the prior, in a Bayesian sense, from observed data is called parametric empirical Bayes. So parametric: we're setting parameters, like frequencies of things happening. And it's empirical, which means observed; it doesn't mean the only truth, it just means the measured values. And then it's Bayesian, as we'll talk about. So let's just say 10 or 100 or 10,000 measurements were made, and they come out to these numbers: 60% of the time the cat is in the kitchen and 40% in the bedroom. And so that's summing across. That's called a marginal probability because it was written in the margins of the table, and the marginals have to sum to 100%, because something has to happen in the model. Oh, but the cat can be elsewhere? Yes, but within the model it has to be this way. That's what makes it a probability. Then 50% of the time there's meowing and 50% purring, but as the numbers show, the location and the sounds have a statistical association.
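The tally described here can be written down directly. A minimal sketch: the cell values below are illustrative, chosen only to be consistent with the marginals quoted above (60/40 over rooms, 50/50 over sounds), and the helper name is mine:

```python
# Joint distribution P(W, X) for the cat example: W = location, X = sound.
# Illustrative cell values consistent with the quoted marginals.
joint = {
    ("kitchen", "meow"): 0.4,
    ("kitchen", "purr"): 0.2,
    ("bedroom", "meow"): 0.1,
    ("bedroom", "purr"): 0.3,
}

def marginal(index):
    """Sum the joint over the other variable; index 0 is location, 1 is sound."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

# Summing the rows and columns recovers the marginals (the table's "margins").
print({k: round(v, 3) for k, v in marginal(0).items()})  # kitchen 0.6, bedroom 0.4
print({k: round(v, 3) for k, v in marginal(1).items()})  # meow 0.5, purr 0.5
```

The statistical association is visible in the cells: 0.4 for kitchen-and-meow is more than the 0.3 you would get if location and sound were independent (0.6 times 0.5).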
There's some divergence that could be tested for statistical significance, depending on the number of observations, but that is what the numbers in this table are reflecting. And both columns and rows sum up to a probability, so they're all proper probabilities. Then how do you infer about this? Like, if you were to observe the room, how could you reduce your uncertainty about what sounds you'd hear? And if you'd hear sounds, how could you reduce your uncertainty about the room? Many philosophers will be familiar with one famous method for solving this problem, Bayesian conditionalization. This method can be stated as a principle saying how an agent using a model P(W, X), W the hidden world state, X the observed data, ought to choose their beliefs Q(W), Q being the one we control, upon observing data X. What to believe now, conditioned on incoming data. Okay, any points on that? Just that they did a really good job of explaining it. I'm waiting for the next slide. Okay, please continue here. So, from the actual paper: the table describes a joint probability distribution, which we just explained, where W ranges over the possible cat locations, kitchen or bedroom, and X ranges over the possible cat sounds. Again, very clear, straightforward. From the next slide: usually the scientist, now the person who is trying to build a generative model, aims to improve the accuracy of a generative model of some real-world phenomenon, which would mean improving the accuracy of P(W, X). This learning task is relatively difficult. So I wanted to kind of parse what relatively difficult meant. And I'm not going to read all of that stuff there in red, but essentially what I want to state is that the authors here said, this is simple; don't go to that place of high complexity quite yet. Let's just see this for what it is, so that what Daniel said 90 seconds ago, which was go take a bunch of statistics courses, doesn't have a bunch of people running off and screaming into the night.
There is a complexity to this, but it's not the kind that will swamp you. What they're basically trying to say is, let's just slowly work our way into this without all your historical grievances around statistics trapping you before you even really set sail. I'll add one note on that. This is like learning how the horse moves in chess, or learning how the castle moves in chess. And so it's possible to get super connected to your internal life narrative in the game of chess, and people who would have no problem losing at Connect Four or checkers will feel very engaged, emotionally, affectively, by chess. It's just something I've observed empirically; somebody else might have different priors on that. And so this is a way of just starting with how the pieces move, and then, as we're finding out, playing in a fun, we hope, ecosystem and playground and sandbox. So we can figure out how the pieces move, and then we're gonna connect them and do all kinds of fun stuff. Stephen. Yeah, I think, like you say, having the ability to stay with the pieces, do lots of fun stuff, and look at the way things are generating with that data is different, for instance, to saying, oh, the meowing cat is an assertive cat and the purring cat is a docile cat, which could be this kind of imposed higher-order model. And then everything's trying to fit to, is it a docile cat or is it an assertive cat? But that's flattened out all of this; you lose this ability to generate. Professor Helen Longino has a book from 2013 called Studying Human Behavior, about how scientists investigate aggression and sexuality. It's like, once you define, oh, there's a significant difference between these two groups, or in the count of this behavior per minute, and they've been pre-labeled, it passes from modeling in the broad sense into modeling in the narrow sense. We found a difference between the groups. So this is reifying our understanding of those categories. So it's really important. So this is where Bayes enters the picture.
The posterior, which means afterwards, is calculated: how should the distribution P be updated as new information comes in? So, just like the last sentence in red, what to believe now, conditioned on incoming data. So now that is gonna be symbolically or graphically described: the probability, the P distribution, of world states conditioned on, that's the vertical line, new data coming in. And there's Bayes' theorem. Other videos and other channels and groups will cover Bayes better and more comprehensively, but here it can suffice to say that first we can write it in words. Like, what's the distribution Q that we control, the probability distribution of us thinking that it's in the kitchen? That is the probability of it being in the kitchen conditioned on it meowing. So how likely is it to be there, given that it meowed? And then the numbers that we just looked at can be plugged in to that verbal equation, and a number is gonna come out. And so we heard meowing. And if we just looked at meowing, it was like four times in the kitchen, one time in the bedroom. So four out of five times it was in the kitchen, not the bedroom. And then here's an equation that does literally that. It just looks at the column, and it gives the conditional likelihood, or the posterior, of the world state after the data come in. And then, I think, did anyone wanna say anything here? There's this: the learning task is difficult. I think we already kind of mentioned that. So yeah. Yeah. And the whole thing is that most people, I think, I'm not gonna say philosophers, most people don't automatically go to being able to figure out what the probability is. But I think what these authors are pointing out is there's a way of being able to show that there is a probability in play if you are not certain, and this is how you might split those probabilities out. That's all they're trying to do. Okay.
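The verbal recipe above, look only at the meow column and renormalize, is exactly Bayes' rule on the joint table. A minimal sketch, using illustrative cell values that reproduce the four-to-one kitchen-to-bedroom ratio for meowing quoted in this discussion:

```python
# Joint P(W, X) for the cat example; cell values are illustrative,
# consistent with the 60/40 room and 50/50 sound marginals.
joint = {
    ("kitchen", "meow"): 0.4,
    ("kitchen", "purr"): 0.2,
    ("bedroom", "meow"): 0.1,
    ("bedroom", "purr"): 0.3,
}

def posterior(sound):
    """P(W | X = sound): take the column for the observed sound, renormalize."""
    column = {loc: p for (loc, snd), p in joint.items() if snd == sound}
    evidence = sum(column.values())  # the marginal P(X = sound)
    return {loc: p / evidence for loc, p in column.items()}

# Hearing a meow: 0.4 vs 0.1 in the meow column, so four-to-one for the kitchen.
print(posterior("meow"))  # kitchen 0.8, bedroom 0.2
```

Bayes' theorem is doing nothing more exotic here than dividing a column of the table by that column's total.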
So now we're gonna go from, as we've seen in other papers, the sort of exact Bayesian approach; the most similar paper to this in this sense is probably Axel Constant's, number 34, with a bacterium. We first looked at the Bayesian bacterium, and then we looked at the variational Bayesian bacterium, and in both cases we looked at how basically sometimes new data could come in and you could still make wrong decisions. Whether you're doing exact Bayes or variational inference, you can still make bad decisions as data come in. So this isn't a Panglossian paradigm. This is like a statistical tool that's already been detached from the philosophy, or at least sort of docked at shore. And now we're also breaking the assumption that it must go the right way, or that it will model real-world systems especially. So again, back within the narrow sense. Sometimes the specific distribution of the world states conditioned on data coming in is intractable, so that distribution cannot be calculated. This usually happens when the state space is continuous rather than discrete, which came up on ModelStream number 5.1. If there are only two decisions, left and right, all you have to do is two calculations, and then compare which one is preferable. Whereas if there's a steering wheel or a trim tab, then there's an open-ended number, because 87.1 degrees and 87.11 degrees might be very different in their long-term consequences. So continuous state spaces are very challenging from a sensory as well as from an action perspective. In these cases, what is needed is a way to choose Q(W), the distribution we control on the world states, to make it close to P(W | X). So P of world states conditioned on data is the hard one to compute exactly, and this is going to be choosing a distribution that gets close to that other one.
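The idea of choosing a Q close to the exact posterior can be sketched in miniature. In the discrete cat example the posterior is easy, so this is purely illustrative: the two-point family and the crude grid search are my choices, not the paper's method, but they show what "search the family for the closest member" means:

```python
import math

# Exact posterior P(W | meow) for the cat example: 0.8 kitchen, 0.2 bedroom
# (the illustrative numbers used in this discussion).
exact_posterior = {"kitchen": 0.8, "bedroom": 0.2}

def kl_to_posterior(q_kitchen):
    """KL(Q || P(W|X)) for the two-point family Q = (q, 1 - q)."""
    q = {"kitchen": q_kitchen, "bedroom": 1.0 - q_kitchen}
    return sum(q[w] * math.log(q[w] / exact_posterior[w]) for w in q)

# Search the family for the member lying closest to the exact posterior.
candidates = [i / 100 for i in range(1, 100)]
best_q = min(candidates, key=kl_to_posterior)
print(best_q)  # the family contains the posterior itself, so the search finds 0.8
```

Here the family happens to contain the true posterior, so the search recovers it exactly; in the continuous, intractable cases discussed above, the best family member is only an approximation, and that gap is what variational free energy scores.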
When the problem is formulated by statisticians, we usually begin with a family of possible distributions Q and search for the member of that family which lies as close to P(W | X) as possible. So just like a linear model: conditioned on this being a linear model, we're gonna find the best linear model, or conditioned on the quadratic form, we're gonna find the best quadratic form. This is a little bit more general: conditioned on the family that Q is in, we're going to do basically linear-regression-style fitting, but not exactly linear regression. How do you make those distributions converge or align as best as possible? We can do this, also, given that a structure has been chosen. This is the fitting part, this is not the structure learning, unless the parameter is about structure. We can do this indirectly by using a measure of inaccuracy. Active inference employs a measure of inaccuracy called variational free energy, labeled F. Because it is a measure of inaccuracy, smaller values are better than larger values. So it's like Frisbee golf: the lower the value is, the lower the difference is between the actual P that you would have calculated perfectly and Q, the way simpler, lower-dimensional model that you can control and can save in RAM. So that's described here. Variational free energy captures two sources of inaccuracy in belief, which we're going to go into in the next slides, and dictates how they ought to be traded off against one another. The two sources, which we're gonna explore, are overfitting and failing to explain the data. So those will be introduced and discussed soon. Stephen? I think this also, if we're just to connect it to the scientific standard models, this type of work is looking at contextuality. It's like, based on where the starting point is, the probabilities are being picked up on, and you start to roll out the statistical results: what sort of meowing is actually happening.
And as opposed to a model where you're modeling a car engine and you wanna know what energy it's gonna give out at a particular moment. It's gonna do that whether I'm watching it or not. It's just the same basic model applies independent of the observer. So I think that's also useful here. Yes, like getting the model to this stage depersonalizes it in a sense, because that model can be just transferred and used in another instrument. Okay, so here's those two sources of inaccuracy that they described. The first that we're gonna discuss is overfitting. The cost of overfitting, they write, can therefore be measured by checking how far q, the one we control on world states, diverges from p. The first term of F, free energy, is a measure of that kind. This expression is also called relative entropy, or the Kullback-Leibler (KL) divergence. So this is like the first half of the two-part equation that will constitute F. And it looks like this graphically and symbolically, but here's a kind of cool way to think about it. Imagine you were trying to fit a single hump onto this two-hump empirical distribution. So you're fitting a family that is gonna get sort of coerced into one of two extremes. Either it's going to end up fitting one very closely, like the higher one, if it's mode-seeking, and it will have no probability density, the blue line, onto the other hump, or it'll conflate them in a sense into two kinds of subpopulations blurred together and have a solution that kind of goes between them. KL divergence is a way to fit distributions optimally given this kind of challenge, which exists for the one hump fitting two humps. It exists for the two humps fitting three. It exists for the one fitting 50. It's a general statistical problem, and KL is a method that helps bring that blue line, as well as it can on a trade-off frontier, into alignment with the black one. Okay, Stephen?
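The one-hump-fitting-two-humps picture can be sketched numerically. This is a minimal illustration with made-up numbers, not the paper's example: a bimodal target p, two candidate single-Gaussian fits (a mode-seeking one and a blurring one), and the discrete-grid KL divergence used to score them.

```python
# Sketch (assumed numbers, not from the paper): scoring one-hump fits to a
# two-hump target with the KL divergence D_KL(q || p) on a discrete grid.
import numpy as np

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Empirical "two-hump" target p: a mixture of two Gaussians (black line).
p = 0.5 * gaussian(x, -2.0, 0.7) + 0.5 * gaussian(x, 2.0, 0.7)

def kl(q, p):
    """D_KL(q || p) approximated on the grid; 0 * log 0 treated as 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)

# Two candidate one-hump fits (blue line) from the Gaussian family.
mode_seeking = gaussian(x, 2.0, 0.7)   # hugs one hump, ignores the other
blurring     = gaussian(x, 0.0, 2.2)   # spreads across both humps

print(round(kl(mode_seeking, p), 3), round(kl(blurring, p), 3))
```

With these particular (assumed) widths, the KL score prefers the mode-seeking fit, which is the "snaps onto one hump" behavior described above; different families and divergences trade this off differently.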
Yeah, I suppose this also gets into the reason this is useful: when you're in complex, non-equilibrium situations where things are fluid, then you're gonna hit these situations, and overfitting is quite common. Social science has been one of the areas where we see these problems. Where it's equilibrium-based and fairly clear, it's not really as applicable. So it sort of speaks to where this type of approach is applicable, i.e. where there's this kind of fluid ambiguity in the situation. Thank you. How about the second source of inaccuracy? That is failing to explain the data. Mathematically, they write, explaining the data means assigning high probability to events W that make the probability of X high. The penalty for failing to explain data is captured by the second term of F, which looks like this. High-probability values of the actual distribution are matched with high values on the distribution we control, Q, to keep this term low. So this is kind of like if it were a linear regression and all the data points were lined up and the line just went right through them, then it would be doing really well on explaining the data, and that would be given a low score here. Whereas if that weren't the case, it'd be flipped. Dean? So when I was doing my work, I didn't have access to, or maybe I wasn't looking hard enough for, the KL divergence. So I can remember even on here in some of these livestreams saying, I had no idea what that even means. I have a better idea now. But one of the metaphors that I tried to use, in terms of avoiding overfitting or failing to be able to explain the data, was: all those horizontal and vertical lines creating a mesh acting as a filter, how far apart do we set those? What gauge do we set our mesh for, in terms of what we want to have stay above the mesh and what's passing through?
Now again, maybe that's not sophisticated enough for a lot of philosophers, but to get the basic point down, that's the kind of example I had to provide to give people some sense, not just that these lines are rigid, the 20 is on this percent and the 40 is on this percent and never the twain shall meet, but that the lines actually move. Thanks a lot. So you're so right about that. The example with the active catference and the variational catference is discrete by discrete, so four quadrants. So in some ways that's like the simplest model. It could be two continuous variables. It could be like the volume and then the cat's position on the x-axis in a room. Now imagine if the x and the y were continuous or something like that, but just even two continuous variables. And you're totally pointing out where you're gonna put the points, whether you put them right in the middle of the four quadrants or whether it's more scattershot or something in between. And then there's two parameters: there's like a linear regression through the points, and then there's how fine the mesh is, and you're observing like pixel densities and doing a regression through the pixel densities. And so you could have a super continuous situation. It is a continuous variable inside the bedroom, but then in the model we just looked at, it's in one quadrant. And we talked about that when we talked about Serval's paper and how it was just the park and the cafe. And yes, there's locations within the park, but they weren't within our model. So we didn't deny the reality of that physically. We just were modeling statistically something specific. Okay, Stephen? I mean, this filtering could also be related to the sort of temporal sampling rate, because if you're measuring every millisecond whether it's meowing or not, it becomes pointless. There's a kind of making-sense rate at which the entropy is being converged.
And as we know in the brain, we seem to have multiple levels of that. Okay, awesome. The next two we're gonna go through just kind of quickly and specifically. So let's put those two terms together. This is the overfitting and the failing to explain the data. Variational free energy F, the non-metaphysical version, is the sum of the penalties for overfitting and failing to explain the data. It's those two terms that we just described, added together, so that the best explanation in a sense is the lowest on both of them. It's not overfit, it fits well, and it doesn't do it any more than that. So simple as possible, but no simpler, or any other number of quotes. That's kind of what this equation is. It's not the only way it could be written. It's not the only thing that fulfills that process, but it's something that can be used, and it's very tractable. As it turns out, this free energy has a tractable computation, equation 13, page 14. So now it's possible to actually use that tractable approximation and do decision-making. And here is where they have the decision-making. Here, argmin means choose the distribution q that makes the following term as small as possible. So this is where F goes from kind of being like a set of pipes with nothing really running through it to a specific finite set of tested alternatives. Those are affordances in the case of action selection. They're discrete in the case of a discrete inference, like is it in the bedroom or is it in the kitchen? It's continuous in the case of a continuous inference problem, like how bright is it outside on a continuous scale? But even for continuous things, sometimes we just do one through 10 or one through 100 or one through 1,024. So discrete state spaces are really important even if there is a continuity to the world. So those are the kinds of computations that variational inference helps perform. It takes something that's descriptive and moves it into a decision-making imperative.
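That argmin over a finite set of tested alternatives can be made concrete with the running bedroom/kitchen cat example. A sketch with assumed numbers (the two rooms come from the discussion; the prior, likelihood, and candidate family are mine): F is the overfitting penalty, KL from the prior, plus the failure-to-explain penalty, and we pick the candidate belief that minimizes it.

```python
# Sketch (assumed probabilities): variational free energy for W in
# {bedroom, kitchen} after hearing a meow X, minimized over a finite
# family of candidate beliefs Q(W).
import numpy as np

prior = np.array([0.5, 0.5])   # P(W): bedroom, kitchen (assumed flat)
lik   = np.array([0.8, 0.2])   # P(meow | W): assumed louder from the bedroom

def free_energy(q):
    overfit = np.sum(q * np.log(q / prior))   # KL(Q || P(W)): overfitting penalty
    explain = -np.sum(q * np.log(lik))        # -E_Q[log P(X|W)]: failing to explain
    return overfit + explain

# A finite set of tested alternatives; argmin picks the best member.
candidates = [np.array([b, 1 - b]) for b in (0.1, 0.3, 0.5, 0.7, 0.8, 0.9)]
best = min(candidates, key=free_energy)

# Exact Bayes for comparison: with a flat prior, P(W | meow) = (0.8, 0.2),
# so the argmin should land on the candidate q(bedroom) = 0.8.
posterior = prior * lik / np.sum(prior * lik)
print(best, posterior)
```

The point of the sketch is the last comparison: minimizing F over the family recovers the exact Bayesian posterior when the posterior happens to lie inside the family, and the best achievable approximation when it doesn't.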
Just like the L2 norm or least squares is like a decision-making imperative for linear regression, this is like an imperative for model fitting in this framework. The free energy principle: the form of inference here is the same as that of the Bayesian principle discussed earlier. In both cases, you perform a calculation and set Q of W equal to the resulting value. The difference is that the Bayesian principle counsels a direct calculation via Bayes' theorem. In contrast, the free energy principle counsels what might be called an indirect calculation. Happily, in practice this can be done by trial and improvement rather than trial and error. So that's Axel Constant's bacteria, the Bayesian and the variational bacterium, combined with going to the bottom of the bowl on the smooth gradient descent landscape with a straight-arrow utility and the solenoidal epistemic component. Various algorithms for finding Q, the distribution we control, are available depending on the details of the generative model; classic citation that gets referred to a lot. One of the developments that prefigured active inference was the implementation of such an algorithm in a neural network, first in 2005, old citation, classic. We're seeing that again in Model Stream 5, in Alec Tschantz's work and that of many others: this format, using neural networks to fit variational Bayesian methods, was a common technique in machine learning even long before active inference brought it onto the scene in our little side stage. Stephen? So when we say trial and improvement rather than trial and error, it's basically saying that the error can be used and utilized in an ongoing way. It's not like you have to start from scratch and come up with another attempt from scratch. You're using the same basic architecture and just keep running. It's so true. I never really thought how demoralizing trial and error was. Like, how should we get stronger? Oh, well you're gonna try and then you're gonna fail.
When error is perceived as a failure, naively. But trial and improvement would be like, we're gonna try and we're gonna improve. So yes, of course error is implicit in that. Dean? Yeah, and I'm glad you used the bowl analogy, because I was actually the one that highlighted the trial and improvement thing. For example, if you had the bowl filled with water and a ping pong ball floating on top, and you had to drill a hole because you wanted the ping pong ball to settle at the bottom of the bowl at a certain time, you could continuously improve by adding more holes of a certain diameter until you were able to get that flow rate and have that ping pong ball arrive at the bottom of that bowl at the particular time, at the discrete time that you were looking for. That's not an error thing. That's just, we'll continue to get a little bit closer and closer, we'll figure out what the dosage is. And I agree with you. A lot of people think that the basic research part of it is, oh, I failed. No, a lot of this stuff is, I test it and then I test it again and then, oh, I came really, really close. So now the difference is really low, which is what these authors are talking about. All right, I'm just gonna continue on because we do have a ton more to do. So that's still on the perception side. We're not talking about where we're gonna go walking. We're talking about the sound of the cat and where the cat is located in the inference. And so we're still within this empirical variational catference area. This is the bowl. This is fitting that distribution, the conjoint distribution, the free energy distribution. And that is the one that's bowl-like. So here is the variational free energy as the black line. And it's basically the composition of two factors, the penalty for overfitting and the penalty for failing to explain the data. And each of those has a certain distribution underneath it in this setting. So their combination is like f plus g of x.
It's like h of x. It's just adding functions together. So it's just another cost-fitting function for the really specific kinds of models that we're talking about here. Let's bring in action. So now it's still the same case of W and X going to the agent. Now, notice that the agent wasn't drawn in the previous model. It was just W to X. So that was probably a graphical as well as somewhat of a conceptual simplification. Like, I mean, these measurements, it shouldn't matter who's observing them, right? Oh, wait, quantum, it does. But now the agent can also take action, Z. So Z is going to be the whole question of control theory and cybernetics and action. The previous section dealt with the inference rule: how to choose Q of world states, beliefs on world states. This section is about acting. Now, suppose you can perform an action Z that will place the cat in one of two rooms. By changing the hidden state W, you can indirectly change future values of X, or at least change their proportions or their likelihood. So decision rules stem from measures of preference, because if you don't know where you're going, you're lost. Or if you don't care about the two things, then it doesn't really matter. One of the confusing aspects of active inference is that it treats the statistical model P, the one that is the actual distribution we're trying to get close to, and this is like the key point that we'll be returning to for our whole life, that model P, as a measure of both probabilities and preferences at the same time. And that's gonna be what we continue to talk about, because it's one of the most important points that this paper makes clear in a way that other papers haven't really, perhaps, harped on in exactly the same way. Let's look at figure three again. Look at the yellow part.
Active inference employs a controversial dual interpretation of P of W, probability of world states, and probability of observations, as probability distributions and preference distributions over hidden states and sensory states specifically. Dean, with the red text. Again, I don't wanna take up a bunch of time, but if we're having one of the authors or a few of the authors on, we can probably pull this out a little bit. For now: when we introduce preferences and probabilities, it isn't just a second consideration. It can almost run away from us really, really quickly if we're not really careful in terms of pulling back on the reins a little bit and really thinking about what does that imply. So I'll just leave that for now. Great point. Thanks for sharing it succinctly. They write, recall that free energy principle inference counsels choosing beliefs by minimizing a function that measures the cost of inaccuracy. And that's because the free energy calculation includes both of those very features, the overfitting penalty and the failure-to-explain penalty. Action selection is governed in the same way as inference. Remember, we were talking only about inference when we were talking about the variational methods. Now there's gonna be a twist. A slightly different cost function called the expected free energy. So that's why we were previously just talking about F, the variational free energy. Now we're gonna be talking about the expected free energy, labeled G. The definition of G is closely related to that of F. The interpretation of the two penalty terms changes, as the formalism is updated to reflect the fact that we are now making measurements over expected future states. So this is not just measurement error. There's the fundamental unknowingness of the future. Since future states have yet to be observed, the agent must average over them to obtain expected values.
The penalties are associated with failing to satisfy preferences and failing to minimize future surprise. Overfitting, preferences; failure to explain, surprise; expectation, preferences; when, now, or then. Let's look at how those play out in this sort of action selection through variational free energy through time, aka the expected free energy formulation, and how those two pieces, failure to satisfy preferences and failure to minimize future surprise, look now. But this is sort of the elaboration, or the cousin, of this one. This is like snapshot inference, and now we're gonna be looking at expectations through time. Okay, so first, failing to satisfy preferences. So again, keep that simpler version in mind. This is gonna be a slightly different one in six. They even write, compare equation two. Failing to satisfy preferences is, I hope, a clear enough term, but it totally brings in something different, which is something about the agent, as Dean wrote. So there's big implications, of course, because there can't be a non-preference-driven action selection in any useful sense. Whereas the variational one was just like, given the scatter, I want the best mesh and regression. That one does embody implicit assumptions about the world and so on, but this takes it to a whole other level. And here's where that controversial dual interpretation comes into play. Not only is it unusual to treat P as a preference distribution, it is unusual to treat the goal of decision-making as producing a distribution that matches that distribution, rather than maximizing expected utility. So elsewhere you don't need a preference distribution, just a reward for reward learning. Whereas here, we're trying to realize our preferences and match distributions, maximizing by minimizing the divergence from these realistic but optimistic expectations.
So perhaps it is best to keep in mind that preference in this sense might mean something different than utility in the traditional sense. Stephen? Yeah, without going back into that, but the whole point of having this matching of distributions rather than expected utility, which effectively again becomes kind of like equilibrium states, places where things settle and can be measured and they're kind of stable: you're in this kind of realm of a more fluid, flux-type process. So I think that's one thing. And I think the other one is this focus on the future, the focus on prediction and things going into the future and how to make sense of that, rather than looking back, which is often what science is doing, explaining retrospectively, in the same way psychology is often trying to explain what it sees before it, and coaching psychology is trying to see what can be done to get closer to something that's more suited. Great point. Thank you. It really embodies the forward-lookingness rather than the optimal reward or prediction on previous observations. So, failure to minimize future surprise. This is the other term. Failing to satisfy preferences; now failure to minimize future surprise, Formalism 7. One of the tenets of active inference is that agents should act, this is a normative stance, to ensure that future data are not too surprising. And so here is the formalism as written. This is the failure to minimize surprise. In addition to conditionalizing on Z: Z is action. So we know that the vertical line is conditioning on, and then Z are the affordances, like the action states. In this simple example, we're not gonna go from the Markov blanket to the policy to the action state yet. This is from the simpler cybernetic, or sort of agent-environment, framing that doesn't distinguish that as clearly, but Z is just an action that the agent engages in. The failure to minimize future surprise is conditioned on Z.
So it's not that there's some sort of world that we're not altering and then we're doing some strategy in that inalterable world, leading to some totally ad hoc way of integrating the outcomes of action into the niche. The world states in the future have expectations that must be calculated as expectations conditioned on policy selection. So it's not just that some policies have effects and some don't. It's that the inference about the future failure to minimize surprise is conditioned statistically, in the algorithm, on the choices. Now, when that includes choices about beliefs, doesn't it get so interesting? So that is one of the cool parts about this calculation. And then this expected free energy. Okay, Stephen, yes, go ahead. That also brings in an element of this contextuality. It's, what sort of choices were made then in context X, or, well, let's not use that context, right? It's important and it changes things, and you can't do that if you're taking averages and sort of pre-built models. Okay, so now, just like we kind of looked at the two parts separately and then summed them into F, the variational free energy, we've looked at those two parts here. We've gotten to equation eight in the paper, which is G, which is a function of those two distributions, P and Q, as well as action. And here are those two pieces that we just discussed. And then they write, the third input to G is Z rather than X. So not observations, but action. As mentioned above, this is because we are calculating the expected value over possible future sensory states rather than inferring on the basis of a sensory state that has just occurred. Okay, so that's exactly what Stephen said. F is really good for sensory inference. It's about, given the just-observed sensory data, what the estimate of the world should snap to. But action is totally different, not just because it entails preferences, like Dean raised, but also because the consequences of action in the future are unknown.
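For readers following along without the slides, the two decompositions being described are usually written as follows. This is a reconstruction in standard active-inference notation, assembled from the discussion above, so it may differ in detail from the paper's equations 2 through 8:

```latex
% Variational free energy (the paper's F): the overfitting penalty
% plus the penalty for failing to explain the data x.
F[Q] = \underbrace{D_{\mathrm{KL}}\big[\,Q(w)\,\|\,P(w)\,\big]}_{\text{overfitting}}
       \;-\; \underbrace{\mathbb{E}_{Q(w)}\big[\ln P(x \mid w)\big]}_{\text{explaining the data}}

% Expected free energy (the paper's G): the same two penalties
% reinterpreted, now conditioned on an action z and averaged over
% future states that have not yet been observed.
G(z) = \underbrace{D_{\mathrm{KL}}\big[\,Q(w \mid z)\,\|\,P(w)\,\big]}_{\text{failing to satisfy preferences}}
       \;+\; \underbrace{\mathbb{E}_{Q(w \mid z)}\big[\mathcal{H}[P(x \mid w)]\big]}_{\text{failing to minimize expected surprise}}
```

In G, the first KL term reads P(w) as a preference distribution (the controversial dual interpretation), and the second term is the expected entropy, i.e. expected surprise, of future sensory data.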
So the distribution for planning as inference, for action as inference rather than for perception as inference, needs to condition on action, not on sense. Even though sense does come into play, it's a little bit hidden away. Here's what we're getting at, which is the sensory distribution, noticeably not here, but on the right side only. And then the function that we're minimizing on is action inference, active inference. As with F, the measure G suggests a principle. The free energy principle in terms of action, not inference, is minimizing the free energy on action selection. That can be read as an approximation of, not optimal Bayesian sensing, but optimal Bayesian design of experiments and optimal Bayesian decision theory. Dean? And I think right here, I'm just pausing for one second. I think that we're gonna get into some more evidence of this, but I think this is the first moment when we can say active inference, well, not free energy, active inference does not necessarily constrain itself to being just a framework. It's actually a filter as well, because of the active part. It's not stable, it's constantly being updated because of that active piece and the preference piece. So we could call it a framework, but I don't think we're doing it justice. There's a framework aspect to it and there's a filtering piece to it. And so that's why I talk about search fields all the time. So I just wanna drop that seed now, so that when the authors come on, we can talk about that a little bit more, in terms of the context that Stephen mentioned about guiding. Awesome, thank you. So now we're gonna return to the active catference. So let us present a solution to the cat example. For the problem to have a determinate solution, we need the conditional distribution, world states conditioned on action Z. That means that the consequences of action have to be estimated. If we put the cat in the kitchen, it usually stays there.
So this is again, the two pieces that action introduces into the puzzle: the question of preference, otherwise why bother, and the question of the consequences of action. So the red is a statement that's empirically observed, from observed data, about what happens when you are estimating location conditioned on action. This is that Q of W conditioned on Z. And so you can see, when the cat goes into the kitchen, it stays in the kitchen nine out of 10 times. Whereas if it's put in the bedroom, after some period of time, like Stephen brought up, not one millisecond later, but one hour later, one minute later, it's model-specific. It's time-scale friendly. It's not time-scale free. And then you can compute numbers having to do with action selection within this model. It is worth restating how unusual it is to interpret P as a measure of both probabilities and preferences. But I mean, it's the letter P. There's nothing wrong with treating a distribution as a measure of preferences. Distributions don't demand to be interpreted as probabilities, after all. But what is unorthodox (in what church?) and in need of justification is giving the very same mathematical term two different interpretations within the same equation. So that's the two eyes at once, kind of looking back. And then they go into a little bit more detail about what that actually means. We are not aware of proponents of active inference taking this interpretive line, but it appears to be a viable option. So just awesome and clear writing, drawing something out through the re-understanding and the communication, which happens synchronously and asynchronously. One of the ways proponents of the framework turn this unusual interpretation to their advantage is by casting action as a form of inference. So here's from a Buckley citation. And that's why it's called active inference. And we just kind of talked about it a little earlier.
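The action-selection computation just described can be sketched end to end. Only the nine-out-of-ten "stays in the kitchen" figure comes from the discussion; every other number, and the use of the common risk-plus-ambiguity form of G, is my assumption for illustration. P is read in its dual role as a preference distribution over rooms.

```python
# Sketch (mostly assumed numbers): expected free energy G for two actions,
# using the dual reading of P as preferences. States: (kitchen, bedroom).
import numpy as np

# Q(W | Z): predicted location of the cat, per action.
q_w_given_z = {
    "put_in_kitchen": np.array([0.9, 0.1]),  # stays there 9 times out of 10
    "put_in_bedroom": np.array([0.3, 0.7]),  # assumed: it wanders back out
}
p_pref = np.array([0.75, 0.25])   # P(W) as preferences: we want it in the kitchen
p_meow = np.array([0.6, 0.4])     # P(meow | W) per room (assumed)

def expected_free_energy(q):
    risk = np.sum(q * np.log(q / p_pref))       # divergence from preferences
    h = -(p_meow * np.log(p_meow)
          + (1 - p_meow) * np.log(1 - p_meow))  # outcome entropy per state
    ambiguity = np.sum(q * h)                   # expected surprise about X
    return risk + ambiguity

best_z = min(q_w_given_z, key=lambda z: expected_free_energy(q_w_given_z[z]))
print(best_z)
```

With these numbers the argmin lands on putting the cat in the kitchen: the action whose predicted consequences best match the preference distribution, rather than the one maximizing a scalar utility.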
So the mechanism underlying minimizing expected free energy is formally symmetric to perceptual inference. Formally symmetric: overfitting with failing to satisfy preferences, failure to explain the data with failure to minimize expected surprise on future data. That's why it's the only one that has the X in there. Rather than inferring the cause of sensory data, an organism must infer actions that best make sensory data accord with an internal representation of the environment. Statistical representation, not the debate on whether organisms have representations. 2.4, a simple model of selection. This is where the Markov blanket is gonna enter. In our model, X and Z are the inputs and outputs of the agent: X observations, Z actions. The set X and Z is the agent's Markov blanket. The term is derived from Judea Pearl's work on inference using Bayesian networks, Pearl 1988, classic citation. In other livestreams we talk more about Markov blankets, so we're not gonna go into it too much here, but it's definitely the tip of an iceberg if you wanna check out more. But here, in the sense required here, a Markov blanket can be understood as the set of nodes that screen off the agent from nodes considered external to it. Screen off the agent, or screen off the internal states of the statistical model. Another toy model will help illustrate. Consider an agent whose surface temperature X can safely lie between negative four and four units. Temperature's continuous, okay, but it's a discrete model. And so here, death is coming into the picture on either end, too hot and too cold. So this is gonna be like variational catference, but now there's a freezing and a burning end of the room, and the cat, or the robot cat, is going to be making decisions about what to do. So the external world state is W, the temperature, and then the perception is going to be X, the observation. Notice that the value of W does not affect the agent's preferences.
So the preferences, for living or for homeostasis or whatever, hold regardless of the observations in that snapshot moment, but are not precluded over longer time scales. All the agent directly cares about is its surface temperature, denoted by X. That's why the two rows are identical. So these are sort of the observations and the different actions that can be taken to move between one end and the other, perceiving only the temperature. The agent tries to guess how to act, how to stay within its expected range. And then this is sort of the intuitive pseudocode. When the temperature is high, you wanna go towards the lower end. When the temperature is low, you wanna act to go towards the higher end. And so here is the simple example. This is kind of like the bacterium that has the right priors versus the one that's just deciding randomly, or without any resemblance of a relationship to the observations. And so here is the smart agent that's keeping its free energy low: every time it gets bumped up as it moves away, it acts to bring it back down into alignment. And then here, the agent's control over its external state is 95% accurate. So 5% of the time it slips. And that's what they say: its grasp slips. The optimal control, the handle, has been lost because of a motor ineffectuality, but that can be recovered. Whereas the randomly acting agent, the sort of Brownian-walk diffusion process, ends up dying. Okay, Dean? All I was gonna say here is, that was a great explanation. And if this was your first encounter with the W and the Z and the X, you'd have to really, really slow this down, because the first time you encounter it, it takes more than one pass. You might have to go over it a third and a fifth and a seventh time for it to actually make sense, because we all have a certain capacity in terms of the amount of variables we can juggle at once.
And when I first looked at this, it made sense to me, but I could sense that for somebody who's maybe coming from a philosophical background, and not necessarily having the same degree of statistical hands-on experience that maybe the three of us do, this would be a moment where you'd really want somebody to hold your hand. That's what community and the ActInf Lab are for, because then it can be an interaction and every question is welcome. So as long as there's attention in the game, everybody's gonna make it. Stephen? So this also does give an indication that with variational free energy, having some sort of perturbations can be useful, particularly if you have multiple parallel sources of information being integrated. So having some noise is a good thing, right? It actually gives you a way to start to make inferences. One question: is it that on the left, there's no real learning happening? No. But even without the learning component, it still is more stable. Yes, it's not doing parameter updating. It's just like a previously learned association between the temperature and the direction to move. And also notice that the observation of sensory data is assumed to be perfect here. So there's many layers which can be brought in, but this is kind of like the kernel motif. Okay. The correspondence between high values of F and life-threatening states leads to a third form of the free energy principle. So we had inference and action: perceptual inference, then we had action selection, and here is free energy principle selection. Any system that survives long enough will act so as to appear to be minimizing F. That's the first time we got any discussion of far-from-equilibrium thermodynamics, of anti-dissipative systems, resilient systems, anticipatory systems, except indirectly through action. But this is the sort of selective, Darwinian side of FEP. Okay, of course there's a ton that could be said here.
This is not a normative principle, not a suggestion to agents regarding how they should perform inference, but a means of describing how agents behave. Axel's paper: minimizing free energy alone is not living, but living systems will be those that appear as if they've minimized free energy. And then here, I think someone wrote that in there because that's going to tell you what your autobiography is going to sound like, given that you're still alive and can write one. Yep, my last words are, ah. In recent work, Friston gives a deflationary interpretation on which agents do not in fact minimize anything, but perform acts which can be interpreted as minimizing F. So, see livestream number 34, zero, one, two, three. That is the reason for the emphasized phrase, so as to appear to be minimizing F. That's deflationary instrumentalism, highlight. Despite this deflationary approach, there is a link between this and the earlier principle. Agents subject to the free energy principle of inference, perceptual inference, ought to minimize F. So if this ought is tied to their survival, then the normative principle has the same underlying justification as the descriptive principle. So this is sort of, I don't know if it's been noted before, but basically it's equivalent to saying the evolutionary ought is an is. So it's kind of like, not from the fitness side, the Malthusian, Darwinian side, but from the anti-dissipative side: this is how things have looked. It's like saying fitness isn't necessarily being projected into the future, but if you do do the computations, fitness can be assigned to different single-nucleotide polymorphisms that have had different success. And you will only see successful ones, but it's a bit more complex than that in the moment. So here are the three Ps, the tale of three Ps. We had the tale of two densities; what other jokes have we had?
In each of the three examples discussed in this section, there's been a distinct role for the distribution P and a distinct interpretation of each model, in the narrow sense. In our first model, P was a generative model employed by an agent. It was therefore interpreted as representing probabilities, like where the cat is. But then, in order to introduce action in a meaningful way, we had to have preference, like: I want the cat to be in the kitchen. So when we brought action into the game, we had to introduce preference. What makes active inference similar to, but also different from, other frameworks is that both probabilities and preferences are represented by P. In addition to representing probabilities, in the second model P measured the desirability of certain future states over others. "I expect you to go to school every day": that's something having to do with action, if it's serious. It was therefore interpreted as representing preferences. In our third model, P tallied the historical frequencies of a set of hypothetical ancestors: fitness. It was therefore interpreted as representing the fitness of different states. So supporters of the framework often point to the third role, P as fitness, to explain how P can simultaneously fulfill the first two. It's like: if that thermophilic bacterium is rocking its niche, it will have high fitness. Okay. So that is where the three Ps get us. Here's just a brief look forward. For any of these next slides, we're not gonna go into the actual, very nicely fleshed-out arguments themselves, but Stephen or Dean, just raise your hand if either of you has anything to add on each of these section headers. So section 2.5 is extensions to the model: more things to learn, more ways to act. We've seen adjectives added to active inference: deep, sophisticated, contrastive, affective. What other adjectives have we seen? That's sort of this section of the guidebook. It's like: there's your USB port. What can you have added into it? Or how can it be developed?
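To make the first role of P concrete, here is a toy calculation of variational free energy F for the "where is the cat" example. All the numbers (the prior over rooms, and the likelihood of one observation given each room) are made up for illustration. The point is just that F upper-bounds surprise, and the bound is tight exactly when the approximate posterior q equals the true posterior.

```python
import math

states = ["kitchen", "living_room"]
prior = {"kitchen": 0.5, "living_room": 0.5}        # p(s): generative model prior
likelihood = {"kitchen": 0.8, "living_room": 0.3}   # p(o | s) for one observation o

def free_energy(q):
    """F = E_q[ln q(s) - ln p(o, s)], an upper bound on surprise -ln p(o)."""
    return sum(q[s] * (math.log(q[s]) - math.log(likelihood[s] * prior[s]))
               for s in states if q[s] > 0)

evidence = sum(likelihood[s] * prior[s] for s in states)          # p(o)
posterior = {s: likelihood[s] * prior[s] / evidence for s in states}
surprise = -math.log(evidence)

# F at the exact posterior equals the surprise; any other q does worse.
print(free_energy(posterior), surprise)                               # these match
print(free_energy({"kitchen": 0.5, "living_room": 0.5}) > surprise)  # True
```

In the paper's second and third models, the same symbol P would carry preference or fitness weightings rather than (or in addition to) these probabilities; the algebra of F stays the same while the interpretation of P changes.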
And so that's where we see the adjectival family of something-active-inference, maybe even enactive inference, coming more from the philosophy side, but also there are all the elaborations that we've seen of the actual parametric model in the narrow sense. Sometimes it's in both camps: affective inference has to do with a model derivation or metacognition, as well as something about the model in a bigger sense, but we can recognize both those lanes at once. Stephen? Yeah, I mean, enactive inference, reciprocal active inference, there's a couple more, so that's definitely true. Just one thing I was gonna throw in: I noticed, and I know it's in the quote, they talk about "any system." I always wonder whether that's a little bit of a paving stone waiting to be tripped over. Because I kind of feel like, if you're gonna say it, maybe say any nonlinear dynamical system, or nested systems of systems; I just always think that can be a little bit of a hazard. Yeah, it inherits a lot of the legacy ambiguity around "system," like you've brought up a bunch of times. So that's really helpful. So in section three, notice how the brief history of the free energy principle comes after what we just spent the last hours and pages on. So it isn't a history-of-science perspective, but it is being recognized as important. And so the free energy principle is a modern incarnation of ideas that have been raised sporadically over at least five decades. It combines traditions from physics, biology, neuroscience, and machine learning, among other areas, especially in the modifications and, increasingly, the applications. And it's a bi-directional freeway; it's not just two lanes, it's going both ways. So this is sort of the history, which is fun because it's a recent history in some parts. Okay, section four moves from the historical to the philosophical.
So section four: dialectic, the free energy principle, and related claims. Mathematical, empirical, and general claims, section 4.1. This is where all that mathematics groundwork pays off, because in the authors' words, we think a great deal of confusion can be overcome by considering three kinds of claims. First, there are mathematical claims. Those were the ones that were just brought up in earlier sections: the theorems, the scientific models, and the statistical techniques in the narrower sense. And by doing good scholarship, we show that the core features absolutely predate active inference, and they're less controversial than some might suspect. However, Friston and colleagues have since introduced many novel mathematical elements, like the perception-and-action interpretation of the Markov blanket. Importantly, claims in this category do not need to be interpreted as statements about real systems in order to be evaluated. So part one is its instrumentalism; part two is to separate the parts that predate active inference from some of these more recent derivations, even within the purely technical area. The second way that confusion can be overcome is by considering another kind of claim: we can add a line here partitioning off the mathematical, everything that was in the section we just discussed plus stuff that was more recent, like 32. There are empirical claims about cognitive and biological systems, how brains and bodies actually work. These are the remit of cognitive neuroscience and biology. So those are empirical biological claims. Third are general claims that typically abstract across a wide range or class of empirical claims. So that's like the everything-nested-systems, collective-behavior angle. So separating those, and maybe there are more, is really helpful, because sometimes people will say, well, this general claim is true because look at how this thing in the skin works.
Well, what about the skin? Well, look at how the math works. And why does the math work that way? Because of the general claim. Is that an argument? What is the justification, and what are the links among these dialectical categories? Well, if there are three kinds of claims, mathematical, empirical, and general, then there are all the edges of that triangle. If you had a fourth one, you'd have all the edges of a tetrahedron, and then you'd almost have a model. The mathematical-to-empirical direction invites philosophical analysis due to novel interpretations of scientific model terms. So how can math be used to justify empirical claims? Why do equations have anything to do with what people say about the brain and the body in a niche? Why is that justified at all, other than that other people, in their epistemic authority, have done it, or it's how it has been done? Stephen? Instrumentalism also gives a kind of bridge between realist science and applied science; it's sitting somewhere between the two. So I think that's quite a useful piece that this adds to the equation. Nice. And then the last two edges of that triangle (and you could add both directions on each edge): how can mathematical claims justify general claims? Well, how could you say that about nested systems, about nested Markov blankets? The math. Okay, that's important to investigate. And how can general claims justify empirical claims? How could you say that about the ants? Well, fitness, or dissipative systems. So how can general systems claims justify anything about empirics? So those are some of the edges of these three kinds of claims, and there are also claims and subclaims, which is why it's really important to have rhetorical ecosystem mapping and an ontology for active inference, so that we can actually learn and apply across languages and through time. Stephen?
Yeah, and I think also we can think of this as structural and functional insights being gained, rather than necessarily repeatable measured outcomes being gained every time. So there's an idea that these insights might reveal something about the structure or the functionality of the dynamics, knowing that no one simulation may ever be repeated quite the same way, particularly if it's got a fair few variables. Yes. So, the concluding remarks. The active inference framework is incredibly ambitious. Is it the framework or the people? In its explanatory scope. From humble beginnings as a theory of brain function, it is now positioned as a framework for understanding life itself. There's a critical tradition in the philosophy of biology, inspired by Richard Levins, with regard to such ambitions. Many then will approach active inference with skepticism. Healthy skepticism is a good thing, but healthy skepticism is informed skepticism. Unfortunately, getting one's head around the details of active inference is no small task. Our goal in this introduction has been to clarify the basic mathematics, history, and internal dialectics of active inference. In that order; those were the sections. And to draw attention to some key concerns. With these details on the table, philosophers of biology are in a better position to critically evaluate the framework. We look forward with interest to seeing the results. A classic future-looking last sentence. So those are the final lines. We have a ton of things to discuss in 37.1 and .2. It'd be awesome to have any of the authors, as well as anybody else who wants to jump on for a returning or first-time discussion. But we'll close under the buzzer in the third period; I know that's how they count up there, Dean. So, any final comments on what you're looking forward to discussing? Dean first, because you never get the last word at the end, Stephen. Yeah, I don't want the last word.
So one of the things I wanna talk about in point one is the general claims, because I get to trot out one of my favorite parables that I made up, which is: three functions walk into a bar. I get lots of mileage out of that, and whether it's three functions or it's probability, preference, and fitness, the same holds true. It can be great taste and less filling. So I wanna have a look at that, because I think that really tags onto the idea that what they're talking about here in terms of a guide also sets us up really well for creativity. And then I also wanna be able to say that that creativity isn't because we're framed in, but because we can filter as well. Three interpretations of a function walk into a bar. Stephen? Yeah, I'm really curious, especially if it speaks to the authors, about how these first principles of hidden states and inferences, which ones are selected and which data is available to actually make inferences on, can be used, particularly for me in the example of psychology versus coaching psychology and that type of paradigm shift. Awesome. Yeah, I really appreciated both of your perspectives on this paper. It really shows how different life experiences can be in conversation with some of the technical details. So thank you both, and everybody who's watching and participating. See you around the lab. See you in the upcoming weeks for .1 and .2. Peace. Thanks guys. Bye.