in behavioral ecology and ecology and evolutionary biology with models that have some sort of empirical grounding. Cool. Anyone else want to share something they're working on? Or I can also just raise the second warm-up question in case people want to answer that one, which is: what is something that you'd like to have resolved by the end of today's discussion? Whether it's a thought about the paper specifically or something general, sure. A fellow Jitsi user, I'm not sure who that is. Shannon, go ahead. Hey, I wonder whether the structural representation that we might be defending later today, whether that structure could just be a set of attractor dynamics. Like, is that enough to be a structure? So I think Alex might be raising his hand and have something to say. Yep, Alex. Oh, yeah, I mean, I was going to speak anyway. I would say yes to that question, probably. It's not just my opinion, but I was going to answer both questions, I guess. So apart from still trying to figure out the free energy principle in lots of conversations with people, I've been working up to implementing a variational autoencoder to figure out how that actually works. So I've been doing a lot of more technical work. Something I'd like to have resolved is to figure out exactly where enactivism, or the version of enactivism that's defended in this paper, is inconsistent with representationalism, if it is. I don't mind if there's residual disagreement, but I just want to figure it out. Oh, sounds good. Any last thoughts on these warm-ups before we jump into it? We have a lot of very interesting slides. Kate, go for it. Yeah, I guess the biggest thing that I want to have resolved by the end of the session is that I'm still not really clear on the status of the generative model, because at some points it sounds as if it's almost causal, that it's controlling the animal's, or the organism's, or the system's behavior.
And at other points, it sounds more like it's a model that we use to describe the system's dynamics rather than something that is encoding beliefs or being causal. And that's something I'm still not entirely clear on. So I think that's a big thing I want resolved. Cool. All really interesting topics. Thanks for sharing. And we're going to jump into it, because a lot of these are on the slides ahead. So the paper that we're going to be coming back to today is A Tale of Two Densities: Active Inference is Enactive Inference, by Ramstead, Kirchhoff, and Friston. And just to rehearse the goal of this paper, they aimed to clarify how best to interpret some of the central constructs that underwrite the free energy principle and its corollary, active inference, in theoretical neuroscience and biology, namely the role that the generative models and the recognition densities play in this theory aiming to unify life and mind. And so last week, we talked about what these two densities are, and we talked about the tale that integrates the densities. We also walked through the roadmap, how they build from A to Z, by talking about statistical models and representations, introducing active inference, and then combining that with enactivism in this sort of enactivism 2.0 framework before building towards multidisciplinary research heuristics for cognitive science. So that was last week. Let's get into the new and exciting parts. So this was one of the feedback comments from a participant last week. And what they wrote was: a key learning goal for me in building my understanding of active inference is to develop intuitive examples for the critical concepts that can act as the basis for abstraction later on. It seems to me like there is probably something like a logical sequence in which this should be done, but I don't necessarily always know which concepts and ideas from active inference to prioritize.
As an example, coming from a cognitive linguistics background, I tend to think of beliefs as either causal reasoning, or the underlying sensorimotor simulations that connect to linguistic concepts. I've been trying to find new intuitive examples that can help me to understand beliefs as probability distributions, as clearly this is the key for making progress in understanding active inference. So my question for all of you is: what is a system or a metaphor that helps you understand active inference, and why? And while people are raising their hands, on the bottom we have just a few of the systems that we've been discussing over the last weeks. We have a brain, the technical and the artistic parts of the brain. We have insect colonies. We have dance and embodiment and movement. And then on the right, I guess it reflects something like the economy, or changes in dynamical systems. But what is a system that helps you understand, through specifics, something about active inference? Any thoughts on this? I mean, I hate to be that guy, but I started understanding active inference when I went into the math. It's a formal theory, and I think we can actually be misled by metaphor. Some of my reservations about the philosophical work that's been done on active inference are that it operates on the basis of kind of informal, intuitive, conceptual understandings rather than trying to see what the math says. I hate to be that guy, though. No, that's perfect. We'll go to Alex Vyatkin and then to Kate. Go ahead, Alex. Yeah, for me, the most important thing that changed my views was taking seriously the process approach and understanding concepts about states and steady states of possible processes. And later on, all these concepts with the Markov blanket as a border, and underlying structures like generative models, in links with cybernetics. It brings me some kind of top-level picture, and now I'm trying to focus on different parts to have a more detailed understanding. Cool.
Kate and then Mel. Yeah, I was also going to say cybernetics. I think I only really started to understand what was going on once I started seeing the links with, like, Ashby and accounts of survival as stability. That was when things started to make sense for me. And there's also just a nice metaphor I quite liked. I think it's a paper by Jelle Bruineberg, I believe with Erik Rietveld, with the crooked scientist metaphor: the crooked scientist is trying to obtain the result that it's looking for. So it has to have some sort of grip on the world in order to bring about that result, but it's still trying to obtain the result that it wants in the first place rather than passively find out what will happen. I thought it was quite a nice metaphor, so that's one I found helpful. Cool. Mel. So I'm remembering my first sort of aha moment with the FEP and active inference. I was explaining this stuff to my friend who works on similar things, but in a very different framework, in the MIT Media Lab. And they had papers thrown everywhere and scribbling on all of them. And my realization in explaining this framework to him was that there is fundamentally, at the lowest level, at the level of the simplest sort of systems, or at their kind of lowest scale, right, no difference between belief updating and action. A change in the system is a change in the system, right? And that is both a belief update and an action. And once you build in further levels of hierarchical organization and complexity, you get this strong differentiation between them. But at the lowest level, they're the same thing. Cool. And Kate, did you raise your hand there? And also, yeah, Alex Kiefer, go ahead. Oh, yeah. I mean, I like that point, that at least at the lowest level, belief updating and action are kind of the same thing.
I guess that's one reason I'm skeptical that we need to draw the contrast explicitly; I think more traditional Bayesian brain stories are implicitly built on that claim as well. But I was going to say, what got me into active inference, also sort of a technical point, was just that you really can't optimize the thing, you can't minimize surprise, just by adjusting your model. The surprise depends in part on the sensory observations, right? So you can't minimize that just by changing your generative model. You have to change the observations. So that's the technical thing that got me into it. Cool. And yeah, thanks everyone for being flexible and fun on the tech. Maxwell, I really agree that with metaphor, as always, it's a question about compression of semantics and meaning. And at the end of the day, we're talking about something specific. And so we can think about different systems that help us understand how active inference is deployed in the real world, but often metaphors will take us off the mark, because they remove us from the real underlying details. So let's return. And the details are really where the devil's at in this case, I think, but yep, I agree. And hopefully in the end, with the figures, we're going to be able to look at those details really specifically. And I think that the way that figures two, three, and four lead into each other helped me understand a lot about the system. So let's take another look at these two densities, the "two cities" that are being linked. On the left, we have the Bayesian structural representationalist picture, and Alex, freely add in any details here. Bayesian models move between the data or observations, the parameters, and the hyperparameters or higher-order parameters. And from the data to the hyperparameters, we call it a recognition model. And to go from hyperparameters to a set of data or observations, that is a generative model.
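A toy worked example may make the two directions concrete. In the simplest discrete case, the generative model runs from hidden states to data, p(o | s) p(s), and the recognition model runs the other way, from data back to hidden states, p(s | o). Here the exact posterior is cheap to compute with Bayes' rule; in realistic models it is intractable, which is why the recognition density is only an approximation. All the numbers below are invented for illustration:

```python
import numpy as np

# Toy generative model: 2 hidden states, 3 possible observations.
# (All numbers are illustrative, not taken from the paper.)
prior = np.array([0.7, 0.3])            # p(s)
likelihood = np.array([[0.8, 0.1],      # p(o | s): rows = observations,
                       [0.1, 0.6],      #           columns = hidden states
                       [0.1, 0.3]])

# Generative direction: from hidden states to expected data.
p_o = likelihood @ prior                # marginal p(o)

# Recognition direction: invert the model with Bayes' rule
# to get p(s | o) for a particular observed outcome.
o = 1                                   # we observe outcome index 1
posterior = likelihood[o] * prior / p_o[o]

print(p_o)        # marginal density over observations
print(posterior)  # recognition density over hidden states given o = 1
```

The point of the sketch is just that the two "models" are two directions through the same set of quantities, which is the sense in which, in traditional architectures, one is the inverse of the other.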
And the outcome is a statistically converged multi-level model that represents structures of the world, for example through an expectation-maximization scheme. We can contrast this with the enactivist school of thought, which has linked not data and hyperparameters, but rather the world and agents, and how they're linked through perception and action. So there are a lot more adjacencies to areas like niche construction and ecological psychology. And we remember hearing from Maxwell last week about how it was the desire to mathematically rigorize some of the enactivism that led him to the Bayesian approach. And let's now think about how these two densities are linked, with a little bit more feedback from one of our participants. They wrote: the domino metaphor finally makes sense. The physical dominoes are the physical state of the system, corresponding to the recognition model. And the falling of the dominoes is like the process or the dynamics of the system, reflecting the generative model. So here we've kind of combined these two different ways of thinking, and we're looking at the relationship of the world and the agent, or the system and the system's surroundings, and directly went ahead and combined perception with the recognition model and action with the generative model. And then, also to combine that with what Kate was just talking about with cybernetics, this is a quote from the paper: on this view, active inference can be read as a new take on the good regulator theorem proposed by Conant and Ross Ashby in 1970. Active inference tells us about the relation between a control system, the generative model with priors over action policies, and a system being controlled, the organism and its adaptive behavior, the actual actions undertaken in and as part of the world.
So what exactly does this clarify about active inference? What do people think about this metaphor, or at least this way of thinking about how, if not necessarily a metaphor, we would apply active inference to this system? What are we getting at here with these similarities and differences between the two models? Sure, Mel. Well, so the good regulator theorem, the idea is that any good control structure for some larger system has to be a model of that system. It has to contain in it all of the key variables for that system. And basically what this says to me about active inference is that in order for a system in an environment to be able to adequately react to whatever kinds of perturbations are happening in the environment, it has to contain all of the key, I call them existential, variables, the variables in the environment that would lead to the system continuing to exist or not continuing to exist. Cool. And so how exactly, I'm just curious, does the system come to embody all the key variables of the outside world? Like, don't the models of the organism reflect a simplified version of the outside world? Otherwise there's kind of a Borgesian story about the map that represents the territory exactly. So how does a good regulator arise in the context of an environment like that? Shannon and then Maxwell. Well, I don't think the key is to be an exact, like, imitative model of the world, or an exact map even of the world, but just a model that is good enough to enable you to act and adjust in the world. Cool. Maxwell and then Mel. Yeah, I mean, so formally speaking, there's a difference between external states and hidden states. And when everything is going well, the external states that are modeled by the system coincide with the hidden states that are actually out there. I'm not sure if it's guaranteed, but it is strongly implied by the fact that we're minimizing variational free energy.
If we're not generating a lot of free energy, then the beliefs that we have about the environment tend to reflect the causal structure of that environment. But, I mean, Manuel Baltieri and Chris Buckley have done some interesting work showing that actually there isn't a necessary kind of entailment relation. The system can be acting on states that don't actually exist. And this makes sense also just evolutionarily. Perceptual systems don't track truth. They enable adaptive behavioral loops. So if you're a prey item, like a bunny, it's adaptive to generate a lot of false positives compared to allowing for false negatives in terms of predator detection. It's more advantageous to make a few less costly mistakes than to make one big one. Jelle Bruineberg and Erik Rietveld make that point in the paper that Kate was mentioning just earlier. But yeah, it doesn't actually have to hook up to anything in the real world, though usually it will end up doing so. Cool. Mel, then Sasha. Yeah, so the key term in what I said is key, actually. So the key variables, right? That's just what keeps the system alive. And we can imagine that we first get a system on the scene that needs just basically one parameter in its environment to be correct in order to continue to exist as a system. And then we build up from there to systems that are progressively more and more complex and need progressively more and more complex sorts of environmental scenarios to survive. And then they themselves, in order to survive in those more complex regimes, need to build in complexity, right, in terms of the models of the world that they enact. And I think it says something interesting about how we conceptualize cognition and the onset of cognitive complexity, and this is Peter Godfrey-Smith's line on the subject, as a response to complexifying environmental circumstances, right? Cool. Sasha, then Alex Kiefer.
Yeah, one key phrase that really helped me better understand active inference is good enough. And that's what really kind of put it all together: while we're going about reducing uncertainty about our system, it just has to be good enough to make the next action, or in the evolutionary sense, to survive. And that metaphor has really helped me get through all these high-level concepts. So thank you for mentioning that. Alex Kiefer, then Alex Vyatkin. Thanks. Yeah, I just wanted to note that you don't need to contrast accuracy, or answerability to the truth, with enabling adaptive behavior, as if it were a dichotomy. I'm not sure if that's what was intended or not, but one reason I've had trouble getting into the sort of anti-representationalist viewpoint is that structural similarity is a matter of degree; accuracy is a matter of degree in that sense. So, yes, your hidden state representation doesn't have to map completely accurately onto reality, doesn't have to capture every detail. But if it doesn't do that to any degree, you're screwed. So, you know, maybe we need systematically simpler or biased representations in order to do the job. But I think there's still an implied relationship to the truth there. Cool. Alex? Yeah, thanks. I want to change the system's level under consideration, to a possible application of active inference. And for me this is interesting on a personal level, in terms of everyday phenomenology, and especially professional, working-day phenomenology: how it works, what a generative model is, with an example. And what I want to discuss, if I get it correctly, is how it's possible to develop thinking about it. If, for example, some doctor knows different disciplines and their different ontologies, when he meets a patient, he starts his action from his generative model, which activates the recognition models for serving as a doctor, depending on the exact situation and the exact case.
And from another level, from the team level, if a person behaves professionally in some discipline, it becomes like a perception model for the generative model of the team. And what does that mean in terms of learning and education of team members? Because if we have some way to link recognition models with ontologies and start to work with them more, again, cybernetically, it could be very interesting, at least for me. Cool. Agreed, the ontologies, how we think about the world, definitely influence our perception. And Sasha, do you have something else? Otherwise, okay, Shannon, go ahead. I was just going back to Alex Kiefer's point that with your model, there has to be some truth relationship with the world. And if we're going to the dominoes, and the process needs to be that the dominoes fall to make some cool pattern: if there's an interruption, like this domino now can't reach the next domino, that's akin to a disruption in how truthy the model is of the world. And it disrupts the action that you're able to take, or you die instead of being able to survive, because your model isn't accurate at that stage for the processes to continue happening, to do the next action or to survive. Cool. Cool discussion on this domino system. Let's talk about counterfactuals and also return to some of the great questions Alejandra raised in previous weeks. This is from another participant's feedback. They wrote: What I'd like to clarify, if possible, is the following. Am I right in understanding the generative model as a control system that is dynamically instantiated between an organism and an affordance in the context, and the recognition density as a relation between the sensory inputs and the possibilities for action policies encoded in the organism? If so, how does this work when action is instantiated by counterfactual forms of cognition or simulation, i.e.
those that do not rely on affordances in the current external context? And so on the bottom right, there's a younger person looking into the mirror, or maybe vice versa, and seeing an older person. And this is reflecting how not only is there a deep temporal aspect to our action in the current moment, meaning that there are often affordances that aren't available to us at the second that we're actually making the decision, but also that we think about future possible affordances that don't exist based upon the sensory data that we're getting, or do they. And then on the right side, I have a few bacteria, because this is an example of a system that we might not think of as having a simulation-based or counterfactual-based mechanism of action. Often bacterial behavior can be summarized quite concisely by just the short-term gradients it's ascending or descending. So for anyone to raise their hand and chime in: how do counterfactuals play into this, and how do we make sense of affordances that aren't in the current ecosystem but are still important for behavior? Maxwell? Well, I just want to comment on the first part of that question. I think it's a bit simpler than that in terms of the relationship between the recognition and the generative models or densities. So your recognition model is just your posterior, all of your posteriors, basically all of your posterior estimates over states and precisions. It's like my best guess as to what the values of all these states are. And you can think of that as the system's current physical state. So under the active inference formalism, the current physical state of the organism encodes basically the specific parameters of these beliefs over external states. The actual physical states of the system encode these parameters, so, yeah, you embody a best guess. The generative model is the point of reference for the free energy gradients.
So you can think of one as the inference model and the other as the control model, in the sense that the inference model, or the recognition density, is sort of like my best guess right now. And the generative model is essentially a model of what my phenotypic preferences are. And it's the tandem between the two that gets you behavior, in that the generative model provides the point of reference for the definition of the variational free energy gradients, i.e. how close am I to my desired sensory distribution, and the recognition model tells you where you are, effectively. So it's like where I am, where I want to be, and the free energy is sort of the difference between the two, effectively. So for the first part of that question: yes, you're right in understanding the generative model as a control system. The recognition density is just the best guess; it's the posterior. So yes, it is harnessing these relations that are being described here, but that's precisely because it's a posterior estimate of the states and precisions and all that. Cool. And one more thought on that: an agent who knows about, for example, how hammers and nails are related through their internal models might look at hammers and nails and pieces of wood and think, you know, later on I could build a house with this. And so it's actually a future affordance, but it's based upon sensory input at the current moment. Cool. I'm going to just move on to the next slide. And just a short little quote by Buckminster Fuller, because sometimes I just have to. He wrote: rope may not be much like water, but a knot is like the wave. And this is from the synergetics framework. And it's getting at this notion that there's a difference between structural integrity, which is like the rope and the water, which are quite different but both static and stable, and pattern integrity: knots and waves can travel through a medium.
And so by virtue of traveling through a medium, kind of like that domino wave, there's something similar about these traveling waves that goes beyond the mere structural integrity of their media. So a thought question for later is really: is active inference a structural integrity or a pattern integrity? I would say that it's a pattern integrity. And this brings up the notion that many far-from-equilibrium systems, though their mechanisms are extremely different, for example the bacterium and the person, will have similarities because of the way that they need to persist, from a cybernetic perspective. Enough there. Let's get to that question about the structural aspects of attractors. So here, a participant provided feedback and wrote: why can't the recognition and generative models be considered as structural representations? So they're coming from a structural representationalist perspective. They wrote: in which case the dynamics of the structure are taken into account as part of the representation. Even if we're appealing to dynamical systems, the states of any particular dynamical system over time make up the structural representation. Take the Lorenz attractor. The Lorenz attractor has a specific approximate structure around two attractor states, but the states and the trajectory of any point are sensitive to initial conditions. So any two initial points will lead to different precise physical states and dynamics, but will still travel through a very similar trajectory which hovers around these attractor basins. The states and the dynamics of a set of points over time can be taken as a whole and be identified as embodying structural representations of the Lorenz system. So this is something from the realm of chaos and complexity theory. And the Lorenz attractor is an attractor of a simplified system of equations describing convective flow in a fluid layer of uniform depth.
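The participant's point about sensitivity to initial conditions alongside a stable overall shape can be checked numerically. Here is a minimal sketch of the Lorenz equations with the standard parameters (sigma = 10, rho = 28, beta = 8/3); the Euler integration scheme and step size are my choices for brevity, not anything from the discussion. Two trajectories starting a hair apart diverge completely, yet both remain on the same bounded two-lobed attractor:

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One Euler step of the Lorenz system."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def trajectory(start, n_steps=5000):
    """Integrate forward from an initial condition."""
    states = [np.array(start, dtype=float)]
    for _ in range(n_steps):
        states.append(lorenz_step(states[-1]))
    return np.array(states)

# Two starting points that differ only in the 8th decimal place.
a = trajectory([1.0, 1.0, 1.0])
b = trajectory([1.0 + 1e-8, 1.0, 1.0])

# The pointwise separation grows to the size of the attractor (chaos),
# while each trajectory individually stays bounded on the same structure.
separation = np.linalg.norm(a - b, axis=1)
print(separation.max())
print(np.abs(a).max())
```

This is exactly the feedback's picture: the precise states differ, but the overall structure the states trace out, the two-lobed shape, is shared.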
And there are also parameters for the imposed temperature difference, the gravity, the buoyancy, the thermal diffusivity, and the kinematic viscosity. And the simple physical analogy here is heating up a pot of water. There's heat coming from the bottom and cooling coming from the top, and there's this churning happening, and it goes around these two different attractor states. So in this case, maybe Alex or anyone else who raises their hand: what is structure and what is representation? And if we agree on what this system is, then what is this representation debate that's happening? Mel? Mel? You're muted, Mel. Or Maxwell? Go ahead. All right, let's go to Alex first, I think. Okay, perfect. Well, I was going to say, I don't have much to say about the Lorenz attractor. I mean, I'm not sure what's supposed to be represented in this case. I have an intuition in the case of dealing with a biological system, because, you know, we're intuitively representing some kind of environment. I'm just not sure how to begin with this example. Right, we've got the state space and this nice graphical representation of the states of the state space. But unless we interpret this entire system as representing something, I don't know what to say about its sort of content. Okay, cool. Mel? I don't know what a representation is. And so for that reason, I guess, I think that the question of what a representation is, and what in the world is a representation, is maybe too ambitious. Only because I haven't solved it. But there's a related question that's less ambitious, which is: what is the difference between a biological function and a representation? And that, I think, is more manageable in some respects. In phil bio and phil mind, we speak of the disjunction problem.
So we speak of, and I know Maxwell hates this framing and thinks it's outdated and we don't need it anymore, but the idea is that in order for a representation to be a representation, it needs to have the capacity to misrepresent. And we have a similar idea in biological function, which is that in order for a function to be a function, it needs to have the capacity to malfunction, right? So you need malfunction to have function, and you need misrepresentation to have representation. But how do we differentiate a function from a representation? Good question. Maxwell, then Alex. Cool. Well, so regarding the question, what is a representation? We targeted a very specific account of representation. In the philosophy of cognitive science, a representation is basically an internal thing within an organism that stands for some feature of the environment or the organism's own body, in a way that the organism can leverage to do something interesting. We focused more specifically on the claim that generative models are structural representations. So this is an account that O'Brien and Opie first worked on in the mid-2000s, and then Gładziejewski and Miłkowski in the context of predictive coding and generative models. I mean, yeah, I think Alex and I are going to still disagree about this, even after talking about it for dozens of hours at this point. But the main reason why in this paper we argue that they're not is that the generative model isn't encoded in anything. And this is one of the main differences between more traditional Bayesian brain architectures and active inference. As Dan was pointing out really early on, in traditional architectures, the recognition and generative models are just inverses of one another. The recognition model is a mapping from the data to states, and the generative model is a mapping from states to data.
So it's really just the same set of connections and everything; it's just a question of which direction you're considering it from, the top-down or the bottom-up flow of information. Yeah, well, I mean, that was a really nice explanation of what structural representationalism is, so I don't have much to argue about there. And in general, I don't want to argue more than is necessary. You know that about me. But I think the reason I want to insist on some of this stuff is just, for my own basic sanity, to feel like I have an understanding of what's going on with this stuff, right? So, I think I understand exactly why, from an active inference perspective, you wouldn't want to say that the generative model is encoded. Let me just say it first: I think the question of whether or how it's encoded is maybe a slightly distinct question from whether it's a representation. And the reason I think that it has to be a representation, well, there are many reasons, but here's one argument: the recognition model is an approximation to the generative model, right? And so that's one example of a rational or probabilistic relationship between these models, and I don't think that makes any sense if the generative model is not a representation also. I'm not sure that that's how it works under active inference. I agree that in a more traditional Bayesian brain sense, the recognition model is trying to approximate the generative model. I mean, that's essentially how it's working. It's an approximate posterior. Approximately... Oh yeah, it's an approximate posterior, but it isn't approximating the generative model. Well, it's approximating the posterior under the generative model. Right. So that means... Right. And both the generative and the recognition models are concerned, they're distributions in part over... I have to be careful here.
Not the actual external states necessarily, but over hidden states. I don't know how on earth you get those into the picture if it's not a representation that you're talking about, because the hidden states are environmental states. They're not... We have to be... As you point out in the paper, we have to be careful to distinguish the generative model from the generative process. We can't just identify what actually happens with the generative model. I think we need to have... This is one of the fundamental questions that I think enactivism often is faced with: how do we deal with... Yeah, I mean, Mel brought up misrepresentation. So the point is that the way the organism, or the creature, or the system sees the world is not necessarily the way it is, but we still have that distribution over external states as essentially part of the generative model. Let me just say one more thing here about encoding. I think if you look under the hood a little bit... So the generative model under active inference, as I understand it, can roughly be identified with the non-equilibrium steady-state density, right? Something like that. So I would just say, right, the dynamics sort of instantiate the generative model. I think that's a really cool point. And I actually think that structural representationalists didn't do quite enough to emphasize the fact that the generative model is used as a control system. But I think if you look under the hood, there will be some features of the system, like stable structural features, in virtue of which it has that NESS density. So in fact, you will find some stable thing that you could think of as encoding the generative model, maybe. At least, that's why I'm still attached to my views on this. Cool. Let's do Alejandra, then Mel. Yeah. I think I agree with Alex; I'm kind of confused how the generative model can be conceived of as just this enactive process.
Really, I was reading the paper all over again, and I actually don't get it. For me, the recognition density is the inversion of the generative model. So if it is not encoded anywhere, I feel kind of lost there. For me, these top-down connections, talking about the brain specifically, this is the generative model. And the recognition density is related to these bottom-up connections, so you can arrive at your best guess, right? So if it is not encoded, what can be said about these top-down connections? I don't know. Maybe I can jump in here and just provide some points of clarification about the generative model, if that's okay. Yep. Okay. So is screen sharing activated? You could share your screen if you want. Yeah, you probably should be able to. Okay, let me try this. Okay, so this is from a paper called The Graphical Brain. And this is what the generative models look like, all right? Can everyone see my screen? Yep. Also, we're going to be returning to this in the series. Yeah, so the generative model is defined as a joint probability density over all of the variables of interest in the system, right? So in this case, the variables of interest are the states that we're trying to infer, which are here s; the policies that I'm trying to pursue, which are here pi; and the data that I'm observing at any given time step, which is here denoted o. Just so everyone is clear, time flows from left to right. So this is the first state, this is the second state, this is the third state. And essentially, the model only has access to this data point; these are counterfactual data points. And the whole thing is updated, basically, at every time step. And so look, the generative model itself, like I said, is just a joint probability density. It's the probability of all these variables connected with "and", effectively.
And when we say that the generative model is not present in the system, I mean that this density here, this joint probability density, is never encoded anywhere in the system, because the generative model itself is factorized. So basically, you take this joint probability density and then, using Bayes' rule, the chain rule, and other such manipulations, you can write this joint density as the product of a bunch of likelihoods and priors. And it's these likelihoods and priors that are updated constantly as part of the recognition density. So this likelihood of your data given your states, and this prior over your next states given your current states and the policy, and all this, these are updated dynamically. And their current values, all of them together, collectively comprise the generative model. Yeah, that's right. Sorry, the recognition model. The generative model is how these quantities are all connected, the ones to the others. So if you want to think about it heuristically, all of these specific parameters, the s, the pi, the o, all of those values together as they're being updated comprise the recognition density. And the generative density, or the generative model, is really just these relations between the different quantities as they change. So it's literally that the inference process itself is what holds all these quantities together, and the generative model is just a description of how those quantities flow together. Yeah. So again, it's like the dominoes falling over: the wave of dominoes only exists in that motion. And similarly, the generative model only exists in that kind of coordinated inference, or update dynamics, of the quantities that are part of the recognition density. Yeah. And the reason it's called the generative model at all is just because the terminology is borrowed from machine learning. In machine learning, the joint probability density over all of your variables is known as a generative model.
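As a rough illustration of that factorization, here is a minimal sketch in Python (the two-state setup and all numbers are invented for illustration; variable names follow the common discrete-state convention where A is the likelihood and D the prior). The joint density is never stored as one big table: it is assembled from its factors on demand, and the recognition density approximates the posterior you get by inverting it.

```python
import numpy as np

# Likelihood P(o | s): rows index observations, columns index hidden states.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Prior over hidden states P(s).
D = np.array([0.5, 0.5])

def joint(o, s):
    """The generative model as a joint density: P(o, s) = P(o | s) P(s)."""
    return A[o, s] * D[s]

def posterior(o):
    """Exact posterior P(s | o), i.e. the inversion of the generative model.
    The recognition density is an approximation to this quantity."""
    unnorm = A[o, :] * D          # P(o | s) P(s), one value per hidden state
    return unnorm / unnorm.sum()  # divide by the evidence P(o)

q = posterior(0)  # recognition-style belief after observing o = 0
```

Note that the joint table itself is never materialized anywhere: `joint` is recomputed from the factors each time, which is one way to read the claim that the generative model is "just" the relations among the quantities.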
So it's called that just because that's the technical term. It's also called that because, if you're starting from a joint density over all of your variables, you can actually generate fictive data that you would expect under this configuration of parameters. Thanks for that clarification, Maxwell. We'll have Mel and then we'll carry on. Well, that all sounds pretty good to me. Go for it, Mel. I think Kate and Mel have, yeah. Yeah. Okay, Kate and then Mel. I just wanted to follow up on what Maxwell was saying about structural representation. Can you all hear me? Yep. Go for it. Can people hear me? Yes. Yes, we can hear you. Beautiful. Awesome. Yeah, just to follow up on what Maxwell was saying about structural representation: the sort of detachability of the representation, right? That's what should get us the distinction between a biological function and at least a structural representation, right? Because my feet and legs and knees are a representation of ground in some very minimal sense, in the same way that fish fins and bird wings are representations of fluid dynamics, right? But this isn't detachable from immediate environmental circumstance for offline use, right? So that's what should get us the difference between a function and a representation, but I'm curious to see what role that plays in the FEP and active inference, if any. Is there merit in retaining a distinction between function and representation under active inference and the FEP, or are they just sort of continuous with one another? Nice. Very good question. Kate, did you have something there? Okay. Okay, I'm going to just continue with the slides because I'm not sure... I think Kate's up next. Yeah. Oh, sorry. I wasn't sure if anyone wanted to reply. I think I'm still... In the case of the paper, it was really useful in helping me understand what the generative model was in the FEP, almost all the way.
When it's described as a statistical description of the generative process, as you've now explained it here, all of that makes sense to me. But then there are also parts in the paper where it sounds like the generative model sort of is what the organism expects and it guides what the organism does, which makes it sound like the generative model is playing this kind of causal role, not the generative process, but the generative model itself. And I find that kind of confusing, unless the generative model is something encoded by the organism or by the system. I get the idea of it as a statistical description of the dynamics; that sounds right. I just don't understand how, if that's all it is, it's something the organism uses. It seems like it's something we use to describe the organism rather than something the organism itself is in any way using. That's a good point. I've proposed in a recent paper that came out in Entropy not too long ago that both are correct. So basically, the correct way to interpret the FEP is as instrumentalism nested within instrumentalism, meaning that we can have a model-theoretic philosophy of science reading of the FEP, and for that matter any other theory of cognition: this is a model that we as scientists are using to explain the behavior of organisms. What I've been trying to argue is that the FEP itself, at the theory level, is also saying the organism is exploiting the statistical structure of its body and movement, aka the generative model, to guide adaptive behavior. So the model really is these harnessed relations between the variables that have to obtain for survival to persist. It's this idea that once the system gets moving, there's an inertial dynamic keeping it moving.
My core body temperature is 36.5 degrees, and the fact that it is keeps a bunch of other processes in play that in turn keep me preserving my temperature by initiating adaptive behavior, like putting on a parka when it gets cold, and so on. So yeah, I mean, both kinds of generative models are used. Separately, in dynamic causal modeling of the kind that the Friston group used to model the spread of COVID, we're not assuming that the process that we're modeling is itself an active inference agent. So you can do that, but you can also make the additional assumption that it is, and then you're in the game of active inference proper. Cool. Thanks, Max. Go ahead, Kate. Can I come back to that? Go ahead. Yes, absolutely. I think where I get stuck then is it sort of seems like you're losing the distinction between the vehicle of the model and the target of the model, such that I worry it sounds a bit like just saying everything is a model of itself, essentially. I mean, effectively, that is what we're saying. And that's where it gets a little hazy for me. That's probably the point of disagreement between Alex and me. Also, I want to say the generative model just is the organism, in the sense that this specific statistical structure that harnesses all of these different existential variables, as Mel was saying, really just is the organism. This kind of statistical structure that keeps reiterating itself and coming into existence, that just is the organism. And that's one of the reasons why I don't want to say that this model is a representation of anything. You could just call it the phenotype. We call it a generative model because, formally speaking, the same construct is borrowed from a field in which it's called a generative model. But you can think of it as the organism's phenotype.
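To make that temperature example slightly more concrete, here is a deliberately cartoonish sketch (not from the paper; the setpoint, gain, and noise level are all invented) of the core active inference move: act so that what you sense matches what your phenotype "expects".

```python
import numpy as np

rng = np.random.default_rng(0)

setpoint = 36.5   # the phenotype's "expected" core temperature
temp = 34.0       # the actual core temperature (the generative process)
gain = 0.5        # how strongly action corrects the prediction error

for _ in range(20):
    obs = temp + rng.normal(0.0, 0.05)  # noisy interoceptive sample
    error = setpoint - obs              # prediction error
    temp += gain * error                # act to fulfil the prediction

final_error = abs(temp - setpoint)  # small: behavior kept the variable in bounds
```

The point of the toy is just that the "expectation" here is nothing over and above the setpoint the control loop is built around; the statistical structure that keeps the variable in bounds is the behavior.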
And precisely because it's a bit weird to think of the model as modeling itself, I kind of resist that interpretation and say, well, you know, if anything is a model in the more traditional sense, it's this recognition density that's performing posterior state inference. And I'm sure that Alex will tell me why I'm wrong in a second. That's a great response. Let's just move on through a few more of the slides, because there's a lot in the figures that I think gets at this. Mel has her hand up. Oh, sorry. Go ahead, Mel. Oh, I just wanted to interject that I think Maxwell is possibly mismodeling himself. I think he's actually a tad bit more realist than he's letting on. Oh, Maxwell, a realist at heart on my bad days. Exactly. If you can't handle me at my most real... So on this next slide, here is one piece of feedback from a participant, just to give an example, and we won't spend too much time here. But I like this idea that beliefs don't have to be verbal; the physical therapy context shows us how the range of movements that the body does not want to go to, for example due to a traumatic experience or injury, embodies a belief that doesn't involve language. And this was a paper by Limanowski and Friston from 2020, where a person's hand was opening and closing, and they were getting a VR visual that was delayed or inaccurate. And so there's so much here about how our sensory feedback, our proprioceptive as well as our visual feedback, our multimodal systems, are being used. And that's a mess that would take us many more hours to clarify. This is what the representation questions are about. So it's like: what is the representation of the current state of the person's hand being closed, at that just pure phenotype level? Maybe somebody with a ruler says, well, it's whether the hand is closed or not. But it's not just that, because if you're getting visual input that's messing with you, it changes your behavior.
So there's something else happening. And that's really a lot of what we're talking about. I think going through figures 2, 3, and 4 is going to be really helpful before we have closing thoughts. And thanks, everyone, for bearing with what was sort of a weirdly technically altered stream. Briefly, Bayesian networks. Bayesian networks come from the Bayesian statistical sciences. In these diagrams, the nodes are variables, like random variables, stochastic variables, or parameters. The edges are statistical relationships, probabilistic relationships, and they can be undirected or directed. These models are quite common in statistics. There are some really nice aspects to them, in that you can bring prior knowledge to the table with your priors, and you can also estimate the model directly from the data with parametric empirical Bayes. And to give an example of what this looks like: here on the top right, we can see how an accident increases the likelihood that you'll hear sirens. An accident also increases the likelihood of having a traffic jam, and bad weather can increase the likelihood of accidents or traffic jams. So these Bayesian networks are pretty commonly used in a lot of statistical areas. And we can contrast that with what are called Forney factor graphs. Working through this de Vries and Friston paper was definitely quite an experience; it's a complex paper, and there's a lot in it. But a Forney factor graph, as they write in the paper, is a type of graphical model that shares qualities with Bayesian networks and Markov random fields. Forney factor graphs afford a visually insightful representation of generative models, especially beneficial for the complex models underlying hierarchical active inference. In a Forney factor graph, each factor is represented by a node, and each variable by an edge. So it's like a flip of the Bayesian network. An edge attaches to a node if the edge's variable is an argument of the node's function.
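The accident-sirens-traffic example can be written down as a tiny Bayesian network in plain Python (a sketch; all of the conditional probabilities below are invented for illustration). The joint density factorizes according to the arrows of the DAG:

```python
# Binary variables: weather (bad?), accident, sirens, jam. Arrows:
# weather -> accident, accident -> sirens, {accident, weather} -> jam.
p_bad_weather = 0.3

def p_accident(bad_weather):
    return 0.2 if bad_weather else 0.05

def p_sirens(accident):
    return 0.8 if accident else 0.1

def p_jam(accident, bad_weather):
    if accident and bad_weather: return 0.95
    if accident: return 0.7
    if bad_weather: return 0.4
    return 0.1

def joint(w, a, s, j):
    """P(w, a, s, j) = P(w) P(a | w) P(s | a) P(j | a, w): the DAG factorization."""
    pw = p_bad_weather if w else 1 - p_bad_weather
    pa = p_accident(w) if a else 1 - p_accident(w)
    ps = p_sirens(a) if s else 1 - p_sirens(a)
    pj = p_jam(a, w) if j else 1 - p_jam(a, w)
    return pw * pa * ps * pj

def p_jam_given_sirens(sirens):
    """Condition on the sirens variable by summing out weather and accident."""
    states = [(w, a, sirens, j) for w in (0, 1) for a in (0, 1) for j in (0, 1)]
    total = sum(joint(*x) for x in states)
    jammed = sum(joint(*x) for x in states if x[3])
    return jammed / total
```

Conditioning on hearing sirens raises the probability of a jam even though neither directly causes the other, because both hang off the accident node; that is exactly the kind of reasoning the network structure licenses.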
And here on the right is a table from the paper by de Vries and Friston. It shows how we can think about functions like addition, subtraction, and multiplication as having variables coming in and out. And just to really quickly summarize the similarities and differences: in Bayesian networks, the nodes are variables, whereas in a Forney factor graph, the nodes are the factors, or the functions. In Bayesian networks, the edges are the probabilistic relationships among variables, whereas in Forney factor graphs, the edges are the variables themselves. A cool trick with Bayesian networks is model inversion: the model can be gleaned from the data, and data can also be generated from the model. A cool trick with Forney factor graphs is that they specify a specific order for factorizable computation, making extremely massive models tractable. Another cool fact about Bayesian networks is that that's where the Markov blanket formalism comes from: the Markov blanket is the set of nodes that insulates one part of the network from the rest, from a probabilistic perspective, though there's a lot of nuance in how it's used in the free energy principle framework. And the cool fact about Forney factor graphs is that they're linked to information theory, message passing, computation, and other math. So in figure 2, we see this generative model in active inference. This is actually the model that Maxwell had on his screen earlier, and I annotated all the pieces. So, just as Maxwell was saying, time flows from left to right. And this is, again, a Bayesian network representation. So the boxes and the circles, and there are different types of them, which are described in the caption, are like variables. The initial state is a vector, the hidden states are variables, the state transition matrix B is how the hidden states map onto each other, and A is the map between the states and the observations.
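Going back to the Markov blanket point for a moment, it can be computed mechanically from a network's parent structure. A minimal sketch (the helper `markov_blanket` is hypothetical, not from the paper): the blanket of a node is its parents, its children, and its children's other parents.

```python
def markov_blanket(node, parents):
    """Markov blanket of `node` in a Bayesian network, given a dict mapping
    each node to the set of its parents: the node's parents, its children,
    and its children's other parents (co-parents)."""
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = set().union(set(), *(parents[c] for c in children)) - {node}
    return parents[node] | children | co_parents

# The traffic example: weather -> accident -> {sirens, jam}; weather -> jam.
parents = {
    "weather": set(),
    "accident": {"weather"},
    "sirens": {"accident"},
    "jam": {"accident", "weather"},
}

mb = markov_blanket("accident", parents)  # insulates "accident" from the rest
```

Given its blanket, the "accident" node is conditionally independent of everything else, which is the probabilistic sense of "insulation" being gestured at here.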
And then figure 3 is the same thing as a Forney factor graph. So this is showing that there's an equivalence between representing something as a Bayesian network and as a Forney factor graph. And the Forney factor graph allows these extremely large-scale hierarchical models to get factorized and then computed in a tractable way. And I should say, the advantage of using a Forney-style factor graph is that the message passing can just be read directly off the graph. And there's an algorithmic way of moving from a Bayes net to a Forney-style factor graph. So typically, intuitively, the Bayes net is easier to write, because you can just literally write down the dependencies between all of your factors, like this. So, you know, this state depends on this other state, and there are these transitions, and so on. And from this, you can move to the Forney-style factor graph, and that tells you directly how that will be implemented in active inference. I should say that the reason why we discussed these at all is really just to show, in the next figure, figure 4, how the generative model and the generative process kind of link up. To me, this is where the interesting stuff happens: literally the generative process, and this is a model of the generative process, right, how we think it's structured, is in this little box here. And then the Forney-style factor graph represents the message passing that is ongoing outside the box to keep all the quantities within, you know, phenotypic bounds. Thanks, Maxwell. And yeah, I know that was a bit of a rapid sprint through these figures, but it's so cool that there's a formal way to relate the Bayesian network and the factor graph models.
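A minimal sketch of the kind of message passing you can read off a chain-structured graph like this one (two hidden states and toy numbers, all invented; in the paper's notation A is the likelihood mapping, B the transition matrix, and D the initial-state prior): each belief update is just a product of incoming messages, normalized.

```python
import numpy as np

A = np.array([[0.9, 0.2],    # P(o | s): likelihood factor
              [0.1, 0.8]])
B = np.array([[0.7, 0.3],    # P(s_t | s_{t-1}): transition factor
              [0.3, 0.7]])
D = np.array([0.5, 0.5])     # prior over the initial hidden state

def forward_messages(observations):
    """Sum-product forward pass along the chain: at each step, absorb the
    message through the transition factor (B @ belief), then the message
    from the likelihood factor (the A row picked out by the observation)."""
    belief = D.copy()
    beliefs = []
    for t, o in enumerate(observations):
        if t > 0:
            belief = B @ belief       # message through the transition factor
        belief = A[o] * belief        # message from the likelihood factor
        belief = belief / belief.sum()
        beliefs.append(belief)
    return beliefs

beliefs = forward_messages([0, 0, 1])
```

After two observations of o = 0 the belief in state 0 sharpens, and a single o = 1 is enough to flip it toward state 1: that filtering behavior is exactly what the factor graph's message schedule encodes.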
And just as you said, it's sometimes easy to specify the Bayesian network, like: oh, well, the traffic jam depends on the weather, and the weather depends on this other feature. And then the factor graph gives us an order of computation and a really implementable way to compute it. It also has some similarities with message passing in neural network systems, for example. So an interesting question that I've been having is: okay, we talk about this as a Bayesian theory, but if it can also be represented in another way, is it really a uniquely Bayesian theory? Or what do we gain by saying that it's a Bayesian theory? It's within the domain of Bayesian intertransformability, or interoperability, but that doesn't uniquely disambiguate it. And that's why I think some of these philosophical questions that we're talking about here, like about representation and about cause, are so open: because it's not uniquely specified by the free energy principle or by active inference how exactly we should answer them. Because in the end, as Maxwell said, it's the technical details. This is what we're talking about when we're talking about free energy minimization and how it bears upon policy selection. This is what's happening. And one more point before anyone raises their hand to comment: notice that the free energy links to the policy selection, pi, p for policy. And that means the organism isn't just minimizing the free energy of the observations given a sort of neutral representation. Actually, it's the policy representation, specifically how it bears upon how states transition into each other, that the actual computation is being done upon.
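That link from free energy to policy selection can be sketched in a couple of lines (a toy; the expected free energies G and the precision gamma below are invented for illustration): in the standard discrete formulation, the posterior over policies is a softmax of the negative expected free energy.

```python
import numpy as np

def softmax(x):
    x = x - x.max()            # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

G = np.array([3.0, 1.0, 2.0])  # expected free energy per policy (lower is better)
gamma = 2.0                    # precision over policies

q_pi = softmax(-gamma * G)     # belief over policies: q(pi) proportional to exp(-gamma * G(pi))
best = int(np.argmax(q_pi))    # the policy most likely to be enacted
```

Raising gamma makes the distribution more deterministic around the lowest-G policy; lowering it makes policy selection more exploratory. Either way, action sits at the top of the inference, as described above.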
So the observations are used to then run upstream, through the hidden states and their transitions, into the policy that ultimately dictates what's going to be done, by being selected by the organism or the system as the most likely. So this is really how deeply action is embedded in these models. It's not that there's a latent representation of the external states and then you're doing a best guess on what you should do. It's actually that action is at the top here; the policy is at the top. And I think that's really a nice feature of these models, and it becomes graphically clear. And then if somebody wants to put philosophical framework A or philosophical framework B on top of this structure, or if they want to propose a different structure for how observations and policies are linked, that's great. It's something tractable that we can talk about. But this also returns to the metaphor discussion: this is the real thing as it is. Cool. I know we sprinted through there, but it's coming near the end of our time. So any thoughts on these images, or are people ready to go to our closing thoughts? I was just going to say the figures are interesting, but the equations are even more interesting. And the 2017 de Vries and Friston paper and this model, it just reminds me that if we can write down that Bayesian relationship, the probabilistic relationship between variables, then there's going to be a process model that shows how we can link it. And these captions are very informative. There's really a lot more to say here, but that's enough for today; let's close it out. We can hear any last thoughts, but while people are preparing their last thoughts and raising their hands if they'd like, I would just like to thank everyone for participating. We're learning and developing each time, figuring out the tech each time. We provide follow-up forms to the live participants.
So just check the calendar invitation; it would be really helpful to have feedback. We also request feedback, suggestions, and questions from the participants live in the comments or otherwise. And just stay in communication. We really appreciate everyone's engagement here; it's been such a great learning experience. So if anyone wants to raise their hand, Alejandra first, followed by anyone else who wants to, let's have some closing thoughts. Well, I was wondering if it's possible to model updating when you are doing offline cognition, without an explicit policy. You are not doing something online, with this bodily interaction; you are thinking about something and then you have this eureka feeling. So this is a kind of model updating without action. The short answer is yes. I just linked the paper on it. We've created a ruminating agent. So basically, this agent combines the affective inference scheme that we have developed over the last two years, where you're effectively inducing a new hierarchical layer of state inference based on state and precision estimates at the lower level, and these are affective states that then guide lower-level precision estimation. We combine that with what's called sophisticated inference, which is the new generation of active inference agents. In the vanilla formulation, the current vanilla, you could see it in the generative models that we just showed: they include beliefs about counterfactual observations that we would make as we propagate states into the future. Sophisticated inference is about propagating your beliefs about the observations that you would make in counterfactual futures. And so by combining these two things, we have an agent that isn't acting, but that ends up ruminating and catastrophizing about its own cognition. So I just linked the paper in the chat. We submitted it to the International Workshop on Active Inference.
So it's a little poster now, but we're slowly expanding it into a full manuscript. But the kind of summary six-pager that does just what you were suggesting, Alejandra: this is it. Thank you, Maxwell. But this is kind of confusing with the enactive view of cognition. This is why we're not like classical enactivists. Yeah, we reject the equal partners principle. It's just not the case that the body, the brain, and the cultural milieu are going to be involved in the same way for every task. So for rumination, you could, I guess, appeal to a history of interaction, which is the reason why agents are able to entertain beliefs about other agents at all, and so on and so forth. But yeah, you're right, there isn't much enaction here, and that's fine, I think. Alex Kiefer, and then anyone else who wants to raise their hand. Yeah, so I guess in lieu of a proper closing statement, I just wanted to point out that Maxwell has some really cool new work on how neuronal populations can sort of be seen as doing active inference on their own level, and that this is one way you could see structural representations being realized. So that's one thing we didn't have time to get into, but I just wanted to say, in general, I think regardless of the precise philosophical take that you have on all this stuff, it's cool to disagree on that level, but this literature has produced some really amazing work, and I think it's pushing forward in a really interesting way. One advantage of active inference, certainly, is that you can actually write down the generative model and sort of transparently see how it works, and you don't get that out of just unsupervised learning. Cool. Any other thoughts? Awesome. Well, thanks so much for the discussion.
We're gonna re-watch, understand technologically what did and didn't work, and for next week, provide all of you, plus anyone else who wants to participate, with more of a specific spec sheet for how you can prepare and make sure that it will be a really seamless experience, as far as the timing goes as well. Next week, we will move on to ActInf 7, which is going to be about variational ecology, a really awesome paper close to my heart as a biologist as well, and we would really appreciate anyone who wants to participate in these conversations next Tuesday at 7am Pacific, or the following Tuesday. Both of these discussions will be really great, and it's quite a complex paper, so I'm sure we'll have a lot to go into again. So once again, thanks so much, everyone, for your participation. You really brought up some awesome ideas, and I hope we conveyed to those within and outside the field that there's a lot of discussion ongoing. By no means are these questions resolved, and it's an opportunity for people to bring their unique perspectives to the table. So thank you all again for your participation. Really appreciate it.