All right, greetings everyone. It is September 9th, 2022. It's the 15th meeting for Cohort One, and we are discussing chapter six. We're in the first of two discussions on chapter six of the textbook. We're going to go to the questions page and start there, where we have one question that we explored a bit last week in our welcome back. And then there's at least one question prepared, but anyone is welcome to add more questions here. We turn first to this question: what are the four steps in the recipe to construct an active inference model? And can we make templates that facilitate people walking through some of these stages? How much model building do people prefer to be engaged in, in this section of the textbook group? Do people want to have one or more models that they're personally developing? Do they want those to be scaffolded in a shared page, to help increment models along and see models at different stages? Yeah, Brock, and anyone else. It's the same problem as the first half of the textbook: without math, it's not going to go well. This half without modeling is going to be quite difficult, I think. So I think we should just get that out of the way. It's not about whether you want to or not. If you want to just understand it at a cursory, perhaps philosophical level, then maybe not. But if you actually want to understand it, then we are going to have to model something some way. It doesn't have to be the most complex, most precise one. But yeah, I wonder if we could develop maybe a couple of examples, or simple ones that people might like to model. Those are all going to be purely agent-based, but. Thanks, I agree. I wanted to frame it as a question, but I agree with that perspective.
And following along, as we structure a few specific cases: the rat in the T-maze and the saccading agent, which are used in the textbook, then some of the examples that are used recurrently in the literature, like birdsong and a few others. For these, many scripts exist, and we can help people who might not have the experience to set up, with some walkthroughs: okay, here's how you get Octave running so you can run this MATLAB script, or here's the standalone DEM demo. Because this textbook really only gives the pointer to MATLAB scripts and methods. When they say that the standard schemes can be applied, they mean a MATLAB script. And that's exactly what the step-by-step guide, ModelStream #001, is built around. And chapter seven, on modeling in discrete time, is going to be akin to the step-by-step guide, but it's not step-by-step-by-step; it's more like it takes two bigger steps. Okay, how is this four-step recipe for active inference modeling similar to or different from approaches that people have seen for systems modeling from other frameworks? For example, in reinforcement learning or cybernetic modeling, so something more recent and computational, or something more pre-formal or pre-computational. Should this recipe be surprising to people coming from a certain background? Are there sub-steps that are relevant to consider? Brock, yes. I don't have extensive formal modeling experience, but most of the things that I've done for the last half decade entail some form of this. And I don't see how you could model anything without going through these basic questions here. I don't see how this is specific or exclusive to active inference. It's a bit ambiguous to me how it is, besides specifically the generative model, I guess. But which system are we modeling? Another way to ask that is: what questions are we trying to answer?
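To make the four-step recipe concrete as a template, here is a minimal sketch of a worksheet a modeler might fill in before doing any math. The field names and the grouping into four stages are our own paraphrase for illustration, not the book's exact wording.

```python
# A hypothetical checklist template for the four-step recipe; labels are
# our own paraphrase of the chapter's stages, not the book's exact text.

def recipe_template():
    """Return a blank worksheet a modeler fills in before any math."""
    return {
        # 1. Which system are we modeling? (equivalently: what questions
        #    are we trying to answer, and for whom?)
        "system_of_interest": None,
        # 2. What is the most appropriate form of the generative model?
        #    e.g. discrete vs continuous time, shallow vs hierarchical
        "model_form": {"time": None, "depth": None},
        # 3. How is the generative model set up? (variables, priors,
        #    likelihoods -- the A, B, C, D objects in discrete time)
        "generative_model": {"A": None, "B": None, "C": None, "D": None},
        # 4. How is the model simulated or fit? (the "standard schemes",
        #    in practice a MATLAB/SPM script or equivalent)
        "simulation_scheme": None,
    }

worksheet = recipe_template()
worksheet["system_of_interest"] = "rat in a T-maze"
```

A shared page of such worksheets at different stages of completion could scaffold the model-building that the group discusses above.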
Or who, I think Lyle brought this up, like who are you making the model for, sort of thing? Because, like you were saying just earlier about the particular choice of how you're modeling it, it's just a choice on that system. And that entails a particular kind of hidden states, Markov blanket, et cetera, that may or may not be relevant, depending on the questions, on what you're trying to model. But that arises from those questions, I guess is what I'm saying. Thank you. Ali. Well, yeah, referring back to your previous question, I also am currently developing a model, actually an emotion-perceiving agent, by pretty much closely following those four steps. And I'm getting some help in that regard, in how to actually implement those four steps in my particular situation. But on the topic of modeling, at a very basic level, we can categorize different approaches to modeling in terms of answering three different questions: the what-models, the how-models, and the why-models. At least that's the kind of approach mostly taken in neuroscientific modeling. So in my opinion, these four steps and this recipe can be applied mostly to the how and the why; what-models are mostly descriptive models of phenomena. But when we want to model an agent, a self-evidencing agent, which behaves as if it infers something about its environment, we have moved beyond just the descriptive stage, and we're dealing with the how and probably even the why questions there. Thanks. When I hear "why," I always think of Tinbergen's four questions, and Aristotle's four causes. "Why" is a subjective modeler's pluralistic playground, and it doesn't lend itself to one specific type of answer, let alone a framing or a specific parameter combination.
And I think that's a huge aspect of the move from "what are systems" to "how am I going to model this system," which is part of the turn that happens when we start to model: we dispense with absolutism. All of a sudden these questions about what we're trying to do, our preferences and affordances and constraints and limitations, all come into play, and we might make a portfolio of models that shine light on different aspects of even the very same phenomena. So it's kind of like an empirically grounded pluralism and pragmatism when we start to be engaged in real, useful modeling. And that is something that I feel one learns in the field, not necessarily by standing by. So that's just to say that the modeling is really important, even for philosophical reasons, to say nothing of some of the things that come into play in chapter seven and beyond, where it's like: the C vector has this many entries because of why? And how many rows and columns does this have? But once we understand why the rows and the columns and the shapes of these different entities are the way they are, then we just have to use this scaffold to think about certain systems, consider their form at a very structural level, like: is it in discrete time or in continuous time, and so on. And then we might set up not just one generative model but, again, families of them. In the future, we hope and expect that Active Blockference will facilitate us doing parameter sweeps across families of cognitive models and generative models. But even in this boutique phase of model construction that we're in, in the current year, one would still say: okay, I want to have one model where it's either acceptable temperature or too hot. And I want to have one with a three-state model: too cold, just right, too hot. Here's going to be a four-state model.
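The temperature example above can be sketched directly: the length of the preference vector C follows from how many outcomes we carve the observation space into. The numbers below are arbitrary illustrative assumptions.

```python
import numpy as np

# Sketch: two alternative discretizations of the same temperature variable.
# The C vector's length follows from the chosen number of outcomes; the
# preference values here are arbitrary, for illustration only.

# Two-outcome model: acceptable vs too hot
C_two = np.array([2.0, -2.0])          # prefer "acceptable", avoid "too hot"

# Three-outcome model: too cold, just right, too hot
C_three = np.array([-2.0, 2.0, -2.0])  # prefer the middle outcome

def softmax(x):
    """Turn log-preferences into a normalized prior over observations."""
    e = np.exp(x - x.max())
    return e / e.sum()

p_two = softmax(C_two)
p_three = softmax(C_three)
```

Specifying both variants side by side is exactly the pluralism-through-action described above: two members of a family of models over the same phenomenon.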
And just being able to specify that again instills pluralism, through action, about how we're modeling a system, which is the philosophical point that sometimes seems not to be grasped by those who don't engage in the specific discussions around modeling a system. Okay, so where does figure 6.1 come into play? It's a rehearsal of some of the figures we've seen earlier in the book and in many, many other places. This is the particular partition. The particle is the internal plus the active and sensory states. So internal states plus blanket states equals particular states; those are the particles. If we were looking at dust under the microscope, as in the original Brownian motion setting, the particle was the dust particle, and the external states were that which was not the dust particle. This partitions the generative process and the generative model off from each other. In Bayesian mechanics, we're dealing with particular states, with particles, with things that are able to engage in sensorimotor-type loops. This is just bringing in a few more pieces than that, which is just to call attention to the direction and the existence of the arrows. In the work "How particular is the free energy principle?" by Aguilera et al., and in follow-up work, there's discussion around which topologies on this particular partition facilitate or enable which kinds of formal claims. For example, one might find it interesting that sensory states have a bi-directional arrow with external states, or might find it equally, differently, or less interesting that active states also have an arrow pointing back to internal states. And of course the questions that we have raised all along the way: okay, let's just say that we're going to model a person's arm. What are the active states? What are the sensory states? What does it mean that there's a bi-directional arrow connecting them? Is this the only topology for this particular partition?
Could we have the around-the-clock model? External states influencing sensory, then internal, then active, and just going around the clock without any crosstalk in the blanket, without any back arrows. Is that plausible? Does that change the formalism? On one hand, how could it not change the formalism, when it's changing the structure of how variables are related to each other? Or maybe there are formalisms that are actually abstracted away from any topology choice, because what is being described are the flows and the gradients on these variables. But then, what are those variables? I think that just highlights this need for modeling. But if we just really briefly go back to that: the fact that there's this bi-directional arrow between external states and sensory states is a bit confusing, in that, again, the language, well, if it is acting on the environment, isn't that an active state, or something like this? In what way is it bi-directional? So could you not draw that as one arrow going to the sensory state and one arrow from the sensory state back to the environment, and then say what, functionally, that entails? And then you're doing this particular selection of how you're modeling, right? But I think you have to do that to understand why it is bi-directional there, in that particular framing of the generative process and generative model, which is still, to me, slightly nebulous. But the bi-directionality between the active and sensory states almost makes sense, in that you sense your actions and your actions are a response to what you sense, but that's not really what's being done there precisely either, right? So... Disentangling that is what we need to do: separate and explicate those roles. Yeah, another really important question is this: the action-perception loop is appealed to early and often. It's the first picture in the textbook.
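The two topologies under discussion can be written down as plain edge sets over the four state sets, which makes the "does the formalism change?" question concrete. The edge lists below are one reading of a figure 6.1-style diagram, not a formal claim.

```python
# Sketch of the particular partition as a directed graph over the four
# state sets. Edge sets are our illustrative reading of the diagram.

canonical = {
    ("external", "sensory"), ("sensory", "external"),  # bi-directional
    ("sensory", "internal"),
    ("internal", "active"),
    ("active", "internal"),   # the back arrow discussed above
    ("active", "external"),
}

# The hypothetical "around the clock" variant: one cycle, no back arrows
around_the_clock = {
    ("external", "sensory"),
    ("sensory", "internal"),
    ("internal", "active"),
    ("active", "external"),
}

# In both topologies, internal and external states never connect
# directly -- the blanket (sensory + active) always mediates.
for edges in (canonical, around_the_clock):
    assert ("external", "internal") not in edges
    assert ("internal", "external") not in edges
```

Enumerating variants this way is one route toward the structural model selection over topologies that the Aguilera et al. discussion raises.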
It's figure 1.1, before the high road and the low road in figure 1.2. We're talking about agents and their interfacing with the niche. And we're going to end up calling how we model agents the generative model, and we're going to end up calling the niche, the dynamics of the niche and how it's influenced, the generative process. And we're going to call the interface between the generative model and generative process a Markov blanket, and it's going to inherit this technical definition from Bayesian graphs, and we're going to add these variables and connect them a certain way, but we're talking about the feedback loop between the agent and the niche. Oh, so what is this then? Are the observations the sensory states? And this isn't just low-hanging fruit, like, well, then why couldn't we call them that? I mean, they are called x and y, interestingly, in the continuous time formulation. And I get that they're being distinguished here, because this is the continuous time POMDP versus the discrete time POMDP, but where do we see active states here? We see policies; so our active states are along the branch influencing the B matrix, how the hidden states change through time. So there may be a very simple trail from this action-perception loop framing to the discrete and continuous time POMDPs. Yeah, I think, to me, the simple thing is: instead of thinking of this as a circular thing where you're literally going from one state to the next state, the way you might in a normal state machine, think of it as: when you have an active state that is changing, however it is changing, that is necessarily entailing change in the sensory, external, and internal states, and vice versa with the sensory states. If you perform some action of some kind, some active change in that state, you're necessarily changing the internal hidden state.
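The trail from the action-perception loop to the discrete-time POMDP objects can be sketched minimally: observations play the role of sensory states via the A matrix, and active states enter through the choice of B matrix. All sizes and values below are illustrative assumptions.

```python
import numpy as np

# Minimal discrete-time sketch tying the action-perception loop to POMDP
# objects: A maps hidden states to observations, B[u] moves hidden states
# under action u. Numbers are invented for illustration.

A = np.array([[0.9, 0.1],      # p(obs | state): mostly accurate sensing
              [0.1, 0.9]])
B = {                          # p(next state | state, action)
    "stay":   np.eye(2),
    "switch": np.array([[0.0, 1.0],
                        [1.0, 0.0]]),
}

def perceive(prior, obs_index):
    """Bayesian state update: observations act as the sensory states."""
    posterior = A[obs_index] * prior
    return posterior / posterior.sum()

def act(belief, action):
    """Active states enter through the B matrix, as discussed above."""
    return B[action] @ belief

belief = np.array([0.5, 0.5])
belief = perceive(belief, obs_index=0)   # sensing favors hidden state 0
belief = act(belief, "switch")           # acting moves probability mass
```

Even this toy loop shows the coupling the speakers describe: each observation reshapes the belief, and each action reshapes what the next observation will be about.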
So if you grab a glass of water or something like this, what you're modeling now has changed slightly, and your internal states and the sensory states are doing the same thing: where the glass of water was just this object sitting there doing nothing, now it's become a thing to drink out of, and that's changed, in some sense, the kind of external states that are being fed into your sensory states; it's changed the Markov blanket that you're selecting for or relating to. And in the POMDP, I think that is visibly there, the same sort of thing: if you have this observation, you're necessarily creating a new internal or sensory state that is going to change your policy selection, and that's going to change your sensory states. So, returning to the recipe, one question that I think we will reduce our uncertainty about as we really do the modeling is: how do we account for residuals and understand model residuals, which is to say, variability in data that is not explained by the model? Which, and what amounts of, residuals are acceptable? So if we were doing a linear regression, and the only data we had on hand was height, and the only data that we wanted to predict was weight, then whether the model fits with an R² of 0.1 or 0.9, we've used all the data, and we know what the residual is of the height-on-weight regression. And here's a number that describes what fraction of the variability was explained by this regression: an R² of one meaning it all falls perfectly on a line, an R² of zero meaning it's just a total scatter plot. We can say how much variability is explained. And then we're done with the empirical data that we had, so we can't explain any more variability. When we're looking at empirical traces of behavior, it might be the case that, to explain even a small fraction of behavior for some systems, the generative models already must be quite complex.
In another scenario, it might be the case that a very simple model works, like: looks to the left are, 90% of the time, followed by looks to the right. I mean, on average, isn't that true? So then when we are evaluating models, whether in Blockference, sweeping across models, or just qualitatively, in a boutique way, are we looking to reduce the residual to zero? What traces of data are we looking to explain variability in? And if we're not going to be looking to reduce our unexplained variance on empirical traces of behavior, what are going to be the criteria by which we do model selection? The AIC and the BIC, which are information criteria, balance variability explained across families of models with their degree of parameterization, penalizing having more parameters, rewarding better explanations. So they seek a balance, with models that are on the kind of Pareto frontier of explanatory and simple. That's great. That's one extra nuance on top of merely explaining more variability with a smaller residual: it also adds in the penalty for having more parameters. So that, in theory and in practice, stops the modeler from just saying, okay, well, now we're going to add in the temperature in this other part of the world, because 1% of the variability just happened to be explained. So now we're developing these totally spiraling models, looking for more and more data sources to explain a decreasing amount of variability that's left. It's good to pull back from that edge through the use of information criteria. But at the core of these criteria is still the imperative to reduce the residual: within a model, to fine-tune it so that it fits data well without overfitting; and then, across models, an analogous process of wanting the model structure that fits without, for example, overfitting or over-including parameters. So one question which might be added here is: what data do we have?
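The height-on-weight example above can be run end to end: fit a line, compute the residuals and R², and add an AIC-style complexity penalty. The data are fabricated for illustration, and the Gaussian-error AIC formula is one standard convention among several.

```python
import numpy as np

# Toy height-vs-weight regression illustrating residuals, R^2, and an
# AIC-style complexity penalty. All data are fabricated.

rng = np.random.default_rng(0)
height = rng.uniform(150, 200, size=50)
weight = 0.9 * height - 80 + rng.normal(0, 5, size=50)

# Fit weight = a*height + b by least squares
a, b = np.polyfit(height, weight, 1)
residuals = weight - (a * height + b)

# Fraction of variance explained
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((weight - weight.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# AIC under a Gaussian error model: 2k + n*log(ss_res / n),
# with k = 3 (slope, intercept, noise variance)
n, k = len(weight), 3
aic = 2 * k + n * np.log(ss_res / n)
```

Comparing the AIC of this one-predictor model against a model with an extra, barely-explanatory predictor is exactly the "pull back from the edge" move described above.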
If we're doing a didactic model and we just want to sketch it out, run a simulation, and purely generate data, that might be useful in certain settings. But the question of what behavioral data we have is very non-trivial if we're going to be approaching this as an empirical analysis problem. Brock? Just another question there for us: what observations must we make, or something related to what data do we have? If there's not sufficient affordance information contained in the data to model the problem, then... Do we have to model our action selection at a second or a third order? Yeah, exactly, to realize the right epistemic value. Okay, so that was figure 6.1. Discrete and continuous; shallow, deep, and hierarchical. So just to address this, which might come up: shallow versus deep describes how iterated the model is through time, within one type of time. So you could have the one-hour shallow model, where hours are our units, or you could have the 10 hours, where it goes one, two, three, four, five, six, seven, eight, nine, 10; or you could have a depth of 100 with a time tick of one. Or one could have a hierarchical model, where there are 10 hours that make, like, a deca-hour, and then 10 ticks of the deca-hour to reach a planning horizon of 100 time steps. So depth describes, within one unit or counter of time, how iterated that counter is; hierarchy describes strict nested levels. And they can be used separately or together to describe events over longer and longer time horizons. Crucially, in the context of language processing: the duration of the word transcends that of any phoneme, and the sentence transcends that of any word in the sequence. And if we assume paragraphs are made of multiple sentences, paragraphs always transcend and encompass sentences. So, anyone else want to just add a point?
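The shallow/deep/hierarchical distinction above comes down to index bookkeeping, which a few lines make explicit. The 10 × 10 "deca-hour" numbers follow the example in the discussion.

```python
# Sketch of the three temporal structures described above, using plain
# index arithmetic: shallow (1 step), deep (100 steps of one counter),
# and hierarchical (10 "deca-hours" of 10 hours each).

shallow = [0]                                   # one time step
deep = list(range(100))                         # depth 100, one clock
hierarchical = [(slow, fast)                    # two nested clocks
                for slow in range(10)
                for fast in range(10)]

def flatten(slow, fast, fast_per_slow=10):
    """Map a nested (slow, fast) index to the flat deep-model index."""
    return slow * fast_per_slow + fast

horizon_deep = len(deep)
horizon_hier = len(hierarchical)
```

Both the deep and the hierarchical model reach the same planning horizon of 100 steps; what differs is whether the model carries one counter or two nested ones, and that structural choice is itself subject to model selection.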
I'm sure that you fellows are familiar with this distinction, but I just wanted to point it out, since the temporality in these models, and doing model selection on different forms of temporality, is going to be an essential piece of the puzzle. Yeah, and it also reminds me of Schenkerian analysis, the theory that, if you remember, we talked about earlier. It's a temporal hierarchy that tries to describe the whole musical piece as hierarchical layers, which I think maps perfectly onto this specific example of language parsing, because that also is the kind of thing that should ideally be modeled in a very hierarchical way. But the caveat here is that, unlike language, which, at least in some languages, is necessarily hierarchical, music, whether tonal or non-tonal, does not necessarily have this hierarchical structure. So depending on the kind of music, or the genre of music, we might have either a very shallow and non-hierarchical structure or a very deep one. So that's something that, again, refers back to the kind of system we're trying to model, but even that kind of system needs to be specified at a more granular level in order to have an effective modeling strategy there. Thanks for bringing in that domain. It makes me think of the intro, the chorus, the bridge, the verse. Perhaps these are only in certain genres of music, but analogously; and then the measure transcends any beat in the measure, with a special case of, like, a one-one time signature, and so on. And then, although language is described in a strictly hierarchical sense, I'm wondering how syntax and grammar actually create kind of a strange loop. For example, it's trivial to say that phonemes are nested within words and sentences and so on. But what about the semantics of a sentence that contains commas and exceptions, and then at the end the speaker says, "the previous sentence is not how you think it is"?
That is also causing kind of a deep linguistic recall. It can be trivially modeled as containing nested units, but will a model that narrowly considers the semantics of words, then the semantics of clauses, be able to address some of these questions? Here, I think we can point to a really excellent attribute of the active inference modeling framework. So in the discussion where the LW paper was cited, we could just marvel at nested systems and draw blankets on blankets and so on. And they could be drawn around every organelle and every little lipid granule in the neurons, and/or around the brain regions. And we discussed a little earlier doing structural model selection on which of those are relevant. However, this does not mean we need to attempt to model the entire brain to develop meaningful, pragmatic simulations of a single level. For example, if we wanted to focus on word processing, we could address some aspects without having to deal with phoneme processing. This means we can treat inputs from parts of the brain drawing inference about phonemes as providing observations, from the perspective of word-processing areas. So one could say: I'm going to make a figure 4.3 where s is the true word, and phonemes are being passed as observations to my word inference engine. And someone can say, but aren't phonemes the result of inference? And someone can say: great, make that module in Blockference, make that module and pass me that data. And then we can do a nested model, and we can graft those models together. But through the Markov blanket formalism, not even the whole Markov blanket bounding a specific particular entity, but just the trivial Markov blanket that exists insulating any two unconnected nodes in a Bayes graph, we can use that broader, still technical concept of a Markov blanket and just say: all right, phoneme observations come in.
And surely we could make a model where another type of information comes in, the inference is on phonemes, and that's what's passed out. So I think, again, to point to this advantage of active inference modeling: it allows us to situate phenomena as complex multi-scale nested systems, and also to bite off what we can chew, and it makes it an empirical question which ways of grafting together which structures might be useful. We don't need to go turtles all the way down. We can just say: even if it were turtles all the way down, we're modeling the third through the fifth turtle. And the fifth turtle gets a top-down prior, and the third turtle gets sensory input from the one below it. And that's the part of the stack, and this is the lateral width, that we're modeling. Another advantage of active inference, and I'm not even saying this is a unique advantage, is that this goal-directedness, and corollary capacities like counterfactuals on goals and the way that counterfactuals on world states influence our goal selection, nested goals and transient goals or conditional goals, can be resolved in an unfolding action-perception loop. Take the example of wanting to enter an apartment: being goal-driven over multiple timescales and then being able to zoom in again. So it's like: oh, grab the keys. Okay, well, here's the key-grabbing module. And it says, well, the key grabbing is actually accomplished through this kind of grasping behavior. Then that could be modeled, or you can just say we're subsuming finger-grasping kinetics in the grasping module, because the phenomena that we're trying to explain, and the data we have, are just about when things were grasped, or maybe the only observations we have are when apartments were entered. So we can actually, I expect, frame systems extremely expansively and clarify where we're doing formal modeling.
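The "phonemes come in as observations" move can be sketched as a tiny word-inference module that treats an upstream phoneme module as opaque, seeing only its outputs. The two-word vocabulary and the likelihood numbers are invented for illustration.

```python
import numpy as np

# Sketch: a higher level (words) treating a lower level's outputs
# (phonemes) as observations, per the modular framing above. The
# vocabulary and likelihoods are hypothetical.

words = ["cat", "cap"]
phonemes = ["k", "ae", "t", "p"]

# p(phoneme observation | word); rows = words, cols = phonemes
likelihood = np.array([
    [0.40, 0.30, 0.25, 0.05],   # "cat" rarely emits "p"
    [0.40, 0.30, 0.05, 0.25],   # "cap" rarely emits "t"
])

def infer_word(phoneme_sequence, prior=None):
    """Accumulate evidence over a phoneme stream. The phoneme module
    upstream is opaque -- we only see its outputs, like a USB port."""
    belief = np.ones(len(words)) / len(words) if prior is None else prior
    for ph in phoneme_sequence:
        belief = belief * likelihood[:, phonemes.index(ph)]
        belief = belief / belief.sum()
    return belief

belief = infer_word(["k", "ae", "t"])   # evidence should favor "cat"
```

Whether those phonemes come from a real inference module or straight from data is exactly the grafting question left open above: anything that outputs a phoneme can plug into this port.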
I think maybe this is just a philosophical or ontological framing of it, but I don't see how you can model anything well without admitting that it is either incomplete or partially consistent, or... that there is some hidden state which is probably not included in the model. And to me, active inference natively has an affordance, at the very least a bookmark, to do that. And so at whatever depth or hierarchy you modeled the system, it's okay that there's still hidden state and it's incomplete, if it sufficiently models, answers, the question at hand. Then you can always add another layer or make the model more complex, and that's just what we were going to do anyway. Nice comment, I totally agree with that. Again, thinking back to the linear model example: where is that extra information about adjacent possibles for the model? It's unstructured. Either the structured information enters your linear regression just so, or, for everything outside of your linear regression, we need to go back to square one. If we include a second observable, we have to go back to square one, do the model selection all over again, and re-test all the interactions again. But in these multi-scale models that we're discussing, like you said, it's a bookmark, which is a nice way to frame it. It's like the phonemes: you're bookmarking, or leaving an open USB port or something. Anything that outputs a phoneme will plug in to that part of the model. Anything that is able to hear can be plugged in to the auditory component of the frog. Anything that can see can be plugged in to the visual component of the frog. And so it's almost like, consistent with all of this blanket talk, we're being clear about what is modeled formally, and the borders of the formal model are degrees of freedom, formulated so as to be expanded upon and composed.
One other thing, I'm not sure of the context, but it's something Ben Goertzel has said, attributed to Pei Wang: that in circular reasoning, when your circle gets big enough, it becomes coherent. Maybe another way of saying that is: if your model is insufficient, if you add enough states, if you expand your blanket enough, eventually, if you're modeling something correctly, I guess, the generative process can be matched; your blanket states can come into alignment. Nice. Some more active inference. So: exteroceptive, proprioceptive, and interoceptive, we're going to use active inference modeling. Memory, attention, anticipation, planning as inference: we're going to use active inference. That is what it looks like to integrate disparate fields using one model, one family of models. Others can work out some fun ways to communicate and frame it. But when people ask, how is this simplifying, when it might feel like it's bringing in a lot, or even stepping on namespaces that people already have familiarity in, like we discussed earlier, you could ask: how is memory related to attention? Or, perhaps more saliently, how will you model how memory is related to attention? Are you going to go on arXiv and just look for neural network architectures? Are you going to read a book written before computers were invented and look at prose? Are you going to dive into the psychoanalytic tradition, or something that's maybe even clinically biomedical? What's the move? Active inference addresses that. In this theme of words that people use to mean different things: we have Bayes optimal behavior. Well, then why do things go wrong? And there are multiple layers to unpack. First, "going wrong" is about your perspective on the system, perhaps how you prefer it to be or how you expected it to be. But given where you put the dish, when it fell off the table, that was just compatible with gravity.
So it was finding its optimal position, and fragmentation, given where you placed it. The ball rolled downhill as a consequence of where it was placed and the slope it was on. And framing behavior in that light, we're modeling behavior as rolling to the bottom of a bowl. What's the bowl? What's the ball? Those are the questions. It's like chemical reactions proceeding under Gibbs free energy minimization, when we have policy selection driven by variational and expected free energy minimization. But how can things that go wrong be optimal? Well, it has to do with the parameters in the model: given how they are, the model performs optimally. That doesn't mean computationally efficiently. It doesn't mean adequately. It doesn't mean it satisfices. But it may be framed as Bayes optimal. Fixed and learned behaviors: this is going to come into play in chapter seven. There's an amazing continuity, in theory and in simple examples, between perception and learning, which is to say that perception, or perception-like processes, happen over faster timescales, while learning-like processes happen over slower timescales; or, even more generally, perception and learning both refer to parametric update processes. And you might even be able to situate the same example both ways. If we see a ball move across our visual field, are we perceiving the location of the ball? Are we inferring the location of the ball? Or are we updating and learning our estimate of the ball's position? For those with a computational background, learning often equates to parameter updating, and in that sense any kind of dynamical perception is learning. So for some backgrounds, the difference between perception and learning will be seamless. For other backgrounds, those are going to sound like totally different processes: isn't perception when you see the book, but learning when you understand something about the book?
And then, speaking from a more modeling perspective here: a fixed model, or fixing a parameter of a model, reduces the model's complexity immensely; or, another way to say it, taking a fixed parameter and making it learnable introduces multiple hyperparameters. And there's no single way to address parameter learning. You could say, well, we're doing moving-average learning, like a Kalman filter, or just a simple sliding window: we're taking the average of the last three. But now you have to parameter sweep across "how many?" And with a single trial of data, it may not be clear which learning strategy is implemented, because, especially if we want to differentiate learning strategies empirically, we would want to see multiple comparable trajectories within and amongst individuals, in similar and different contexts. But now, you know, our lab has gotten quite big, and we're doing quite large model selections, even for learning and updating parameters that might just conversationally seem really simple, like preferences being updated. So here's this description of inference and perception as fast changes, and learning as slower changes, though they're being modeled in a really analogous way. It is worth noting that in this book, we exemplify rather simple generative models that are defined using tabular methods, e.g. with explicit matrices or tensors for priors and likelihoods in small state spaces. In comparison, more sophisticated generative models are being developed in machine learning, deep learning, robotics, et cetera. So is active inference deep learning? Okay: Goodfellow's generative adversarial networks, with Bengio and Mirza. But are we saying those are formally active inference systems? Or are we just saying that we can think of them like birdsong, conceptually, and then just adding a wrapper or a descriptor to the GANs that our non-active-inference colleagues are building? Is it shining any new epistemic or pragmatic light on GANs?
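The sliding-window example above can be written out, alongside an all-history running average, to show why the window size becomes a hyperparameter to sweep. The outcome sequence is invented for illustration.

```python
# Two sketches of the learning strategies mentioned above: a sliding-window
# average (the window size is the hyperparameter to sweep) and a running
# average, both applied to learning a scalar preference from outcomes.

def sliding_window_estimate(observations, window=3):
    """Average of the last `window` observations: fast, forgetful."""
    recent = observations[-window:]
    return sum(recent) / len(recent)

def running_average(observations):
    """All-history average: slower to change, no window to tune."""
    return sum(observations) / len(observations)

# A preference that shifts halfway through the trial
outcomes = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
fast = sliding_window_estimate(outcomes, window=3)   # tracks the shift
slow = running_average(outcomes)                     # retains old history
```

With a single short trial like this, both strategies are plausible fits to the data, which is precisely why differentiating learning strategies empirically needs multiple comparable trajectories.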
I mean, here's a whole paragraph on GANs. Hold on, that's chapter 10. What happened? But there's still a whole paragraph on GANs. Did you see that? How did that happen? I'm not sure what the optimal way to view a PDF is, or that one exists, but I don't think it's a browser. It's not bad, but it's not optimal. Yeah, I'm interested in this, again: the generative model. Oh, I know what those are, generative models. So how specifically do GANs fit in here? In active inference, is the square a rectangle but the rectangle not a square? Is that what is being said, or is there a degree, an index, here of varying degree? Yep, yep. As far as I know, and you fellows might know, variational autoencoders and variational Bayesian inference are directly implicated, as potentially are recursive cortical networks and world models. But if these bridges can be made solid, like you said, with squares and rectangles, so that we can say all, some, or none of X are Y, or here's how you project this model onto that model, it's going to just break a dam with certain possible applications of active inference. Any final comments on six? And certainly by two weeks from now, let's look at the SPM implementation, and look at some of the scripts that are pointed towards in the book, as well as some of the epistemic chaining and POMDP-type models, as well as anything that anyone else builds. So, any final comments? Yeah, I just wanted to point out the recent work by Fields et al., especially the two papers that Friston was also a co-author of. One of their main objectives is to cross that bridge between active inference and VAEs, via the CCCD formalism. So the bridge they're talking about in this book is already being investigated and filled in. So yeah, there are pretty interesting developments going on in that regard. Okay, any last notes before I stop the recording? What is that paper?
I'll send you the link on the Discord; I forgot. Maybe just one more, I don't know. We haven't made an effort in the code section to lay down some basic examples from the first half. But if we're going to model stuff, should we have a subpage for that, or put something somewhere in the code section? I don't know. We've had this code section from the beginning. Yeah. And it should be totally modifiable, full control. So this and any subpages can be modified as needed, and the GitHub linked to the textbook; anything that people suggest, just let us know. Any hails?