Hello and welcome to the Active Inference Lab. This is Active Inference Livestream number 20.0 on April 15th, 2021. I'm Daniel and I am here with Blue, who will say hello. Hi, I'm Blue Knight. I'm an independent research consultant out of New Mexico. Thanks for being on, Blue, and looking forward to this conversation. Welcome to the Active Inference Lab, everyone. We are a participatory online lab that is communicating, learning and practicing applied Active Inference. You can find us at the links here on this page. This is a recorded and archived livestream, so please provide us with feedback so that we can improve on our work. All backgrounds and perspectives are welcome here, and we'll be following good video etiquette for livestreams. 20.0 is a contextualizing stream for the upcoming discussions that we'll have on April 20th and 27th about this paper, The Emperor's New Markov Blankets, and we'll have some of the authors join us as well. So this will be an awesome discussion. Here we are in number 20.0, and the goal of this discussion, as stated, is to set the context for the upcoming discussions that we'll have on number 20 and number 20.1, and that is about the paper The Emperor's New Markov Blankets by Jelle Bruineberg et al., and the link is provided here. This video is just an introduction, us working through the paper so that we're ready to interact with the authors and really grasp what they're discussing. And we'll be going through the usual sections of the paper. Any overall thoughts before we lead into it, Blue? What did you like about it, or what drew you to the paper? I liked this paper. I thought it really provided an overarching view of the Markov blanket and its use, both historically and in its current context. So I thought it was really thorough and gave a good treatment to all those topics. Nice, agreed. So the paper is The Emperor's New Markov Blankets, and the link is provided here.
The aims and the claims of the paper, that's where we're gonna start, with the aims and the abstract and the roadmap, to just see how the authors laid out what they're going to be trying to do. So we're gonna evaluate their aims and then see whether the claims they make are valid and whether they support the aims that they laid out. And they write, I'll read the first one, Blue. The aim of the paper is twofold. First, we want to explain how it has been possible for such an innocuous technical concept as a Markov blanket to come to be used in order to settle central debates in the philosophy of biology and cognition. So that's the main key aim here: how did we get here? How did we get to debating a statistical technical concept like the Markov blanket and having people talk about consciousness and emotion and multi-scale systems? So how did we get here with the Markov blanket? And then, Blue, you can go for the second aim and what you thought it meant. They say we will trace the development of Markov blankets, starting from their standard application in Bayesian networks, through the role that they play in variational inference, through to their use in the literature on active inference, which is what I had said about the paper that I liked. They say we will argue that in the course of this transition, a new and largely independent theoretical construct has emerged, one that is more closely aligned with the notions of sensorimotor loops and agent-environment boundaries. Do you have any thoughts on that, Daniel? It's going to be the core distinction that we're going to hit probably 50 times: there's two ways to talk about boundaries, maybe the ones that really are there and then the ones that we infer, and that's what this paper's about. So here we go to the abstract. Markov blankets have been used to settle disputes central to philosophy of mind and cognition.
Their development from a technical concept in Bayesian inference to a central concept within the free energy principle is analyzed. So that's the historical component of the paper. Then there's the sort of distinguishing part of the paper. We propose to distinguish between instrumental Pearl blankets and realist Friston blankets. Pearl blankets are substantiated by the empirical literature but can do limited philosophical work. Friston blankets can do philosophical work but require strong theoretical assumptions. Both are conflated in the current literature on the free energy principle. And then their conclusion is: consequently, we propose that distinguishing between an instrumental and a realist research program will help clarify the literature. Good? Good. Onto the roadmap, which now has a 55 miles per hour speedway on the left side for the rapid run-through. They start with introducing Bayesian inference and really walk through it in a nice way, which we're going to walk through as well, and go from kind of the simplest form of Bayesian inference to surprisingly nuanced forms that introduce networks, like Bayesian networks. And that's where we get to this concept of a Pearl blanket. And then it's by tracing the history and the citations and the formalisms around the Pearl blanket and the Markov blanket that we find out where it intersects with active inference, which is where the authors propose that this Friston blanket concept sort of surreptitiously, or without being noticed, starts getting deployed in a way that's sometimes one way, sometimes the other way, and it's actually the looseness around the definition that in their opinion gives rise to uncertainty, which is not what we're about here. And in section five, they provide more examples, with walkthroughs and graphs and figures that we're going to walk through today, on how the Pearl blanket differs from the Friston blanket sense of the Markov blanket.
And then how does that relate to this key distinction at the end between inference with a model versus inference within a model? So kind of like generative models versus generative processes; that's going to come back, and it's a great paper. So we hope that you listen to these discussions and also read it. Any thoughts on the roadmap? Nope, it's gonna go fast. Yes, it's a long way. Yeah. So the title of the paper is an allusion to the story called The Emperor's New Clothes. So people can read through the way that it's written here, but The Emperor's New Clothes is an interesting story. It's a polysemous story; there are many meanings of it. So what is The Emperor's New Clothes about, Blue, in general, or what is it doing here in the title? And we can definitely ask the authors, but what do you think it's doing here? So in my understanding of The Emperor's New Clothes, it's definitely, like, without reading the screen, a story about an emperor who goes to like the best clothes maker in town, pays them all kinds of money, and the clothes maker proceeds to make them clothes that are invisible. And then the emperor sees essentially maybe what he wants to see in the clothes, dresses himself in them, and then goes parading throughout the town naked, because the clothes are invisible. And then a child says, hey, that guy's naked. Yeah. And then here's what I had forgotten: although startled, the emperor continues the procession, walking more proudly than ever. So it's not like, oh, then he was so embarrassed that he stopped doing it. So maybe it's a story we'll keep in mind, and we'll ask who are these characters, or what do the authors mean by that? But also, at this sort of general level, why does this paper matter? Why are we talking about it and studying it and discussing the implications? Because it's good to be clear when we're talking about science or philosophy.
And an unfortunate reality of science is that it's easy to slip metaphysical claims into so-called facts of the world, or even just measurements of the world. And it's not good to slip in ungrounded metaphysical claims. And then also this realism versus instrumentalism distinction is something that has been dealt with in basically every field. Oh, is it the model of language you're talking about, or are you really talking about language? Those kinds of differences between systems as they are and systems as we model them are something of a nomadic concept. So to see it appear in active inference is cool, but also it's exciting to be involved with the authors and see how the scientists and the philosophers are actually dealing with this question in a new way, at a new time scale, maybe because of the way that they're communicating with each other. The big questions in the paper, sort of beyond just what they set out to address, would be like: how is physics related to metaphysics? What can we study about electrons and temperature and mass that would tell us anything about whether a spirit exists, or what's really out there beyond Plato's cave? How is scientific modeling related to reality? And then a big question for a lot of us, and a lot of people outside of our active inference community, is: what do the free energy principle and active inference really say about pragmatic concerns, and then, totally independently, what do they say about philosophical concerns? It'd be awesome if they were a 10 on both, but what is the actual reality of what these frameworks tell us about pragmatic concerns and philosophical questions? What big questions resonated with you, Blue? The biggest question that resonated with me, I think, is: are we using the right tools to study the right things? So when we think about active inference and cognition and statistics, should these all be represented by a Markov blanket?
Like, is this the right thing to study this with? I don't know. How unified do we want models and systems to be? On that, this is sort of the last little general signpost before we plunge into the free energy principle and active inference and everything. It's just a little distinction on maps and territories, because realism and instrumentalism, it sounds really philosophical, there's isms, and it sounds like a belief system, but it really boils down to this map versus territory distinction. And so the authors wrote: in our view, the free energy principle literature, so whatever it is, even if you've never heard of this idea, consistently fails to distinguish between the map, a representation of reality, and the territory, reality itself. This slippage becomes most apparent in their treatment of the concept of a Markov blanket. So if we're ever getting lost in the weeds with Friston and Pearl and instrumentalism and realism, we think: map and territory. And then what's really cool about this topic, especially in the active inference setting, is that two of the papers that are cited in this paper as relevant prior work are Mel Andrews' paper and also Thomas van Es and Ines Hipolito's paper, and these were discussed in Active Inference Livestreams 14 and 15, and now we're here in number 20. So it's kind of like we're going quickly and we're speaking with a lot of the authors. And then also on the left is a paper that I co-authored with RJ and another colleague, Miguel. And we kind of stepped back even further than active inference and just asked: what does it mean to map across systems? And we highlighted that sometimes, when mapping across systems and using the complexity science approach, people use Bayesian statistics, networks, and predictive and counterfactual approaches. Those are kind of like three key ideas in making systems similar to each other or comparing different systems.
And those are citations to the active inference literature, because active inference is drawing together a lot of these different domains that might be seen as different, like Bayesian statistics, network science, cybernetics, ecological psychology. I mean, there's different words because they're different areas, but we're actually coming at it in a new way. And that is why we're thinking about maps and territories in a new way, because it's not just like cognitive maps, yes or no, or maps in the city, yes or no. It's a lot of new things coming together, which is why it's good to clarify. Adding on to that is the Fields and Glazebrook paper that we did, that mapped the two spaces, right? From one to the other, like the category theory. Yes, nice, the mapping. And then, what if the map is the territory? So let's get to the FEP. For this breakdown, as every time in reading a paper, we get to reintroduce the free energy principle, because it's always good to start with a first pass on what it is; this video might be someone's first time hearing about the free energy principle. And also it's a good opportunity to see how the authors frame the free energy principle and then use that as a scaffolding for our understanding and our curiosity. Instead of trying to get to the best explanation week after week, we can develop our understanding week by week but always be inspired by the way that the authors have phrased it in their writing. So on the right side, with the colors, I broke down what they were saying into almost haiku, or just short sentences. And what it comes down to is that the FEP is a normative mathematical framework. It's normative in that it says what should happen. Like evolution by fitness maximization says that a system should be maximizing its fitness; that's kind of similar. It's not saying what you should do, but it's saying how systems should be.
And it's about systems that are self-organizing and how they fit into their niche. The FEP is drawing on physics-inspired machine learning approaches, and so part of the appeal and the excitement of the free energy principle is that it brings together machine learning and physics and math, with also potentially some qualitative dimensions. But even without dipping into the qualitative or the aesthetic or the embodied or the enactive, at the very least from a control-theoretic perspective, the free energy principle adds in a lot of features that are absent from other frameworks, like network theory alone, Bayesian approaches alone, or predictive systems alone. Combining those types of approaches in a really interchangeable way with system properties like learning, attention and action planning is very powerful. Active inference is a theory that is a corollary to, or a compatible implementation of, systems that are broadly construed under the free energy principle. And then the authors kind of close with the recognition of a claim that you'll see in a lot of places: that if we could have a theory of everything, or of multi-scale systems, then it would be great. So that's what motivates people to work on the FEP and why a lot of people are excited about it. Anything to add on that, Blue? Okay. It would be great. I'm excited about it. A unifying theory of everything. Let's get to it. Let's get to it. Now we want to, just like you said, ask: are we studying the right systems with the right tools? So this is what the FEP is, and very intentionally it appeals to things like control theory and measurement. Let's take it into the deep end with metaphysics. So the authors write: the FEP has been applied to metaphysical questions. These are very different from action planning, learning, attention. These are gonna be questions that are metaphysical, like: what are the boundaries of the mind? And the citations are in the paper. What are the boundaries of living systems?
So what's really there ontologically? What's the relationship between mind and matter? Like, what is experience? All these kinds of philosophical questions about qualia, and even things like: is Gaia real, or is there a scientific basis for talking about Gaia, or sort of Earth-level awareness, or at what level could awareness exist? And more. We're probably all curious about a lot of questions, and it's not like we need to a priori classify them into physics or metaphysics, but that shouldn't make us think there's no distinction. So we wanna be open to questions as they come in, especially ones that are framed in new ways, and then also try to approach them rigorously, because, as they raise here, it's a web of formalisms. It's true that it is a web, because we're gonna walk through it in the order that the authors laid it out, but it is a web of concepts. And so even for those who are experienced in the field, but especially those who are new to the area, or as they call it, a layperson, not only are there distinctions that are unconventional, but their implications are not always obvious: assumptions or axioms or relationships that have maybe very non-obvious conclusions or implications, especially when combined. And then the two risks here are that it might actually be mathematically unjustified, or at the very least appear to be, and that might enable the unintended smuggling in of metaphysical assumptions. Like: I have information theory, I can calculate it, and here's who's conscious and who's not conscious. I would say that's completely unwarranted. What do you think? Good example. I think that's like the main critique of IIT. So I don't know. I think that it's a good example, and how can we say who's conscious and who's not when you can't even give consciousness an adequate definition? I don't know. But we're solving unanswerable questions, and that's the joy of it. Cool.
Yep, we talked about the consciousness side in some of our other discussions, but suffice to say that there's a distinction between the questions on this slide, which are related to stochastic systems and action in the niche, and then the metaphysics, which are things that you can imagine even robots would debate over. Like: who is really a robot like us, or whatever they ask. So how does the FEP relate to active inference? They write: active inference is a process theory derived from the application of variational inference to the study of biological and cognitive systems. And we're not gonna read this whole piece, but active inference is a process theory. So it specifies how processes play out, which we're gonna go into in a lot of detail, and it sort of takes something that's initially static and figures out how to take a bunch of equations, just there on the page, and actually link them in a way that helps us think about how systems work in their flow. So we take a bunch of equations that relate variables like estimates and action states and sensory states, all happening at once as analytical equations in the FEP, and then active inference is a compatible architecture, like a skeleton or a style of model, that implements processes compatible with the FEP. What would you say about that? I think that you summed it up well. I have a couple of the authors' points here that are highlighted. So the fundamental imperative of active inference is to minimize free energy. So this is the FEP, the free energy principle. And then the other point, and this is also the question that I had at the beginning, is: are we using the right tools to study these phenomena? More specifically, we can ask what role the Pearl blankets might play in active inference. Oh.
Yep, we're gonna be seeing how this technical concept entered the discussion of metaphysical questions, and it's almost like active inference as applied to biological creatures is where we see a lot of it coming into play. And yep, we talk about active inference throughout, so we'll return to it. So now, that's like the modern stuff, right? FEP and active inference. Let's pull back to Bayes and to Bayesian statistics and build up in a new way how we're gonna go from simple Bayes to modern Bayes. So, Blue, how would you start us off, or what do we need to know as we go on this little several-slide Bayes journey? So the short summary that I give about Bayesian statistics versus frequentist statistics: frequentist statistics is what most people are taught in school, like when you flip a coin, the probability of it coming up heads is a half, right? 50% of the time it comes up heads. Bayesian statistics accounts for the fact that, like, the previous 12 flips have all been heads, right? If you have flipped a coin, and you know it's a fair coin, and it's come up heads 12 times in a row, the next time you flip it, don't you think it's more likely to come up tails? So the Bayesian accounts for what has happened in the past, which is not accounted for in frequentist statistics at all. Nice. And just to go through the notation, because we are gonna be working through the notation: as they write, it's a recipe for calculating the posterior probability. So that's the probability of X given Y, and X is the hidden state of the world. Let's just say that the hidden state of the world is the true temperature. It's actually a hypothesis about the world; that's why in Bayesian statistics, it's like a hypothesis that you're testing. And the observational data, the measurement, is Y, and this is posterior to data.
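To make that notation concrete, here is a minimal sketch, not taken from the paper: the temperature range, noise level, and thermometer reading are all invented for illustration. It applies Bayes' rule to a grid of hypotheses about the hidden state X (the true temperature) after observing Y (a noisy thermometer reading):

```python
import numpy as np

# Grid of hypotheses about the hidden state X: the true temperature.
temps = np.arange(10, 31)                    # candidate temperatures, 10..30
prior = np.full(temps.size, 1 / temps.size)  # flat prior P(X)

def likelihood(y, x, sigma=2.0):
    """P(Y | X): a noisy thermometer reading y, given true temperature x."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2)

y = 22.3                                     # the observed reading
unnorm = likelihood(y, temps) * prior
posterior = unnorm / unnorm.sum()            # Bayes: P(X|Y) = P(Y|X)P(X) / P(Y)

print(temps[np.argmax(posterior)])           # the most probable hidden temperature
```

The division by `unnorm.sum()` is the model evidence P(Y) doing its job as a normalizer, which is exactly the quantity the discussion turns to next.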
So prior and posterior are referring to when new information comes in. Prior to the new data coming in, and then it's updated, and then you have a posterior, which has been updated by the new data. And very rapidly here, in this paragraph, they make the move from doing Bayesian inference on a hidden variable, like the temperature of the world, to this idea of the model evidence, with a normalization that keeps things adding up to one. So instead of having an uncertainty distribution over what the temperature might be in the world and then taking in new data and updating that, another way might be: I have two hypotheses, it's colder than 20 or it's warmer than 20, and then I can always be judging relatively which one of those is more accurate. So even if the temperature distribution is very gnarly, it's possible sometimes to have a simpler hypothesis about the world, like is it over or under a certain number, that helps you guide action. And then also, when you do this model comparison, you can keep the relative weighting, basically keep models being compared relative to one another, and that can sum to one. So you can do probability-type estimates on different model classes, even without knowing whether any specific model is likely or unlikely. So like in phylogenetics, which is an area that Blue and I have both worked in, you can use Bayesian statistics to relatively evaluate what the most likely tree is for species, but that tree's likelihood might be super low; that's not what matters, it's whether it's more likely than other hypotheses. So that's kind of what they write a little bit about here: when you have types of models that are computationally or analytically intractable, there's no closed-form solution for the posterior, like there's no clean way to update your model of the phylogenetic tree given the next new genome that's added in.
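The colder-than-20 versus warmer-than-20 idea can be sketched as model comparison. In this hypothetical example (the two models' expected readings and spreads are invented), each model assigns an evidence score to the observation, and normalizing the scores keeps the comparison summing to one, so only the ratio between models matters, just like the phylogenetics case where the best tree's absolute likelihood is minuscule:

```python
import numpy as np

# Two coarse models of the world instead of a full temperature distribution.
def evidence_m1(y):  # M1, "colder than 20": expects readings centered near 17
    return np.exp(-0.5 * ((y - 17.0) / 3.0) ** 2)

def evidence_m2(y):  # M2, "warmer than 20": expects readings centered near 23
    return np.exp(-0.5 * ((y - 23.0) / 3.0) ** 2)

y = 21.0                              # observed thermometer reading
raw = np.array([evidence_m1(y), evidence_m2(y)])
posterior = raw / raw.sum()           # relative evidence, normalized to sum to one

# Both raw evidences could be tiny in absolute terms; only the ratio matters.
print(posterior)                      # M2 ("warmer than 20") wins
```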
It's just not something smooth, like signal processing might be. So how are you gonna deal with that? And that's the big challenge. So it's kind of this rapid Bayesian acceleration: from trying to estimate something about the world that's unknown, to, okay, what's unknown is which model is better. And then the challenge is that we wanna update our ongoing estimate of what the best model of the world is, but there aren't really update rules for how new data coming in should help us update which model it is. Like night and day: it's clear, more light, it's likely to be daytime. But again, think about phylogenetics, or something where, as new data comes in, it's really hard to know how exactly to update your models. So in that phylogenetic analysis, if you have new data coming in, like you wanna add a new branch to your phylogenetic tree, the computational way that's calculated is by switching in and out, in and out, in and out each branch, to find the likelihood, or the best evidence, for that tree's existence. But it literally makes every single tree and then calculates that, which is computationally a really intense and time-consuming effort. So that's why variational Bayes is so neat and swift in comparison. Yep, exactly. It's these red and blue on the bottom; it's like a continuum of approaches, before enter variational. The continuum being: we have some simple cases where we have a closed-form solution for exactly how to update the model's evidence as new data comes in. And then there's sort of this Wild West, where you need to use customized schemes, and sometimes you have to resort to something extremely costly, almost brute-force, like the swapping, or doing Markov chain Monte Carlo. However, maybe there's another way, which is what brings us to variational Bayes. And variational Bayes, as the authors write on the bottom.
The trick of variational Bayes, and we're gonna come back to the KL divergence, but here's the trick up front, consists in letting go of trying to minimize the KL divergence in equation three directly, shifting the objective to optimizing a different functional which bounds the model evidence. So what would you say the variational trick is? I mean, that's it. So if you bound the model evidence, so you know that the answer can't be outside of these parameters, that's like bounding the model evidence, then you're not calculating, or trying to calculate, things that are outside of those parameters, I would say. Yep. And maybe there's another way to read equation three, but the underlined red part, D sub KL, is the KL divergence. So it's about the divergence between two different distributions, and the double line separates the two distributions. So that P of X given Y, we saw that back here; that's what we wanna calculate. So even though we're kind of accelerated on Bayes, we're still back on equation one, doing inference on the world, but it's gonna look more complex now. So we're looking for the divergence between that thing that we really wanna find out, P of X given Y, and the magical Q distribution that we're in control of. And then we want the divergence between those two distributions to be zero. And the divergence is always non-negative; there's no negative divergence. And so we just have to minimize this divergence so that we can get at what we really want. And again, one part we don't know, and it's hard, and the other part we're totally in control of. So the whole question and trick is gonna be: can we design a Q so that by reducing the KL divergence between Q and P, we're getting at what we truly want? So that is kind of what threads the needle between these two approaches, the red being the elegant, simple and effective, and the blue being the extremely messy and simulation-based: variational Bayes is elegant.
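The KL divergence itself is easy to compute for discrete distributions. This toy sketch (the probability values are invented) shows the two properties the discussion leans on: the divergence is zero exactly when Q matches P, and it is never negative; strictly speaking it's a divergence rather than a distance, since it isn't symmetric:

```python
import numpy as np

def kl(q, p):
    """D_KL(Q || P) for discrete distributions (asymmetric, non-negative)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(q * np.log(q / p)))

p_post = np.array([0.6, 0.3, 0.1])   # stand-in for the intractable P(X|Y)
q_far = np.array([1/3, 1/3, 1/3])    # a Q we control, far from the target
q_match = p_post.copy()              # a Q tuned until it matches the target

print(kl(q_match, p_post))           # 0.0: the divergence vanishes at the match
print(kl(q_far, p_post))             # positive: Gibbs' inequality
print(kl(q_far, p_post) == kl(p_post, q_far))  # False: not a symmetric distance
```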
And also it helps us potentially deal with complex underlying distributions. Cool? Okay, now here in equation five, we return to the D sub KL between Q, of what we wanna know, and P of X given Y. So it's the same as it was in equation three, and that is going to be given another equivalency: it's restated as the natural log of P of Y, minus big L of X. So what do you think about L of X, Blue, or what did you see here? Or no worries. Okay. I'm lost in the math. Yeah. I was just thinking, oh, you know, two lines is an equal sign, one line is a minus, and here's three lines. Yeah, I was lost with the three lines. Why is that an equal sign? I'm confused with that triple-line thing. I think three lines can mean that it is defined as, like it's strictly defined as, but let's find out; I don't wanna say anything that's incorrect. Yeah. That's what I was thinking of, though. I'm like, no, no, I'm opting out, I pass. Yeah. But the restatement here is that this value, that we want to go low, low, low, the divergence between what we can't control and what we can control, is restated in terms of two different pieces. The first is a log likelihood of the observations; this piece is just about Y. Here, P of X given Y was like the true X given the Y observation, like the true temperature given the thermometer, and here we just have the log evidence of the measurement. So this is a lot easier to think about than X given Y, because it's not a conditional. And then we're gonna subtract big L of X, which is an expectation under Q, what Q thinks are gonna be likely values of X, of the log joint distribution of X and Y under P, not the conditional. So it's a little past me why reframing it in each of these nuanced ways helps you do it, but I just think of it like mathematical pinball: you gotta bring some wall down so that you can go up another ramp and do something different.
But let's definitely ask the authors about it. The negative is baffling. I think the point of getting into the logarithmic scale is that you eliminate any possibility of a negative, and then they put the negative on the front of it. I'm like, hmm, that's a little strange to me. Yep. And it's like the negative variational free energy. And so we push on, though. They are able to say that basically, by analogy to the variational free energy, that's where they bring in the F. They say basically the free energy that you might want to minimize, we'll get there in a few equations, but the free energy function of X, that hidden state of the world, like which model is best or what's the real temperature, is the negative of this L. Because L was defined this way, and it's almost like we can separate out L from P. So before we had sort of a hybrid with Qs and Ps, and there's a P in there, but now when we get L all alone, we've kind of encapsulated P in a way where it's all Q-centric. I'm not even sure if that's true, so I don't want to go down the rabbit hole, but now it's framed in a way where we can do our work on Q, but get to where we want to get on P. And I think that is the rub; that's what the question is about. It's an approach, and I would love to ask somebody with more variational inference experience about this. I think these are the slides where Blue and I were just like, okay, we're gonna raise more questions than we answer. But what does it look like to make the variational version of a model? And what do you sit down and do so that you can calculate this, or is it all carried out beneath the hood?
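For what it's worth, here is the standard textbook decomposition behind the equation being discussed, written with the symbols as we've been reading them. This is our reconstruction, not a quote from the paper, so it's worth checking against the authors' exact notation:

$$
D_{\mathrm{KL}}\big(Q(x)\,\big\|\,P(x\mid y)\big)
= \mathbb{E}_{Q}\!\left[\ln\frac{Q(x)}{P(x\mid y)}\right]
= \mathbb{E}_{Q}\!\left[\ln Q(x)\right]-\mathbb{E}_{Q}\!\left[\ln P(x,y)\right]+\ln P(y)
= \ln P(y) - L(x),
$$

where

$$
L(x) \equiv \mathbb{E}_{Q}\!\left[\ln P(x,y) - \ln Q(x)\right], \qquad F(x) = -L(x).
$$

The triple bar is indeed a definition. Since the KL term is non-negative, $L(x) \le \ln P(y)$: L is a lower bound on the log evidence, and it involves only Q and the joint $P(x,y)$, never the intractable conditional $P(x\mid y)$. So maximizing L, or equivalently minimizing the free energy F, pushes Q toward the posterior without ever computing it, and the baffling minus sign is just the convention that free energy is the negative of the bound.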
So, minimizing free energy, as defined in this way, on Q, the function that we control, the goal is that that will also get us to the right value on P, the function we don't control. So I have my internal model of temperature and I want to be doing inference on that, finding the most likely, most free-energy-minimized internal model of temperature. And then, if that is attuned properly, it should be accurate in terms of the external as well. So, in my limited understanding of making a variational model, I think that these models, instead of using values, use distributions, right? So that's kind of the key difference, I think, between variational model construction and the non-variational variety of Bayesian model, in my limited understanding. Nice point: previously we were looking at passing a number through a function, and when we go variational, we're talking about passing functions through each other. This next one, we're not gonna go into the technical details, but it's about mean field approximations. And this is a super interesting topic, because a lot of statistical models will assume that there's a homogeneous field, or that a gas is well mixed, or that all types of a certain interaction are equally likely to happen amongst all molecules bouncing around. But because we're interested in applying the free energy principle not just to the burning candle or to a chemical reaction, but to active inference agents, to cybernetic and goal-oriented agents, how does the mean field approximation work? Like, if we wanna explain culture, is there a mean field approximation for culture? How are we going to apply this to the systems where we know we're kind of extending into analogy land? But at the same time, maybe reality is really multi-scale.
So it's not just an analogy, like when Chris Fields talked about quantum information and communication at our scale, not just as bubbling up from electrons and not just as a metaphor, but as a third way, really grounded in those principles at our scale. That's what makes me wonder whether we need, or will find, mean field approximations at a lower level, with us experiencing something non-mean-field at our own level. Okay, slide 21. Anything else to say there, Blue? I think the mean field approximation limits the interactions between different variables, so you're interacting with the density instead of each individual variable interacting with each other variable. It's a coarse-graining, having this mean field approximation. Yep, and we'll return to it when we look at the figures from Friston's paper. But yes, it's related to partitioning variables. Now, this part I thought was really interesting and is fairly basic statistics, so it may be familiar to some, but we're seeing it in a new way: the different kinds of relationships that variables can have. We're seeing it here written out in math, but we'll come back to it with figures, so it will hopefully be a little clearer. There are a few different situations. The nicest situation for statistical inference is when you can cleanly separate inference on the two variables. Here it's P(X1, X2), the joint distribution of X1 and X2, and it separates very cleanly because they're independent. It's like flipping a coin and rolling a die: the probability of a head and a four is just the probability of a head times the probability of a four. That's the clean case. And almost by analogy, that's how I understand variational inference.
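The coin-and-die case can be written out in a couple of lines; nothing here is paper-specific, it's just the product rule for independent variables:

```python
from fractions import Fraction

# Independent events: the joint probability is a plain product.
p_heads = Fraction(1, 2)  # fair coin
p_four = Fraction(1, 6)   # fair six-sided die

p_joint = p_heads * p_four  # P(heads AND four)
print(p_joint)  # 1/12
```
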
The variables cleanly separate so that you can do restricted optimization on subsets of variables in a more principled way. But back to the stats: often the joint distribution you care about involves a third variable. Conditional dependence and conditional independence are two sides of the same coin; here, X1 and X2 become conditionally independent given a third variable X3. So we separate X1 and X2 just like we did above, but now it's X1 conditioned on X3 and X2 conditioned on X3. And there are all kinds of situations where we're interested in these interactor variables. One is latent cause estimation: I want to know why some people take this drug and have a side effect while others don't. Sometimes you want to know how all three factors, like three nodes of a computer network or a genetic network, are related. Other times the mediator is useful for a statistical analysis, like an interaction effect in an ANOVA: there's an effect of age, there's an effect of height, but is there a height-by-age interaction? ANOVA questions come from the frequentist framework rather than the Bayesian one, but they're an example of wanting to know about interactor variables. And it turns out this is going to facilitate variational inference, because you can use this hidden variable X3 to shield or separate X1 from X2 and X2 from X1. That's the move from Pearl, and that's what we're going to dive into from here on. So we went from Bayes on states and Bayes on models to variational Bayes. Variational Bayes gives us this free-energy-minimizing way to work the variational inference, contingent on a few assumptions that we would love clarification and education on.
And that's going to help us separate out, in a tractable way, the different parts of the world model that we care about. An easy way to think about this conditional dependence: consider the probability of Mary and Mo both eating at the Frontier restaurant in Albuquerque. Each of them eats there periodically, but they only go as a trio with their friend Sally. So if Sally eats at the Frontier, then Mary and Mo will both be there, because they all go together; their visits are linked only through Sally, and once you know what Sally did, knowing about Mary tells you nothing extra about Mo. Nice. Our variables just hanging out with each other. So let's take it from doing inference on single-variable states, through the variational machinery, to two variables with joint inference, and then the third variable we just brought in. Now let's generalize beyond three, because we're not just interested in X1 and X2 and how X3 influences them; we want to be open to all graphical relationships. That just means anything you can draw as a network with nodes and edges, which is what makes it graphical, a network analysis. So they bring in the notion of a Bayesian network B, defined as containing G and P. G is itself defined by vertices and edges, V and E; G is really the nodes and edges, the network itself, the matrix form, the two-space form we're familiar with. And P is, as they write, a collection of tables containing dependencies between the variables, a set of matrices with structure. So it's kind of like: G is the social network, who the people are and what the edges are.
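The Frontier story can be simulated as a common-cause structure: Sally's decision is the shared parent, so Mary and Mo look dependent marginally but become independent once you condition on Sally. This is our illustrative toy model, not something from the paper, and the probabilities are made up:

```python
import random

random.seed(0)

def simulate(n=100_000):
    """Each row is (sally_went, mary_went, mo_went)."""
    rows = []
    for _ in range(n):
        sally = random.random() < 0.3  # Sally decides to go
        # Mary and Mo mostly go when Sally goes (common cause),
        # with independent noise on top.
        mary = random.random() < (0.9 if sally else 0.1)
        mo = random.random() < (0.9 if sally else 0.1)
        rows.append((sally, mary, mo))
    return rows

def prob(rows, pred):
    return sum(1 for r in rows if pred(r)) / len(rows)

rows = simulate()
# Marginally: P(Mary & Mo) differs from P(Mary) * P(Mo) -> dependent.
marg_joint = prob(rows, lambda r: r[1] and r[2])
marg_prod = prob(rows, lambda r: r[1]) * prob(rows, lambda r: r[2])
# Conditioned on Sally going: the dependency disappears.
given = [r for r in rows if r[0]]
cond_joint = prob(given, lambda r: r[1] and r[2])
cond_prod = prob(given, lambda r: r[1]) * prob(given, lambda r: r[2])
print(marg_joint, marg_prod)  # noticeably different
print(cond_joint, cond_prod)  # approximately equal
```
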
And then P is describing their relationships. They also write that the graph G is often represented by an adjacency matrix: nodes by nodes, with the cells saying whether two nodes are connected. That's exactly what we were getting at with the two spaces. And the tables P contain the factorization of the joint probability distribution over the variables, characterized by the directed acyclic graph, the DAG. So there's an equation that jointly considers the network nodes and edges, G, and all the dependencies on the network, P. Imagine you had a matrix of a hundred people, where these 50 all knew each other, with dependencies between them, and those other 50 all knew each other. We'd want a method that could decompose that matrix into a more factorizable, nicer-to-handle form. Okay. So here we see it visually. I think the circles are all meant to be complete, though I wondered whether one was drawn with a dashed line. So we have a bunch of variables: T, W, Y, X, U and Z. And the authors very helpfully walk through a few of the kinds of relationships that can link motifs of three variables. Here's W and Y. Notice that W and Y are unconnected, but both have an arrow into X. W and Y are marginally independent, and become conditionally dependent only if X is observed. If you don't observe X, there is no dependency between W and Y; once you do observe X, a dependency between them appears, because X is being influenced by both of them. That's called the head-to-head case.
W and Z are marginally dependent, because even with just W and Z measured you could tell there was an influence flowing through the intermediate variable X; but you wouldn't know as much about the influence of W on Z unless you knew about X. Then there's X and U, which is kind of the reverse of the W and Y case. X and U would be correlated in the real world: pull out X's and U's and they'd be correlated, so they're marginally dependent. But if you measured Y, observing it would make them conditionally independent, because you could correct for Y, and the corrected X's and U's would show no remaining dependency. So the Bayesian graphical form lets us go from that full P(T, W, Y, X, U, Z). Think about that: six variables could mean up to six-by-six, roughly 36 connections to consider. But we can factor it: P(Z given X), because X is the only thing Z is conditioned on; X conditioned on W and Y; U conditioned on Y; and T as a lone variable. We've written all six variables in a way that's a lot sparser; it only specifies the dependencies that actually exist. That's the Bayes network: using network thinking to talk about the nuanced relationships between sets of variables. This could be six demographic factors of a person, with Z as the health outcome, and then we can use this analysis to do statistical inference just like any other technique. Anything to add, Blue? Just that the final formula down at the bottom really separates things out: P(Z given X), T independent of everything, X given W and Y, and U given Y. So you can really see the conditional dependencies in that final equation.
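That sparse factorization, P(T,W,Y,X,U,Z) = P(T) P(W) P(Y) P(X|W,Y) P(U|Y) P(Z|X), can be sanity-checked with binary variables and made-up conditional tables (the probability values here are ours, purely for illustration):

```python
import itertools
import random

random.seed(1)

# Marginals for root nodes T, W, Y: probability of taking value 1.
roots = {"T": 0.5, "W": 0.3, "Y": 0.6}
# Conditional tables keyed by parent values: P(node = 1 | parents).
p_x = {(w, y): random.random() for w in (0, 1) for y in (0, 1)}  # X | W, Y
p_u = {(y,): random.random() for y in (0, 1)}                    # U | Y
p_z = {(x,): random.random() for x in (0, 1)}                    # Z | X

def bern(p, v):
    """P(value = v) for a Bernoulli with success probability p."""
    return p if v == 1 else 1 - p

def joint(t, w, y, x, u, z):
    # The sparse factorization from the slide: only real dependencies appear.
    return (bern(roots["T"], t) * bern(roots["W"], w) * bern(roots["Y"], y)
            * bern(p_x[(w, y)], x) * bern(p_u[(y,)], u) * bern(p_z[(x,)], z))

# Summing over all 2^6 assignments should give exactly 1.
total = sum(joint(*vals) for vals in itertools.product((0, 1), repeat=6))
print(total)
```
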
Yeah, it's almost like there's the graph representation, the adjacency matrix representation, and then here an equation representation, like a Rosetta stone. If you had a technique that works on matrices and wanted to apply it to a network, you'd need to connect networks and matrices, and then you could definitely apply it there. What about the Pearl blanket, with this cute picture? Yeah. So the Pearl blanket, traditionally, is described as the set of nodes that shields a variable, in this case the variable A in the middle, from the other nodes in the network. A can't interact with the nodes outside of the Pearl blanket, represented by this circular blanket, without going through the nodes contained in the circle. Yep, and notice it includes the nodes off to the sides of A, because all of those kinds of relationships are relevant. Even a node not directly connected to A can still be included within the blanket. That's important to note. All right, now we get to this figure, and it's kind of clever how the authors did it: they introduce it here in figure two, uncolored; then we'll talk through some Friston simulations and active inference in figure six; and then we return to this in figure seven with a new color scheme that really makes their point clear. So here they take this simple example and give another example, with their legend. The grays are the observables: Y1 and Y2 are observable, two things we're measuring about the world. All the X's are hidden states. The arrows are the Bayes net, the directed acyclic graph representation. The dashed node is the special node X10, our focal node, and the thick-bordered nodes are the blanket around X10.
So just as before, where A's neighborhood was its blanket, here we have the Pearl blanket, as they write: thick lines are used for nodes constituting the Pearl blanket for the selected node X10, depicted here with a dashed border. It's kind of like a legal argument. They're asking: would you agree that this is the appropriate use of the Pearl blanket? Because if you do, you might find yourself in a trap in figure seven, when they show what this figure actually represents. So, okay: this is what a Pearl blanket is, there's no real dissent, this is where Pearl blanket thinking stands, and we've just specified some arbitrary network, nothing particular in mind. Then in section 4.1 they also discuss a little of how Pearl blankets play a role in active inference, a fascinating topic we want to return to with the authors. Notice that alongside the Q distribution, the one we're in control of, and that minimization, where previously we were looking at F of X, free energy minimization over states and hypotheses, here we're also including policies: pi, sequences of actions. Now we're doing free energy minimization over policies too. That's what allows us to include action in the loop with active inference. Active inference isn't a two-stroke engine that does inference on states and then action selection in a second step; it's a loop that integrates action and inference co-equally into its functions. That's one of the special pieces of active inference. As they write, it's something that classical formulations of variational inference in statistics and machine learning do not consider, since they assume fixed observations or data: you read in the big data set, you do the free energy minimization, and there's your answer.
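The extension to policies is often written like this in the active inference literature (a standard sketch, not a quote from this paper):

```latex
% Policies \pi enter the variational posterior alongside hidden states,
% and policy selection minimizes an expected free energy G(\pi).
q(x, \pi) = q(x \mid \pi)\, q(\pi), \qquad q(\pi) \propto e^{-G(\pi)}
```

So action and perception are handled by one minimization over the joint posterior rather than as a separate second step.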
And then if you get new data, what do those companies do? They rerun the optimization with the new data. But this is a real-time, in-the-loop process, and it's also taking action. Okay, any thoughts on figure three, or do you want to intro it? Yeah, so this figure, I think the paper's on the next slide, comes out of the Friston 2013 paper, and there's a set of figures here excerpted from it. It's the primordial soup simulation from that article by Friston. The larger cyan dots represent the location of each particle, and the three smaller blue dots associated with each particle represent its electrochemical state. So this is about how life came to be; that's what's going to happen in the simulation. This is just the first figure, and then we go on into the next figures from the paper. Nice. So take this one, yeah. Yep. So in the simulation, the cyan dot is kind of like the nucleus of an atom, and the little blue dots are like electrons floating around, and they're electrically charged; that's the electrochemical part. It's not trying to be a physics simulation, but that's roughly what's happening. Figure four: here's what they write. Say each of these cyan dots had a number, one through a hundred, like an agent-based model; actually elements one through 120, I guess. You can plot the adjacency matrix A based upon the coupling, that is, the dependencies, between the different particles at the final simulation time T. So you start the simulation, let it run out, and then see: this one is very closely coupled to that one, number one is talking to four, and number four is talking to 22.
And then you can take that matrix and put a dot wherever you saw a dependency, some sort of behavioral adjacency between two particles. Then you rearrange the rows and columns by swapping, which preserves the information, until a pattern falls out. We see a broad region of interactions over here; then in the bottom right there's another sub-area of interaction; and in the very bottom right a very tight area of interaction, where a bunch of dots cluster among the 120. As they write, the adjacency matrix is a representation of the electrochemical interactions between particles, because the adjacency is drawn from the simulation, but it can be interpreted as an abstract depiction of a Bayesian network. So we're taking the dependencies between these particles as they jostle around and, in effect, trying to factorize it, to carve nature at the joints. Map and territory: there's a border on the map, but is there a border on the territory? Then we plot it as a matrix, with some shading suggestive of the role those nodes are hypothesized to play. Here's what they did, a little more formally, on slide 30; this is, I think, what Friston 2013 did. Spectral graph theory was used to identify the eight most densely coupled nodes, which are defined as the internal states. Why eight? Why densely coupled? Right, that's the question; that's what we're discussing. Those are defined as internal states. To skip ahead a slide, those are the blue ones right in the middle here, those core states back here.
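We're hazy on the exact spectral step Friston used, but a minimal stand-in for "find the most densely coupled nodes" is eigenvector centrality on the adjacency matrix, computed by power iteration. The graph below is our own toy, not the simulation's:

```python
def eigenvector_centrality(adj, iters=200):
    """Power iteration on a symmetric adjacency matrix (list of lists)."""
    n = len(adj)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in w) or 1.0
        v = [x / norm for x in w]
    return v

# Toy graph: nodes 0-2 form a dense triangle; 3-5 hang off it as a chain.
adj = [
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
scores = eigenvector_centrality(adj)
dense = sorted(range(6), key=lambda i: -scores[i])[:3]
print(sorted(dense))  # the triangle members score highest: [0, 1, 2]
```
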
Given those internal states, the Markov blanket is then found by tracing the parents, children and co-parents of children in the network. Sounds like a preschool. There's an extra interpretive step, and part of this discussion is asking how interpretive: interpretive-dance interpretive, or interpreter-at-the-UN interpretive? The nodes in this Markov blanket can be further separated into sensory and active states. The sensory states correspond to the parents of the internal states: incoming information. Not that they birthed the internal states, but that they have an upstream dependency into the internal states. The active states correspond to the children of the internal states and their co-parents; again, not simply birthed by internal states, but with downward-flowing statistical dependency. States that are neither internal states nor part of the Markov blanket are then called external states. So we're using this partitioning where you first identify the dense couplings, then trace out parents, children and co-parents, and everything left over is the external states. And then you recolor that simulation from figure three. Here it is uncolored, and now it's recolored with the new color code: the blue densely coupled nodes, then the active states, then the sensory states, which sit on a frontier with the cyan external states. It's almost like you have the internal, imagine it's the nervous system, swaddled in the active states, and on the outside of the active states are the sensory states, which interface directly with the world. That's the tantalizing nature of this model: wow, if self-organization like that really just fell out of a simulation, wouldn't that be cool? But what is the issue here, Blue? Well, it's kind of arbitrary.
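The "parents, children and co-parents of children" recipe is mechanical enough to write down. A sketch over a DAG stored as a dict of parent lists (our encoding, not the paper's), reusing the small T/W/Y/X/U/Z network from earlier:

```python
def markov_blanket(parents, node):
    """Markov blanket = parents, children, and co-parents (the other
    parents of the node's children)."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, []))          # parents
    blanket.update(children)                      # children
    for child in children:                        # co-parents of children
        blanket.update(p for p in parents[child] if p != node)
    return blanket

# The small DAG from the earlier figure: W -> X <- Y, Y -> U, X -> Z, T alone.
parents = {"T": [], "W": [], "Y": [], "X": ["W", "Y"], "U": ["Y"], "Z": ["X"]}
print(markov_blanket(parents, "X"))  # {'W', 'Y', 'Z'}
print(markov_blanket(parents, "Y"))  # {'X', 'U', 'W'}: W is a co-parent via X
```
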
They said a third of the electrochemical states, from the very beginning of the simulation, did not interact, right? That's what establishes this non-equilibrium system. And I think the issue then comes from assigning these interactions to internal, external, active and sensory states that may or may not have any such dependence on one another. Yes, as they write here, the simulation assumes that by viewing the system through the formalism of the Markov blanket, plus some additional assumptions about the separation of states, the ones we walked through a second ago, you can uncover hidden properties of the target system, which is then said to, quote, instantiate or possess a Markov blanket. And this is the map-territory issue: the procedure attributes to the territory, the actual dynamical system, what is a property of the map, the scientist doing inference on the dependencies, cutting it at eight nodes, and then using this further protocol to draw the Markov blanket. It's a property of the map, and it's a clear example of the reification fallacy: treating something abstract as something concrete without further justification. That's not the only way reification works, but it's one key piece of it. In other words, you go in with a Markov blanket hammer and your secondary pipeline for identifying parents and children, and of course you find one, because it's defined for any system; you'll always be able to get a Markov blanket with that protocol. So have you really discovered something about this dynamical system by separating it into this partition, when actually all of the particles are doing the same jostling motion? It's not like we have special influencer nodes here.
These are just densely coupled ones. And so, in that same paragraph, they propose to distinguish between Pearl blankets, referring to the standard epistemic, statistical use of Markov blankets, and Friston blankets, referring to this new metaphysical construct. They also give a shout-out to Martin Biehl in the paper. So we come back to the map-territory distinction. On the map side we have the epistemic: Pearl blankets, Bayesian statistics, but no philosophical power. What's attributed to the territory is metaphysics, the kinds of things the Friston blanket is making claims about. And my question was: can we do that action-in-the-loop without metaphysics? Because it seems like there are two innovations, or differences, between the Pearl blanket in its normal statistical use and the way Friston is using it. First, he separates sense and action; second, he's making metaphysical or ontological claims. So I'm wondering: are the ontological claims separable from taking a control-theory perspective on Bayesian statistics? That's what we'd like to find out. So then, Blue, you copied this image over. What did you like about it, or what did you see in it? I just thought it was a nice pictorial representation, as opposed to the primordial soup, which is a bit of a hairball. We have the internal states, which back in the Friston picture were the central dark blue, most highly connected nodes, and the external states, which are separated from the blanket states. That's what this pink dashed line surrounds and labels. So the Friston blanket contains the active states and the sensory states, which may be interacting with each other, and it's this Friston blanket that separates the internal states from the external states.
And they take it in quite a literal sense, like the boundary of a cell and so forth: separating the internal from the external is the Friston blanket. Nice. Let's connect this to figure four and the kinds of things we wanted to discover there. Looking at this as a graph of Bayesian variables: active states influence external states, that's the coupling, and external states don't connect to internal states at all; that corner of the graph is missing. So we have nodes everywhere, there are cells in your hand and cells across the wall, but how can we carve nature at the joints? Maybe we could make measurements, find the statistical dependencies, and then carve those dependencies into densely connected internal states, incoming states upstream, and outgoing states downstream. Then we might have a new way to do Bayesian modeling on systems that sit on a continuum or are part of a really complex process. And figure five, yep, we already had that, I guess we put it twice. And yes, section 4.3. This is sort of the man's search for meaning: why and how have Markov blankets been reified to act as parts of the target system? How did we get here? This technical idea became reified, with people thinking there's support in the literature: oh yeah, Markov blankets, there's real measurement support for them delineating spatio-temporal boundaries, rather than treating them as formal tools intended for scientific representation and statistical analysis. When did the map become conflated with the territory? Everyone's familiar with using linear regression on healthcare data, but no one goes back and says, well then, you are a linear regression. It's a bit like somebody using Markov blankets on health data and then saying: now you're a Markov blanket. Or is it?
Not saying it's exactly the same, but that's the rhetoric being put out here. It would be like saying the line delineating the healthy from the sick is a Markov blanket: it's a line, there's actually a line. And then they write, and again, people should read these papers, they're fun to think about and there's a lot more we won't get into in this dot-zero: perhaps surprisingly, many authors in the field are seemingly not aware of this process of reification, and this has led to the conflation of several different kinds of boundaries in the literature. Markov blankets are characterized alternatively as statistical boundaries, causal boundaries, spatial boundaries, epistemic boundaries, and autopoietic boundaries, and each characterization is treated as somehow equivalent to and interchangeable with the others. Now, it's rare that someone does the grand slam and totally conflates all of them, but there are papers here and there where it's causal and it's spatial, or spatial and epistemic, or statistical and causal, and through this web of formalisms we can get into a tangle. Any thoughts about that? Yeah, just that the paper itself contains many quotes from other papers that use the Markov blanket or the Friston blanket in a statistical, instrumentalist sense and then turn around and use it in a very realist sense. There are many examples of that illustrated through the paper, traced back through the literature. Good point. And those are the parts we didn't quote in this dot-zero, but there are many pages of them. So if someone's reading this and thinking, wait, they don't do that in the literature: read the paper, because that's where it's documented. And this connects to livestream number 15, where we discussed the paper by van Es and Hipólito and introduced this discussion around realism and instrumentalism.
And what those authors had written was basically that realism and instrumentalism concern the models and statistical manipulations that make up the FEP: whether they are thought to be used and manipulated by the systems under scrutiny, independent of scientific inquiry. That's realism. The tree falls in the forest; does it make a sound? Realism says yes: there's a real tree, it really fell, it really made a sound. Or conversely, whether they are thought to be scientific tools wrought by humans in specific sociocultural environments to study particular systems: instrumentalism. That's like saying, no, a sound means somebody has to hear it, so you don't know whether the tree, or the sound, was real. And the whole question is where the free energy principle sits: is it about what systems are doing even when we don't look (realism), or about us modeling systems, however we look at them (instrumentalism)? And that brings us to figure six, where they show a classic sensory-motor loop. It looks like an active inference loop, and that's where they're going with this. Their point is that although the figure does not use directed edges to signify causal influence, it is not strictly a Bayesian graph, because it depicts cyclic sets of circular dependencies: some between pairs, and an overall loop containing all components. Now, this was quite interesting, because first it points out that active inference didn't invent the internal-external or action-sense ideas; we're using different models and recombining them. But there is this cycle, right? Active inference also puts more emphasis on the internal states and the external states having their own endogenous dynamics, though that could be part of these models too. And it's true that it's not a directed acyclic graph if there are loops, local or global. But what about implementations with time steps, where you can break these loops?
Like if you compute it in this order, just a thought: does it really violate, or how badly does it violate, the assumptions of directed acyclic graphs? I don't know. Any thoughts, or good for figure seven? I don't have the answer for you. I don't know that putting a blanket over the sensory and the active states, putting them in a box, breaks the acyclic graph requirement, but I don't know that it fixes it enough to make it valid either, right? It is a cycle, I think. Good question. So now we return to the figure we saw before, and this is where they strike. Figure two, remember, was like: surely you agree this is how a Pearl blanket would be deployed on this graph? Speak now or forever hold your peace. Now it comes back to bite some people who may not have expected where it was going. Before, they highlighted the X10 node; now they color the nodes according to Friston's scheme, like figure five, but on the figure they introduced earlier, with X10 still as the focal node. Then they show that if you pick a different focal node, X9, not only are different nodes included in the blanket, they're categorized differently. That isn't a problem for the Pearl blanket, because there it's just: depending on what you measure, this node is the child or the parent within the bigger graph. But if you want a realist interpretation of what these nodes are, you're out of luck, because one node will be an active state relative to one choice and a sensory state relative to another. And if you then say, well, it doesn't matter whether it's an active state or a sensory state, they're all doing it for each other, then you've collapsed back to the plain Pearl blanket phrasing.
So what they write is: if you're going to have a sensory-motor interpretation, which is not the only option, this suggests that sensory-motor blankets are Markov blankets applied under a specific set of assumptions, and that use can't be traced to standard uses of Markov blankets in variational inference. Basically, depending on what your hidden variable of interest is, you compute a different blanket around it. That's problematic for realist interpretations, because depending on what you were doing inference on, you might variously find that the membrane of a cell, or a measurement from it, was a sensory state (incoming for something else), an action state (outgoing from something else), or internal to yet another set of things. So if we want to make realist claims or interpretations about blanket states and internal and external states, that's an issue. But is the issue just that we haven't figured out how to nest these models yet? It makes sense that what's outgoing for one agent is incoming for another agent in the model; maybe we just haven't worked out the nesting properly, or maybe each nest is its own blanket state. I don't know. I agree, I'm an optimist as well. Let's see, figure eight; this is where they really go for it, and this one I thought was very clever and funny. So: the delayed figure, with the intermission to lull you, and now eight. They go into a graph that, total coincidence, looks like the left side. On the right side we were talking about X10 and X9, but we never talked about this left one that looks like California. Let's look at it now. On the left is a Bayesian network where I sub D and I sub P denote the motor intentions of the doctor and the patient respectively; those are the ones at the top. H is a medical intervention with a hammer, C is a cortical motor column, S is the spinal neurons, and the movement is M.
And then K is a different way that the leg could have been observed to move, like somebody could have moved it or gravity could have changed. So again, M is the movement, that's the observable movement. This is the abstract Bayesian net on the left, in A. In B is one possible action pattern, which is that the patient's intention, I_P, goes to the cortical motor command, and then H is the medical intervention with the hammer, and that results in no motor behavior. But then here you do have the motor behavior observed, but it comes about a different way: rather than H, the intervention, as a co-parent, you have K, this other way, like somebody picking up the leg and moving it. In the middle and on the right, the same network is partitioned using a naive Friston blanket with a different choice of internal states, C and S respectively. So if your internal state is the cortical motor column, you're going to think that the active states are basically the spinal neurons, which is how people talk about it. So it's not like this blows the theory out of the water or anything; it's like saying the spinal neurons are the active states of the cortical neurons. And then if your focal internal node is the spinal neurons, then the active state is the movement, what we would call in realism the real active state. So it's saying, again, depending on what node you're focused on, different nodes are going to be categorized as sensory, active, internal, and external. I want to say that's how we've been thinking about it, that depending on which side of the line you're on, that's going to be internal or external. But the point is it creates problems for realist interpretations of any single Markov blanket, because we have to acknowledge that Markov blankets, and especially how the states are classified, are conditional. So it's like, is the bicep an active state, or is this an active state? Well, if you're doing inference on this, yes; but if you're doing inference on that, no.
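To make that knee-jerk example concrete, here's a sketch where the edges are my guess from the description above (my reading of the figure, not necessarily the authors' exact graph): I_D causes the hammer tap H, I_P causes the cortical command C, both H and C feed the spinal neurons S, and S or the alternative cause K produces the movement M. Computing Pearl blankets for internal state C versus internal state S then shows the relabeling problem directly.

```python
from collections import defaultdict

def markov_blanket(edges, focal):
    """Pearl's Markov blanket: parents, children, and co-parents of children."""
    parents, children = defaultdict(set), defaultdict(set)
    for a, b in edges:
        children[a].add(b)
        parents[b].add(a)
    blanket = set(parents[focal]) | set(children[focal])
    for child in children[focal]:
        blanket |= parents[child]
    blanket.discard(focal)
    return blanket

# Guessed edges for the knee-jerk network: intentions cause the hammer tap and
# the cortical command, both feed the spinal neurons, and S (or K) causes M.
knee = [("I_D", "H"), ("I_P", "C"), ("C", "S"),
        ("H", "S"), ("S", "M"), ("K", "M")]

print(sorted(markov_blanket(knee, "C")))  # ['H', 'I_P', 'S']
print(sorted(markov_blanket(knee, "S")))  # ['C', 'H', 'K', 'M']
# With C as the internal state, S sits in the blanket as the outgoing "active"
# state; with S as the internal state, S is internal and M plays that role.
```

So the spinal neurons are an "active state" under one partition and an "internal state" under the other, which is exactly the conditionality being discussed.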
So it's hard to have a realist interpretation, even though there are times where, if you define something internal as something that's truly on the inside of something, it might be that the active states truly line up with what's on the outside of it. But that isn't the full generality of this Bayesian network approach, and a broken clock is right twice a day; it doesn't actually get as general as it could be. This makes me think of the broken knee, like the phantom limb, it takes me back to that. Like, what if the sensory neurons are activated for the leg to move, but there is no leg to move, right? Then what? Nice. So, okay, next one. They give a little bit of recent history. So if Twitter's too fast and textbooks are too slow, this is the history speed for you. Biehl, Pollock and Kanai 2020 questioned some of the technical assumptions underlying the use of Markov blankets by Friston, and since then the idea of the Friston blanket being understood as a distinct construct has gained traction in the literature. And we're actually looking forward to having Martin discuss that with us in about two weeks. So again, it will be really interesting to hear how he came to the position of getting at these technicalities and critiquing it, and where does action fit in, where does philosophy fit in, where do these other things that Friston is bringing in fit? Anything to say on this part? Only the question that I raised at the beginning: where does language fit in, right? To what extent is the problem or the issue that's being raised a function of the language that we're using? Yes, and in this paragraph they make another subtle point that shows why, when the language is uncertain, there can be a lot of drift underlying it.
They say that in a recent paper, Friston blankets are formalized in terms of constraints on sparse couplings and then identified using, and you can read the paper for the technical details, the non-zero components of the Hessian. And that, according to the authors, is taking the construct far away from its Pearl blanket origins. And so they're saying you wouldn't make that kind of swap in the underlying mechanics unless you thought you were working with the physical domain of what's really there. So that's an interesting point; it'd be cool to hear from the authors. Here on the next slide is kind of a closing remark by the authors, and this is kind of a meta point. And the meta point is the relationship between action and inference of an organism on the one hand, and scientists modeling their world on the other. They're not the first to make that comparison: you've got physical foraging, you've got information foraging. But they're saying, basically, that there's a lot of similarity between perceptual inference and scientific inference. Both use a previously learned model of the world and a set of observations to infer the causal structure of the unobserved outside world. And this was the triple meta: model-based cognitive neuroscience is in a special place, because it makes models of how animals model their environment. A cognitive neuroscientist uses both behavioral and neural data to infer the most likely model that the agent's brain implements. There's nothing wrong with this doubling up of modeling relations, as long as one is conceptually careful. One needs to distinguish between the properties of the environment, the properties of the agent's model of the environment, and the properties of the scientist's model of the agent modeling its environment. And that's the third level.
And that's why it's almost unsurprising, in retrospect, that the FEP arose from cognitive neuroscience and neuroimaging, with EEG and fMRI. It doesn't come from "we're going to use physics to find the grand unifying theory"; it actually came from making measurements about how agents reduce their uncertainty about their model of the world. And it's that that gives you the triple loop, which I think is quite interesting in how it's nested. Well, and here we are adding another layer to the loop, thinking about, you know, criticizing the model with which the scientists model the agents. It's just models all the way down. Yep. So what are the implications for cognitive science in humans? And in this part right here, the properties of how the scientist models the agent modeling its environment, you could replace scientist with researcher, or data scientist, or machine learning algorithm. It really matters what people take into account when they design their algorithms, and when they make algorithms about other cognitive agents, because that constrains things in ways that are sometimes obvious and sometimes not. You know, if you only had one variable for a user on your website, how likely they are to buy something, and that was your variable, you're going to set up a different algorithm. So it's the properties of how the scientist models the situation that really matter, and that has implications for cognitive science when we're studying humans, non-humans, and non-animals. Which leads to the big question. This is actually the big question that the authors raised. They ask in the paper whether Bayesian networks and Markov blankets are the right kinds of conceptual tools to delineate the sensorimotor boundaries of agents and living organisms. So the authors brought this up within the paper. I thought it was important to highlight that, because I don't know.
I think if you go out in scale, like if you have an image and you start to coarse-grain that image, building representations of the image, then representations of the representations of the representations, eventually you can't see the picture anymore at all. And the point is to make it easier to understand, not to make it so blurry that nobody understands it at all, I think. Cool. This is what brings them to the distinction of inference with, wait, is that even right? Yes: inference with and inference within. I was thinking inference of a model, but it's inference with a model, the distinction between the two blanket constructs as they're positing them. I'll shrink our little video. The distinction can then be easily identified once we look at who appears to be performing the inference and what system that inference is performed on. So they go through four kinds of networks. We're not going to go through them here, but if people are interested in the technicalities, it's really worth thinking about the difference between the generative model and the generative process. Like, I can have a generative model of Blue in my head, so when she's speaking, I'm doing generative inference on her syntax; but that's not the generative process, which is the actual underlying person. So this is just a nice, very rigorous way to break down when we're doing inference with a model versus inference on a process. And okay, 42. All of this points towards a fundamental dilemma for anyone wanting to use Markov blankets to make substantial philosophical claims about biological and cognitive systems, which is what we take proponents of the free energy principle to be wanting to do. So, agree or disagree. You know, if you disagree, you can write a paper, or you can come on a livestream, but that's the claim of the authors. And that is what we got excited to learn about.
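A toy sketch of that model-versus-process distinction (the numbers and the coin setup are mine, purely for illustration): the generative process is the real system producing the data, while the generative model is what the agent, or the scientist, uses to do inference with; the two are different objects and need not match.

```python
import random

random.seed(0)

# Generative process: the real system. The true coin bias is a fact about the
# world that the modeler never observes directly.
TRUE_BIAS = 0.7

def process(n):
    """Draw n observations from the real data-generating process."""
    return [random.random() < TRUE_BIAS for _ in range(n)]

# Generative model: the agent's model of that system -- here a Beta(1, 1) prior
# over the coin's bias, updated by doing Bayesian inference *with* the model.
def infer(observations, a=1.0, b=1.0):
    heads = sum(observations)
    a += heads
    b += len(observations) - heads
    return a / (a + b)  # posterior mean estimate of the bias

data = process(1000)
estimate = infer(data)
print(round(estimate, 2))  # close to 0.7, but the model never *is* the process
```

The point of the toy: `estimate` tracks `TRUE_BIAS` well, yet the Beta-Bernoulli model remains an epistemic tool the inferrer wields, not the process itself, which is the "inference with a model" reading.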
And they then come back to this with-versus-within, and they say, basically, that in inference with a model, the graphical model is an epistemic tool. So that's like, I'm going out to the forest with my machete; I'm going to use it as a tool as I go. Versus inference within a model, where the scientist disappears from the scene, becoming a mere spectator of the unraveling inference show before their eyes. The way that's written makes the authors sound a little dismissive of that being a likely approach. It's almost like, well, we wouldn't want that to happen, or it can't happen, so scientists can only do inference with models. Now, the scientist as a body is inference within a model, like the inference is happening within the scientist's body, or within the scientist's exocortex, or within their community. But that's very different from when you're talking about the real system, the scientist at the desk. That's the system that is doing inference; inference with a model is the instrumentalist take. Okay. And this is just a final slide that gives their closing thoughts. And this was interesting, because they mention that there are simulations, like the Game of Life, which have been enormously productive in many domains. And so basically they say that this might be an interesting way of modeling emergent processes and complex systems, but it won't support any metaphysical claims about Friston blankets. And then they say: we will not pursue this idea any further here, but offer it as a more modest, perhaps instrumental interpretation that some proponents of the FEP and active inference might be inclined to adopt in order to avoid any stronger commitments. So that's in some ways where we feel like we're sitting: we're approaching active inference instrumentally, and then we'll see what happens with other claims. But it seems like we can pursue it instrumentally, and I guess that's a modest proposal. And then they just summarize.
So overall, a nice paper: relatively good length, a lot of figures, clear arguments, rigorous philosophy and math. So it was good times. And good coverage of the literature, with a trace through the history of the methods underlying the FEP. It was good. Do you have any other closing thoughts? Only that maybe inference within a model is actually possible if you're using a robot that's programmed to do inference, right? Like, if the action is to do inference, if that's how it works, then inference within a model is possible. Cool. Well, it will definitely be really interesting to hear from the authors, and from everybody who wants to join on the stream. So, Blue, thanks a lot for joining. This was a great discussion. Thanks everyone for watching, and we'll see you later. Bye. Thank you.