Hello and welcome everyone to ActInf Lab. This is ActInf Lab livestream number 22.0, a palindromic number. It's May 13th, 2021. And this should be a really fun discussion. Just to introduce ourselves at the beginning: I'm Daniel. I'm a postdoctoral researcher in California, and I am joined by Dean. Go ahead.

Oh, I'm Dean. I'm from Calgary. I'm retired and I'm kind of new to this active inference thing, but I'm trying to keep up.

I'm Blue Knight. I'm an independent research consultant based out of New Mexico.

Nice. So thanks to both of you so much for helping a lot on the slides and giving insights. And we're all gonna be sort of working through this together, because there's a lot of big ideas that are gonna come up in this discussion. And then we're gonna have the next two weeks with Alex Kiefer, the author, and other participants on the dot-one and dot-two streams. So we're just gonna be surfacing probably more questions than answers today. Welcome to ActInf Lab. We are a participatory online lab that is communicating, learning, and practicing applied active inference. You can find more information at these links. This is a recorded and archived livestream, so please provide us with feedback so that we can improve on our work. All backgrounds and perspectives are welcome here, and we'll be following good livestream etiquette. We're preparing and setting the context for the discussions on May 18th and May 25th, 2021, where we'll be discussing the paper Psychophysical Identity and Free Energy, a 2020 paper by Alex Kiefer. And just like the other dot-zeros, this is context and background and just an initial stab. It's not a review paper. It's not authoritative. This is us unpacking and exploring as we read the paper in preparation for the group discussions. And similar to some of the other dot-zeros, we're gonna start out with the author's own words: what are the aims and claims, as well as the abstract and roadmap. And then we're gonna ask some big questions.
We're gonna talk about why it matters, which is hopefully a discussion that includes everyone. And then we're going to meander through some quotes from the paper, some formalisms, some related reading, just a few ideas that we actually brought in during our learning process to help connect the dots, because the paper is really like the tip of an iceberg, and there's a long history of information and thermodynamics. And that's even before you get to everything with the free energy principle. So we hope that this will be an educational experience, whether you're familiar with active inference and just learning about thermo, or vice versa, or not familiar with either. Any overarching thoughts that either of you would like to bring up before we jump in? Nope.

All right, so I'll read the first aim of the paper, and then either of you would be welcome to give a thought. The target thesis of the paper is the claim that biological systems whose internal states come to encode statistical models as a result of spontaneous self-organization in response to environmental pressures learn and make use of these models by minimizing their physical free energies. The identity thesis, which is that yellow-highlighted section, does not specify precisely which approach to variational inference is thereby implemented, or the minimum value the variational free energy must take. It may be, for example, that biological systems run an algorithm closer to contrastive divergence than simulated annealing. So a lot of terms, maybe first time hearing them or not, but what's something that draws either of you or made you curious to read about the paper?

Well, for me, it was the contrastive piece. It was the idea that in order for this, any kind of a modeling effort, to work, you can't start out by knowing. You have to kind of have a bit of a fuzzy idea before there's anything that's edgy. Cool. And the second, Blue?
Yeah, so the second piece of the aims and claims here, and very timely: what is exciting about an identity thesis in 2020, and we also hope in 2021 and probably beyond, is not so much its opposition to Cartesian dualism, because not many people hold that view today, but what does make it exciting? Well, one, the fact that the connection between mind and body is drawn at the level of physics, undercutting even neuroscience as a reduction base. So a strength for some, a weakness for others, probably. Two, that the reasons for which, and the precise way in which, identity is realized can be discerned, at least in broad strokes, on the basis of current science. And three, that the theory is accordingly expressed in quantitative terms. So this is not your run-of-the-mill anti-Cartesian dualism. This is that 2020 flavor, and it has more science now.

So this for me was really awesome. I mean, I am always looking for the expansion of the mind-brain duality to incorporate the body, because I think that it's been overlooked for quite a long time. And it's nice to see that incorporation in this thesis.

Cool, we'll definitely be able to return to that. All right, so let's turn to the abstract. It's three sentences long. Would either of you like to read it? I can. Yep. An approach to implementing variational Bayesian inference in biological systems is considered, under which the thermodynamic free energy of a system directly encodes its variational free energy. In the case of the brain, this assumption, I like that, places constraints on the neuronal encoding of generative and recognition densities, in particular requiring a stochastic population code. And some people may know what that means, but I'm sure we'll parse that in a minute. And the resulting relationship between thermodynamic and variational free energies is prefigured in mind-brain identity theses in philosophy and in the Gestalt hypothesis of psychophysical isomorphism. And I didn't know anything about Gestalt before I read this paper.
Yep, this paper, it's like relating the relations, because the little kernel down there at the root is the mind-brain identity thesis, and we'll return to this, but that's the relationship between mind and brain, between the physical and the mental. And then how is that kernel, the mind-brain identity thesis, related to, on one hand, qualitative areas like the Gestalt theorizing, which we'll talk about, and on the other hand, more quantitative areas like the formalisms related to information dynamics and thermodynamics?

Let's go to the roadmap. So the roadmap, it's nice because, first off, there's only a few stops on the roadmap, but there's a lot of freeway between each stop. On the right are the sections that were in the paper, and on the left is how Alex wrote out the roadmap in a narrative way. So we kind of have the turn-by-turn on the right side, and then we have a little bit of the expert giving you a guidebook on the left. And it starts with an introduction and goes straight to formal equivalences between what? Well, between free energy and a lot of other things. A transparent code, a little bit of an oxymoron there, I thought codes were supposed to be oblique, but what is a transparent code? And eventually, how does it lead to this free energy identity thesis being proposed in 3.2 and then discussed more in section four? Then there's a conclusion. So it's a pretty dense paper, and what did anyone think about just the roadmap, or just getting from A to Z?

Can I speak to that for a sec? Please. Yeah, so on the first reading of this, when I saw transparent code, I wondered about that as well a lot, because I didn't even see it as oxymoronic. I just wanted to know how that was gonna get unpacked.
And the other thing I found really interesting on first reading, and both of those sections, three and four, had changed by the time I'd read through the paper a third time, was the whole idea of isomorphism to identity. I came around to the idea that I think he was talking a little bit about the ability to be an identifier as one of those measures, the sort of quantitative measure of what an identity could look like. He doesn't really get into it; he still throws 'possible' in there quite often in the paper, which I think is really a good thing to do, because he doesn't try to bring it down to absolutes, which I think is smart, given, as you said, the density of the stuff he's trying to look at. Cool. Blue, anything? Otherwise we'll go to the big questions.

So, other than that it was a really pretty rocky road to get from A to Z, and, you know, I just kind of had no idea really where this was going, so it was nice to see it evolve as it unfolded.

I just always keep in mind the paper's finite. I will get to the end if I just crunch line by line, and somebody who knows way more than me on this topic thought way more than me about the paper. So let's just trust the author, not in every single claim, but at least in the structure, and go from there. Well, why does it matter? Why are we spending our regime of attention reading this paper, clearly multiple times each, and digesting it and turning it around and having discussions on a chat forum? What is bringing us to the table from very different backgrounds, and hopefully you as well, to think about and be curious about this paper? The three things we wrote down here, and then either of you feel free to unpack one or any: why does it matter? Well, continuing existence depends on efficient energy use. It's important to understand what life is as well as how we model it. And it's interesting to think about information, identity, and physics. What brought any of you to the table?
Well, so just to comment on the distinction between life and how we model it. I mean, I think that this is very relevant to the discussions that we've been having on realism and instrumentalism, and really a fundamental way to clarify that distinction, I think, which was nice.

Cool. And the big questions to the paper, which are sometimes drawing from quotes from Alex, other times us linking the questions together. One is how might models of psychology, action, and perception, such as Gestalt, but basically just psychological theories that might have no math involved at the first pass, how could those math-free ideas be connected to formal equations, potentially those related to certain kinds of statistical inference? So how are we gonna connect psychology to equations? Big question. Two, after realizing some of these resonances between certain types of statistical inference and information and thermodynamics, as well as potentially mind and the brain, is there any way to bring that all together? We have apples and oranges, but maybe we can make a still life that has apples and oranges, and they're each in their own place, and it kind of makes sense. And then the third point here is the biggest question, perhaps, which is: where does identity come from, or what is identity? How would we measure that, or characterize it, or determine it? And I think one hint, perhaps from Dean, was to start with identity formation, what it is not. So then there was a quote there about comparing with neural networks and computers. Any other thoughts here before we jump into keywords?

So this quote, like the difference between living systems and computers, I think it would have been nice to see that up front. I kind of had no idea where the paper was going, and that it was gonna come to this really solid kernel of meaning, right?
And so it would have been nice to have some hint of that, like foreshadowing, maybe in the introduction or the abstract, but I had no idea where this was going and was actually pleasantly surprised, like a gift wrapped in ten different boxes, at the end. It went better than expected. In active inference, what could we want other than that? But it does say that it provides a criterion for distinguishing minds from relatively crude simulations. Like, Blue is alive and was just speaking, but if it was just the voice being played from a speaker, is that gonna be counted as alive? Cause it's making noise as well. Well, if not, we want something richer than just making noise. What is that gonna be? How are we gonna get there?

We're gonna start with some of the keywords, and just like there were only three sentences in the abstract, in the final paragraph there were only three given keywords. And so rather than do a keyword-driven intro and then walk through the paper, we're gonna kind of do a little bit of both. We're gonna give some background information and some frame-setting on the paper, and then we're going to walk through the paper, and where we need to introduce a formalism, we'll just kind of unpack it there and share resources. Okay, so earlier we were talking about how it's kind of relating different relationships, and this slide, a little bit abstractly, a little bit provocatively, a little bit ambiguously, is relating those relationships. So at the top, yeah, D, okay, glad to hear.

Because of the twist, and I think when I saw it, I mean, I didn't anticipate ants, which is cool too, but the fact that there's a twist in it, I actually think, is what we're talking about. It's, I think that will again surface as we go deeper into today's conversation, but actually I was like, whoa, very cool.

Cool, so what we were trying to get at here is to link the topics and then see how those topic linkages are linked.
So the mind and the brain are these two nodes on top, and the question is: how are the mind and the brain related? And you can check out the mind-brain identity thesis to learn more about what a few previous perspectives have been, but one can imagine there's all kinds of takes. They're the same thing, they're totally different, they're a little bit related. So that's how the mind and the brain are linked. And then on the bottom we have these two nodes. One of them is actual thermophysics, so candles actually burning, balls actually rolling to the bottom of a hill. And then on the right side we have human models resembling patterns that we see in thermophysics. And the question of the linkage here is, just like how are the mind and the brain related, how are actual thermodynamics, so instead of skin in the game, ATP in the game, actual thermo, related to maybe some equations and other domains that make us think about information dynamics?

So those are the level-one linkages: first between mind and brain, and second between actual thermo and models that are akin to thermo. And then this next slide is a little bit of a multi-play, but it's saying maybe by juxtaposing these relationships, also with a nod to Gestalt and illusions with the arrow illusion, maybe by us as the observer juxtaposing these relationships, we gain new insight. And maybe the FEP, which is sort of the little ants crawling all around the sides, maybe the FEP is that permeable membrane, or it's the framework that's strong enough for us to build within and subspecify, but it's also permeable in that it allows us to be open to new ideas coming in. And the psychophysical identity thesis is kind of smack dab in the middle: it's relating these relationships. So if there were four things that get brought together, I think it would be these four: mind and brain, and then actual thermo and models of thermo. And the way to link them is to first make the pairs and connect the pairs, and then see how those linkages are contrasting or
similar. Danny, would you say that part of this too, though, is the fact that they are parallel and apart? I mean, we build a relationship, we've already sort of confirmed that, but is it a connection? And I'm not saying that in a questioning way, it's just, I think it's a legitimate question. I'm not sure that the way that we quantify that confirms it or not, but maybe I'm grasping at something that isn't there. I agree that this relationship exists, but I'm not sure if it's a connection. But that's just me wondering.

Perhaps it's reflective of the instrumentalism versus realism debate, right? Like, so thermophysics is a model, and the human system is like an alive system with some kind of concrete existential purpose, right? So we assume.

Yeah, it's almost like real on the left side, and then more like an instrument on the right side, because the brain does have thermodynamic constraints. It generates heat, it takes in glucose, it's doing metabolism; if you cool it down, it stops. So that's like a candle. And then what is the mind? Especially if the mind is kind of higher-order and reflexively making a model of itself in an interesting way. But if the brain, like a physical energy-consuming object, is like a candle, that's this left side; people might draw a link between those two, draw an edge between the brain and actual thermo, and we're gonna get to that in just a few slides. And then the question is, well, what's the relationship then between these thermo-inspired models on the bottom right and the mind? Is it that the brain is actually minimizing free energy, or is it that we can model it as minimizing free energy? What is happening here?

Let's go to psychophysical isomorphism. I'll play this video. And Blue, what made you interested in this video? I saw the term psychophysical isomorphism in the abstract, and I was like, what is that? It's the Gestalt hypothesis of psychophysical isomorphism, and I was like, those are really big words. I just don't have the background.
So I wanted to look up the concept, and it comes from the expression that the properties of the mind and the consciousness are a direct consequence of the electrochemical interactions within the physical brain. And so this is the phi phenomenon, the video. Did it work? It's not playing on my end. So this phi phenomenon is an example of where the patterning of a stimulus and the activity in the brain are similar, and that's why it makes this illusion of motion. Cool.

So it's almost like we're connecting the dots, but here it's shown with negative space, and that's an entry point into thinking about Gestalt more broadly. Or is there anything else you wanna add on this slide? Okay. That's it. What is Gestalt? Well, people live their life teaching it, so we're not gonna cover it in one slide, but we can cover some of the ways in which it was used in the paper, as well as just a few little tidbits which, on our active inference forager's path, were cool. So first, just as a design principle on the slide: you visually, and it's a culturally scaffolded process, separate, for example if you're not colorblind, the red shape out from the black text. And so it's almost like there's this organization that arises when we're viewing material, and that turns out to involve patterns that we can study, like reification on the left side here. We do see, quote, a triangle, like a white triangle, in A, even though it's the negative space of these other Pac-Man-type shapes. And so Gestalt is this big area of theory. So check out the word itself, terms like figure and ground, and all these related fields. And what are we getting at that's bigger than all of these examples? What does reification of shapes and multi-stability, like illusions, have to do with this notion of invariance? Like the idea that by looking at different angles of a shape, we can recognize that an object is invariant, even though computers often have trouble with that.
So it's almost like, if you're working towards grasping it, or you're seeing that there's something bigger behind these examples, that's kind of Gestalt. It's hinting at the whole. And so it's very related to holism and relational thinking. But we can dig one more level into Gestalt and just look at how Alex used it. And in the paper it says: this work, the paper that we're reading, and other work by Karl Friston and collaborators confirms a systematic link between variational free energy, even if it's your first time hearing it, it's all good, variational free energy, and thermodynamic potential energy. Interestingly, this view is presaged not only by the identity theorists in philosophy, but also quite precisely by the Gestalt psychologists, who supposed that perceptual phenomena as subjectively experienced had structures isomorphic, same shape, to their underlying physiological correlates. And does that mean, when you're looking at a triangle, that there's another triangle that's the same exact thing in your head? Well, not the same exact thing, but maybe something that's structurally isomorphic, or the same shape. And so to unpack that, and to also evaluate the author's claim, is it really true that this area of science that we think is super awesome and advanced in 2021, was it really true that it was presaged? So we can go to these citations, like this 1935 citation, and just copying out the contents and doing a little color coding, look at the titles of the sections. Chapter two has to do with behavior and sets up the problem, but chapters three through seven are about the environmental field, the field of affordances. This is what we've been talking about in a lot of our different discussions. And then action, memory, learning, and at the end a little bit of a closing note on even society and personality, and probably what it means to be a person and an individual.
So we are seeing a lot of these terms, which should really bring us to slow down and think about how we can tie different knowledge traditions and perspectives together, in the way that Alex has done and hopefully in a way that we can also do when we come together to discuss. Any thoughts on Gestalt? Okay, now we're gonna switch gears from the ecological field and psychology to thinking a little bit more quantitatively about statistical inference and metabolism. Alex wrote that the argument at hand is meant to support a slightly less abstract isomorphism, or identity, directly between variational and thermodynamic free energy descriptions. This argument is consistent with a detailed treatment of the metabolic efficiency of variational inference given in [45], which is a Sengupta et al. paper with Karl Friston in 2013. So for those who are interested in active inference, check out where active inference was in 2013, and think about how what has happened since then changes our understanding. But just to pull one figure from this paper, which is thinking about energy use in nervous systems: it's showing how the sensory environment, whatever's out there, is going to get encoded, and this is where that transparent code is gonna come into play, into a format that allows control, gain control of the sensory input as well as action control. And it has to be done in a way that's thermodynamically viable. So you'll often hear things like the brain processes this much information, or it uses that much energy, as if it were just some sort of desktop computer, but it's pretty clear from just a first-pass understanding that it's not just doing what my video card is doing. Vision isn't just the RAM going back to the video card and back and forth with a processor. So how is it that brains and cells and all kinds of biological systems are able to act effectively, when it's pretty clear that they're not doing it like our heat-producing computers are doing it? So this is sort of where we're looking at.
How do biological organisms do good inference on the very complicated sensory environment, and represent all of these wild stimuli in a way that allows for effective control of the inputs, and do that without overheating or requiring just a ton of energy? Let's go to free energy. Wanna start here, Blue, or I can start? Yep. Free energy, in the context of variational Bayesian inference, is a functional of a probability distribution or density Q, used to approximate the in-practice typically intractable joint posterior distribution P(H, V) of data V together with the unobserved causes of those data, H, under a statistical model whose parameters may also be unknown. And, sorry, go ahead.

Just that Q is the one that we can control, and Q is sort of our guy. And then the one that we can't get at is the actual distribution P of two different things: the data, which is what we definitely have, so that's good, but then the unobserved causes of the data, H, and sometimes H is hypotheses. Like, I hypothesize that it's raining. And so what are the data that you get? Well, maybe it's visual or maybe it's tactile data, but you're gonna be able to control your estimate of whether it's raining, Q. And then what you're trying to get at is that total distribution P, part of which you can get at with H and V. So we're gonna be trying to talk about a way to bring what we can control, our internal generative model, and align that distribution, through optimization, to really complicated distributions in the world that might have many more degrees of freedom or a lot more variability than we could possibly handle. And another nice quote on the bottom in yellow is: free energy captures the discrepancy between the organism's generative model of the world and the current environmental conditions. So that's kind of cool. And we've seen that come into play in the context of the predictive processing, or hierarchical predictive processing, framework, which is shown here from Ramstead et al.,
Answering Schrödinger's Question. It's a figure that's showing how there's sort of a bottom-up, which is sometimes associated with sensory input, and a top-down, sometimes associated with priors about the world, these two interacting streams, and they're arranged in a multi-level way. So each level is only getting passed up or down one level. And what's going up are these sensory inputs and unaccounted-for errors, and what's coming down is expectations. So if the bottom-up is exactly what you expect, it's almost like the system is resting. So in other words, if the model's perfect given the visual data coming up, everything is very chill, and things are not chill to the extent that the bottom-up information is surprising. So what is getting passed up? And under the free energy principle, maybe it's free energy, but what is free energy?

Another way to look at this question about the relationship between the observations in the world, which are called data in many cases, and the generative model of the world is with the paper that we discussed in ActInf Lab stream number six, A Tale of Two Densities. And variables and parameters, they're often used loosely, because when you're dealing with programs, everything's a variable, everything's a parameter. So it can be a little bit confusing which kind is which, but the data parameters are the observables, and the recognition model, in Bayesian statistics, is the model that recognizes the data and forms hyperparameters about the data. For example, the hyperparameter might be like the location of an object, and the data might be the actual retinal cells giving raw input. And so the recognition model is going from that retinal input to a hyperparameter of where the object is located, and then that can be used, in an invertible way, with a generative model that goes the other way: okay, given the estimated location, what kind of data would be generated from that?
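The errors-up, expectations-down scheme described above can be sketched in a few lines of code. This is our own illustrative toy, not anything from the paper or the stream: the `settle` function, the step counts, and the equal-precision assumption are all made up for demonstration.

```python
# Toy one-node predictive settling: expectations flow down, errors flow up.
# Equal precisions are assumed; all numbers are illustrative.
def settle(observation, prior_mu, n_steps=200, rate=0.05):
    mu = prior_mu                             # current expectation (top-down)
    for _ in range(n_steps):
        eps_low = observation - mu            # bottom-up prediction error
        eps_high = mu - prior_mu              # error against the higher prior
        mu += rate * (eps_low - eps_high)     # settle between data and prior
    return mu

# With equal precisions, the system relaxes halfway between prior and data...
assert abs(settle(observation=2.0, prior_mu=0.0) - 1.0) < 1e-3
# ...and when the model already predicts the input, no error is passed up:
assert settle(observation=5.0, prior_mu=5.0) == 5.0
```

When the bottom-up input matches the expectation, both error terms vanish and the system "rests", which is the "very chill" case described above; surprise at the bottom is exactly what drives the update.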
So this is a Bayesian model-fitting approach that's related to expectation maximization, where you take what you have as far as data, and you update what you think about the generative process of the data, and then you crank it the other way and you ask: given what we know about the world, what kind of data might it generate? And A Tale of Two Densities contrasted that sort of Bayesian approach with a lot of the theory of the enactive researchers, who are interested in this relationship between agents in the world, seeing that incoming data as analogous to perception, and then the outgoing generative model as a generative model of action. So that's what we discussed in A Tale of Two Densities. Dean?

Yeah, I wasn't there for that, but I'm just gonna ask the two of you. I really liked the diagram, but one of the things I was curious about is, I've had lots of conversations with people who had taken an enactivist stance or an enactivist perspective on things. I'm wondering, from your guys' perspective, this ability to ask 'and then what happens?', how is it fundamentally different on either side of that diagram? Fundamentally different. So you're asking about the ability to ask 'and then what happens?' So the ability to ask a counterfactual? Yeah, how is it different between the statistical basis and the here-I-am-in-reality basis? Because I've asked this question, and every time I ask it of people on either side of that diagram, they kind of go, oh, I'm not really sure. Do you have any thoughts, Blue?

Yeah, I think that this paper kind of really tries to get at it, actually, like getting at what's different, maybe not in this enactivist sense, but, yeah, I really don't have the background, but I think a lot of times there are layers that seem like they should overlap, but maybe don't have a direct correlate, right?
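The two-way crank described above, recognition going from data to a hyperparameter and generation going from the hyperparameter back to predicted data, can be sketched with the object-location example from the discussion. The noise model, learning rate, and sample count here are our own illustrative assumptions, not the method of any particular paper.

```python
import random
random.seed(0)

# Hypothetical generative process: each retinal-style datum is the true
# object location plus noise. The hyperparameter is the estimated location.
true_location = 3.0
data = [true_location + random.gauss(0.0, 0.5) for _ in range(500)]

estimate, rate = 0.0, 0.1
for v in data:
    predicted = estimate        # generative direction: what datum do I expect?
    error = v - predicted       # mismatch between expected and actual datum
    estimate += rate * error    # recognition direction: update the estimate

# After cranking both ways repeatedly, the estimate sits near the true cause.
assert abs(estimate - true_location) < 0.5
```

Each pass predicts a datum from the current estimate (generation) and then revises the estimate from the prediction error (recognition), which is the alternating flavor of expectation-maximization mentioned above.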
And I think that that's what's happening here, and I think that that's what Alex tries to address in his paper, in like a thermodynamic free energy, statistical thermodynamic kind of way, right?

Bear with me, I'll give another thought. So 'and then what happens', in the context of Bayesian statistics, it's like a question about counterfactuals. So if you're asking 'and then what happened differently' with the data, well, from the point of view of the hyperparameters, either the data would be better fitting or not. If you ask 'and then what happens' of the hyperparameter, well, it might generate a different kind of data. What would happen if the sun were in a different location? Well, the shadow would be different. So as opposed to that sort of cut-and-dried sense of counterfactual, either on the data or the generative side, in the enactive frame, perhaps 'and then what' is more of an invitation to play. So just at a very first pass, the enactivist perspective gives more of an affordance for negotiating that counterfactual as play, rather than just coercing it into a statistical crank. It's like, well, what happens if we jam these gears? Well, then the wheel is gonna stop turning. Next question. But enactivism is a little bit more playful with that. Blue?

So I didn't see this in terms of counterfactuals. I was looking at it from a predictive standpoint, and 'then what happens' is, in my mind, a prediction. And so making a statistical prediction versus making a prediction on the side of the organism, or from an enactive standpoint, to me, like, I don't know, I mean, I still think you just predict, and then either you're right or you're wrong, and then you update your model accordingly. Yeah, Dean?

Yeah, so because I've sort of come at this from the wayfaring and the wayfinding of how do I orient myself to this stuff?
I wondered, I wondered, I don't even have the confidence to sort of put this out there and stand by it, but I've wondered if part of it is that the statistical gives us a bit of an allocentric, a bit of a detached view, and, back to what both of you said, and Daniel, there is a personal piece, an egocentric part, on the enaction side that doesn't necessarily exist on the allocentric side. But that doesn't mean one or the other is more or less real. But I think it's whether you can look down on something and see it in that kind of a relationship, because we're talking about relationship or relationships, or whether you're seeing it from the egocentric: oh, that snowball is now going to hit me, duck, right? So I don't know if that's the explanation that Kiefer was going for here, but I think it's part of the conversation if we're going to quantify this. That's it.

So perhaps the difference here really is 'and then what happens in my model' versus 'and then what happens to me', right? Very nice, very nice. It has to do with agency, and there's an I in that enactive side. There's what happens to me, and what happens to my model, or how would I see it differently, versus: given this chessboard, how would it just play out? Which is a little bit more like the impersonal Bayesian. So pretty cool. And the reason why we showed this slide was, first off, to highlight that even when we're talking about the hard, the soft is there, and maybe even vice versa. And also because this approximating distribution that the free energy is being discussed in the context of is the recognition model. So that's why we're showing this mapping. Okay, what is variational inference? Well, it's something that I'm sure we all had to do a little bit of a dig to find out about. And early in my digging last week, I came across a really helpful PDF, and I started with this quote from what Alex wrote and then just started searching.
And the quote that Alex wrote was that variational methods formalize statistical inference and learning in terms of the maximization of a lower bound on model evidence called the negative variational free energy. He describes them as state-of-the-art approaches and also notes that the variational methods have a long history in unsupervised learning. So cool, it sounds state of the art, but there's a long history, and maybe it's a tractable way to get at some hard problems. Well, what is it? And this PDF said, pretty early on: if you haven't seen Jensen's inequality, spend 15 minutes to learn about it. So again, this person has thought about how to teach variational inference. Maybe it's worthwhile thinking about Jensen's inequality. One way of phrasing Jensen's inequality is as a generalization of the statement that the secant line of a convex function, so kind of like a chord on a circle, like a cut through a circle, lies above the graph of the function. So it sounds like it might be a little bit obvious graphically, and indeed there is a proof without words, which is what we're flipping to on this next slide. So yes, it's kind of like if there's a bowl, any slice that you take through the bowl, there's gonna be room for something to roll underneath the slice. That's really important. And pulling a few quotes back from the Variational Inference PDF by Blei, what is written is that we can't minimize the KL divergence exactly. Maybe we'll talk more about KL later, but this is the divergence between the data we're getting in and the generative model. We wanna minimize our divergence. So we'll just think about it at that level for now. It's hard to minimize that directly, but we can optimize a function that is equal to it up to a constant. This is the evidence lower bound, or the ELBO. So it's almost like, I can't tell you where the bottom of that bowl is, but if I can take a cut through the bowl, then I know that it has to be lower than that.
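To make Jensen's inequality concrete, here's a tiny numerical sketch. The convex function and the uniform samples are just illustrative choices, not anything from the paper or the PDF:

```python
import random

# Jensen's inequality for a convex function f: E[f(X)] >= f(E[X]).
# Here f(x) = x**2 (a convex "bowl") and X is uniform on [0, 1],
# both arbitrary choices just to see the inequality numerically.
random.seed(0)
samples = [random.random() for _ in range(100_000)]

f = lambda x: x ** 2
mean_of_f = sum(f(x) for x in samples) / len(samples)  # E[f(X)], ~1/3
f_of_mean = (sum(samples) / len(samples)) ** 2         # f(E[X]), ~1/4

# The chord sits above the curve: the expectation of f dominates.
assert mean_of_f >= f_of_mean
```

Any other convex function and any other distribution would do; the gap between the two numbers is exactly the kind of slack that variational bounds exploit.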
But if I cut as low as I can to the bottom of the bowl, maybe I'm gonna be cutting just infinitesimally above the bottom. So if I have a really easy way to figure out how that cut gets minimized, then maybe we'll do all right. Maybe we don't need to find the true single bottom, because we'll just make cuts that are lower and lower, and we'll go from there. And it turns out that Jensen's inequality really comes into play. And also part of the trick is that there's a choice for the investigator of which functions to use. And it turns out that with some functions, it's easy to calculate them one way, but really hard to go the other way. And there's times where you want that, like when you're doing cryptography, but there's other times where you want functions that are very, very easy to go back and forth, as easy to go from the input to the output as to map uniquely and quickly back. And so this loop thinking kind of primes the pump: it turns out if you choose mathematical functions where it's really easy to go from input to output and back, and you combine it with this idea that you can take a slice through a curve and find a tractable way to estimate basically the lower bound on the evidence, it's gonna be possible to set up some optimization problems so that we can solve them straightforwardly. And so in other words, highlighted here, minimizing the KL divergence is the same as maximizing the ELBO. So maximizing the evidence lower bound means like we're pushing up the bottom bar. So we're getting more and more evidence, we're getting better optimization, as we minimize the divergence between the data that we're getting and our generative model, but we picked an approximating distribution that we can play really flexibly with. Any thoughts, or we'll continue on? Just that it goes back to solving the problem once you know the answer; the process of solving the problem once you know the answer becomes very constrained and simple. Nice.
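That claim, that minimizing the KL divergence is the same as maximizing the ELBO, can actually be checked by hand on a toy discrete model. All the numbers below are made up for illustration; z is a one-bit latent variable and x is a single fixed observation:

```python
import math

# Tiny discrete model: checks the identity
#   log p(x) = ELBO(q) + KL(q || p(z|x)),
# so with log p(x) fixed, pushing the ELBO up pushes the KL down.
p_z = {0: 0.6, 1: 0.4}            # prior over the latent z
p_x_given_z = {0: 0.2, 1: 0.9}    # likelihood of the observed x under each z

p_x = sum(p_z[z] * p_x_given_z[z] for z in p_z)           # model evidence
post = {z: p_z[z] * p_x_given_z[z] / p_x for z in p_z}    # true posterior

q = {0: 0.5, 1: 0.5}              # an arbitrary approximating distribution

elbo = sum(q[z] * (math.log(p_z[z] * p_x_given_z[z]) - math.log(q[z]))
           for z in q)
kl = sum(q[z] * (math.log(q[z]) - math.log(post[z])) for z in q)

# The identity holds exactly, for any choice of q.
assert abs(math.log(p_x) - (elbo + kl)) < 1e-12
```

Since the KL term is never negative, the ELBO really is a lower bound on log evidence, which is the "cut through the bowl" picture in code.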
And I just described this as if-then, which is exactly the same thing. Then you have something, the "if," that you're already aware of, the unknown space below that line, and then you resurface to a new place. And below that line on that next slide is what I would describe as learning's unholy triad, the triangle under there, but I'll let you talk about that. No, it's really interesting, because outside of the chord, outside of that line, you have no guarantees. And in fact, often you increase the precision of your guarantee within the frame at the cost of losing the generality beyond. It doesn't make any generalizations. So the metaphor, and somebody with more math, please correct us or help us make this richer, but it's kind of like variational inference is doing inference on rails. So the train operator can have a really complicated generative model of the world, like what time should I be home for dinner and what's my favorite color; it could be wild sensory input. So you can have a really rich system, but the operator can tell from the local speed of the train that the train is going in the right direction, and maybe even how fast it's going in the right direction, given the structure of the problem, toward the desired location. Now, there's other times where you don't want to go as fast as possible, but just taking it at a first pass, going from A to B, we want to structure the problem so that we're on rails. We know that all we have to have, no matter how much data we have in our generative model, is like a forward gear and a brake. And so variational inference helps us structure the problem. So it's more like being a train on a track with a destination in mind, which is this minimization of the divergence, which we will get to, as opposed to sort of characterizing the hills and just mapping all the hills. That's like a map maker, but the train conductor is not map making.
They have exactly their tools at hand, and then they're just going as fast as they can in the right direction. And this was just a technical slide from the paper. Any remarks on this, Blue? Otherwise, I think we could, yeah, okay. So this is for those who want to see the technical details, read the paper, but this is equation 2.1, and it relates some of these terms that we're talking about but just exploring today. This was another useful blog post, and it plays into the fact that the KL divergence is asymmetric. So the divergence from Q to P is not necessarily the same as from P to Q. And so it's almost like there's two different ways that we might optimize. It's kind of like finding a local maximum versus a more global maximum. But Q is a constrained type of distribution. So for example, here, we might be picking Q to be a Gaussian. So we know that Q is gonna be a bell curve, and we're only going to be tuning a couple of things about Q, which are its mean and its variance. In the easy case, what the author of this blog post describes as easy, it's just like: find me the highest point in P and then place Q right on top of the highest point in P. So that gets you to a local max, maybe even the global maximum, and it just lays down your average there. But it turns out that there's a hard direction as well, which is: given the total distribution of P, I want the best overall fitting Q. And the author writes, well, why is this hard to compute? Well, for most interesting models, we can't compute P exactly. For example, the real world, or really complicated graphs, or very complicated Bayesian posteriors. Because remember, a lot of this variational inference rested on the fact that we were choosing distributions that were easily reversible. However, when we're optimizing functions where there isn't a really easy reversible way, it's harder to do inference.
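The asymmetry being described here, that the divergence from Q to P need not equal the divergence from P to Q, is easy to see numerically. The two distributions below are invented just for illustration, a two-bump "truth" and a one-bump approximation:

```python
import math

def kl(a, b):
    # KL divergence D(a || b) for discrete distributions on the same support.
    return sum(pa * math.log(pa / pb) for pa, pb in zip(a, b) if pa > 0)

# A bimodal "true" distribution p and a single-bump approximation q.
p = [0.45, 0.05, 0.05, 0.45]
q = [0.10, 0.40, 0.40, 0.10]

forward = kl(p, q)   # the "hard" direction: expectation taken under p
reverse = kl(q, p)   # the "easy" direction: expectation taken under q

# Both are positive, but they are not the same number: KL is not
# a symmetric distance, so the two optimization targets differ.
assert abs(forward - reverse) > 1e-6
```

Which direction you minimize changes what the fitted Q looks like: reverse KL tends to lock onto one mode, forward KL tends to spread across all of them, which is the mode-seeking versus mass-covering contrast the blog post is drawing.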
And they also write down here on the bottom: the optimization problem is convex, so curved, hashtag Jensen's inequality, when Q is an exponential family, i.e. for any P, looking at this bottom graph, the optimization problem is easy. So in other words, if we pick Q to be the right kind of distribution, then it'll be smooth sailing no matter what P looks like. But if we want Q to be the same type of distribution as P, and we don't know what kind of distribution P is, we're kind of setting up the problem to fail. However, if we say, hey, whatever kind of problem P is, whatever is generating P, we're going to make Q part of the exponential family, then it becomes easy. And so you can think of maximum likelihood estimation as a method which minimizes the KL divergence based upon sampling P, but in this case, P is the true data distribution rather than a model of the generating function. Dean? Yeah, it was interesting, because when I first looked at that, I wondered, is this a good way of quantifying the difference between the guy in the train who's looking at the world through deontological eyes and another train conductor who's looking at it through utilitarian eyes? Because that actually exists in those graphs. It's kind of interesting. I'm thinking of who those different kinds of people would be or what they might be. Well, yeah, it's which one is: if I'm gonna run over somebody on the tracks, do I run over five people who I don't know, or do I run over one person who I do, versus I can't do anything that I wouldn't have the equivalent done to me. But anyway, I just think it was interesting that the philosophical qualification ended up being visible in the representation of those graphs there. Cool. And here is how Alex framed it, which is that F, the free energy, is useful as an optimization target. So just a little reminder: why are we going into all of these details about statistics and about the KL divergence and the variational inference?
Well, because we're talking about variational free energy and doing optimization on variational free energy. So that's why we had to pull back to asking what variational means. And then we'll return to where the energy comes in. So Alex writes, again, that F is useful as an optimization target for several reasons that have been widely discussed. And maybe those reasons are obvious to some, or some people think that it's not enough of a good reason for F to be used, but there are reasons to choose F as a function that you want to minimize. 2.2 gets at that. And it even uses the Q and the P. So if we look at 2.2 relative to 2.1: in 2.1, see how P is on top and then Q is on bottom, and then here, Q is on top and P is on bottom, and there's this plus F. If you wanna unpack the equations, please do so and help teach us, but just note that things are flipping, and it's being mirrored by what we saw here, where on the top one, the easy one, Q is on top, right? And then on the hard one, Q is on the bottom. So it's also using that p, h, v notation. And then what does that mean? So let's just say that free energy is the thing that we're going to be optimizing on. And variational free energy has a certain structure, related to variational inference, that means that maybe if we're trying to minimize variational free energy, that's gonna be like the right optimization path. Okay, so we're gonna be doing variational free energy minimization to do model fitting. Where does real energy come into play? Where does thermodynamics come into play? And it's right here. As is often remarked in discussions of variational inference, e.g. in the statistics literature, F, free energy, has almost precisely the form of a negative Helmholtz free energy from statistical mechanics. In fact, that's why we call it free energy: because it has a really similar form. And that free energy term, let's just call it F, has two parts to it.
The first term on the right is the expectation, so that E is for expectation here, of the energy stored in the system at a given temperature due to internal states. So really complicated molecules with a lot of order, like DNA, have a lot of internal energy locked up in their structure. And then the right term, TS, is the temperature times the entropy. So that's what the expression looks like in thermodynamics. And that's how you can look at whether a reaction is gonna be endothermic or exothermic. Like, if it creates a lot of disorder in the environment, it can locally order some stuff, but you can't make or lose energy. And to get to this more chemical phrasing of F from variational inference, Alex writes that we rewrite equation 2.1 using an energy term. And so we get this equation 2.3, which is analogous to this more statistical-physics equation, where basically the variational free energy consists of the energy of the internal configuration minus the temperature times the entropy. So part of the question is gonna be: okay, cool, it looks similar. Might those things be isomorphic? Might they be the same? That's the thesis. That's the question. Is it actually that way? Or is it just that we can think of it similarly? And so this is where we got to slide 26, bringing the train back, too. Could we take well-structured optimization problems and connect them, by resonance in the structure of the equations, to the ways that downhill trains navigate, or burning candles? Or if you put a ball into a bowl, it goes to the bottom every time and it only goes to the bottom. So could variational optimization not just be well-structured, not just have the gas and the brake so that we know that we're going in the right direction, but could we be thermodynamically pulled in the right direction? That would be like the candle, where the candle always burns.
With the candle, the higher energy state is the unburnt candle, and the lower energy state is when it's burnt and dissipated, and given the activation energy, that always happens. And if there were an isomorphism between the structuring of the statistical problem and the actual thermodynamics in the world, that would be pretty interesting. So analogies are always partial. I know Dean's probably gonna have a ton to say about analogies here, but this is what is written by Alex: if the connection between statistical mechanics and statistical modeling by the brain were merely one of analogy, it would be surprising to find that all the terms in the Helmholtz free energy play useful and interlocking representational roles. This coincidence between physical and representational descriptions is to be expected, however, if free energy simply measures how much useful representational work can be done by the internal elements of a system, where work has physical meaning. In the remainder of section two, I consider how the physical interpretation of each term in the Helmholtz free energy can be related to a corresponding facet of the optimization process. So it's like we have a DNA molecule, and there's the energy that's stored in the configuration of the bonds, that's the internal energy. And then there's the temperature-dependent term, which has to do with the DNA snaking around and moving around in the environment. So now when we compare that to statistical models, it's almost like the internal energy is the complexity, not of the DNA molecule, but the complexity of the model. And then that wiggle room is related to potentially the accuracy or the variability of the model, but we'll see. Anything to add here on the candle? Okay. Just that the focus is on the function, I think. The representation as function. Yep, that's really interesting. So I hope Alex can clarify a lot of these math points and also the representational work side.
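Just to write out, in one place, the form-match we keep gesturing at, here is the standard variational identity in generic notation (z for hidden states rather than the paper's h, and with the sign conventions that are most common in the statistics literature):

$$
F[q] \;=\; \underbrace{\mathbb{E}_{q(z)}\!\big[-\log p(x, z)\big]}_{\text{expected energy, playing the role of } U} \;-\; \underbrace{\Big(-\mathbb{E}_{q(z)}\!\big[\log q(z)\big]\Big)}_{\text{entropy of } q \text{, playing the role of } S}
$$

which matches the Helmholtz form $A = U - TS$ term by term at temperature $T = 1$. So minimizing $F$ trades off low expected energy, a good fit under the generative model, against high entropy of the recognition distribution, which is exactly the complexity-versus-wiggle-room reading of the DNA example above.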
So here's another piece that came up while preparing for this. Gibbs free energy is G, and the difference between the reactants and the products of a chemical reaction is often referred to as delta G. And when the delta G is significantly low, like it's downhill, then if you can get over the activation energy, with just thermal energy or with an enzyme lowering the barrier or whatever, it will flow downhill. So if the delta G is negative, the reaction proceeds as surely as water flows downhill, or uphill should I say, because up and down are not real. And so here's three different cases that are being shown. So here is the downhill case, where the reactants have a higher G than the products. The reaction proceeds, but not all the way. Sometimes it functionally proceeds to completion, but actually all reactions have this K value where it's somewhere in between, because at some point you'll have like 99% of the product, and then the forward and backward rates balance out. So just like if you have a test that's 99% accurate, but the disease is really rare, you might do the Bayesian statistics and find that, contingent on a positive test, it's actually not likely that the person has the disease, because there's so many people who don't have it who get false positives. That's the Bayesian case. So in the chemical case, it's kind of like, even if the reaction is favored 20 to one, well, once it's 95% done, the forward and backward fluxes balance out. So it equilibrates at 95 to 5 if the overall difference is 20 to one. And then on the right side on the top, if the products have a higher G, they have more internal energy, then you need to add energy to make it happen. And then on the bottom left is the special case where the reactants and the products have the same Gibbs energy, so for example, two isomers of a molecule. And in that case, it's like a coin flip. It's as energetic for the reactants as for the products.
And so it ends up just jostling back and forth and getting completely mixed. So those are a little bit of the thermal intuitions that people might have heard related to the kinetics and thermodynamics of chemical reactions. But let's talk about inference and models. What does it look like to have a model that's going downhill on an energy landscape, or two models that are as likely as each other, so it's kind of like a coin flip between them? So that's, again, getting at this potential, certainly helpful, but potentially even physically real isomorphism between model inference and thermodynamics. Cool. Blue, what could you tell us on this slide? So I think this is stuff that Alex put in here talking about the role of internal energy. And so there are these two equations, the Helmholtz free energy you talked about already first. So the top equation is A = U − TS, where U is the internal energy, combining the total kinetic and potential energy in the system, T is the temperature, and S is the entropy. And he writes: an intuitive explanation of the minus TS term is that insofar as the properties of the particles in the system are uncertain, the kinetic energy constitutes irregular motion. So the impact on the free energy of entropy, a measure of the irregularity, is scaled by temperature, roughly average molecular kinetic energy. I thought that was a neat explanation. And then the Boltzmann distribution is this bottom probability, and it's the probability of a state at a given temperature in relation to the energy in a system. And he says here, low energies are associated with good or desired configurations of the system, which are often interpretable as assigning high probability to what they represent. And the analogy to physics is closer in that the energy can be directly related to the probability of the internal state of the network occurring. Cool.
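That Boltzmann distribution, and the temperature intuition that comes back later with simulated annealing, can be sketched in a few lines. The energy levels here are arbitrary, just to watch temperature reshape the distribution:

```python
import math

# Boltzmann distribution over a handful of energy levels:
# p(state) is proportional to exp(-E/T). Raising T flattens the
# distribution (more "wiggle"); lowering T concentrates the mass
# near the ground state.
energies = [0.0, 1.0, 2.0, 3.0]

def boltzmann(T):
    weights = [math.exp(-e / T) for e in energies]
    z = sum(weights)               # partition function, for normalization
    return [w / z for w in weights]

cold = boltzmann(0.2)    # nearly all mass on the lowest-energy state
hot = boltzmann(50.0)    # close to uniform across all states
```

Low energy means high probability, which is exactly the reading on the slide: energies mark the good or desired configurations, and temperature controls how tightly the system commits to them.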
So how exactly are these different flavors of thermodynamics and math related? Alex, teach us and help us understand where other researchers can contribute. How about slide 30, Blue? What was interesting to you about this? I threw these in because these are both different kinds of neural network setups. And so I'll start with the bottom one. The Boltzmann machine isn't really practical or used a lot to solve problems, because it really takes a lot of energy. As you can see, there's a lot of connections in the network; it's like a fully connected network. So they use this restricted Boltzmann machine to solve problems instead. It's a neural network architecture, and it's simpler than the Helmholtz machine which is above it. It's just divided into the visible units in green, which is where you can see the data going in, and then the hidden layers, which are the blue nodes in the network. Anyway, it just uses Gibbs sampling to recognize and generate inputs. Not like what goes on in the Helmholtz machine, which is the top one. In the Helmholtz model there's a recognition model and a generative model, and the hidden layers are behind it. So the visible layer is the input in the recognition model, and in the generative model, it's the output. And I think that it's really kind of cool that there's this recognition model and the generative model, because it relates so closely to active inference and how that's all done. And there can be multiple layers, so you can have a deeper structure than this. And also these Helmholtz machines, I think Alex talks about this in the paper, he doesn't show it or maybe say it explicitly, but the Helmholtz machines literally are doing sleep-wake, sleep-wake. And so how you train the model is that the generative traces the recognition, like by definition.
And so they're constantly chasing each other, versus the Boltzmann machine where it's just all kind of happening at once. Thanks, that's really interesting. Also these names like Boltzmann and Gibbs: there are equations named after them in thermodynamics, like Gibbs free energy, but also this is an inference machine. The Boltzmann machine is an inference machine. And I just looked it up. Some may find this interesting, some may not, but this is the Wikipedia page for Gibbs sampling. What is Gibbs sampling? Gibbs sampling is a Markov chain Monte Carlo algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. So that comes into play in Blue's and my area of genomics, or in phylogenomics, where we don't have the generative model, but we have a ton of data. And part of the way that we fit Bayesian models in phylogenomics is by using these sampling approaches. And so it's kind of like sampling, but in that stochastic way where you kind of go in the direction of informative samples, but you also want to sample broadly from a distribution where you don't know the real underlying truth, you just have a ton of data. It is just very interesting to see how it comes back here. So thermo and info have been presaged together for a long time; Alex does an awesome job of bringing back some of that literature. Okay, what is temperature? So there's temperature in the physical world, which is related to the speed of molecules. And just before we quote Alex on the left, a reminder that even at a given temperature, the speed distributions of different molecules are really different, because they have different masses. So what's conserved at a given temperature, like zero degrees Celsius or something, is actually an energy value per molecule, and lighter molecules are moving faster than larger molecules.
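Going back to that Gibbs sampling definition for a moment, a minimal sketch of the idea might look like this. The joint distribution below is made up for illustration; real phylogenomics samplers are of course far more involved:

```python
import random

# A toy Gibbs sampler: draw (x, y) from a joint over {0,1} x {0,1}
# by alternately sampling x given y and y given x, rather than
# sampling the joint directly. That's the whole trick: conditionals
# are easy even when the joint is awkward.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_x_given_y(y):
    w0, w1 = joint[(0, y)], joint[(1, y)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

def sample_y_given_x(x):
    w0, w1 = joint[(x, 0)], joint[(x, 1)]
    return 1 if random.random() < w1 / (w0 + w1) else 0

random.seed(1)
x, y = 0, 0
counts = {k: 0 for k in joint}
n = 100_000
for _ in range(n):
    x = sample_x_given_y(y)
    y = sample_y_given_x(x)
    counts[(x, y)] += 1

est = {k: v / n for k, v in counts.items()}
# est converges toward the true joint as n grows.
```

The chain wanders, but its long-run occupancy matches the target distribution, which is the "sample broadly without knowing the underlying truth" flavor described above.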
And what Alex writes is that the meaning of temperature within statistical modeling, so that's what it means in statistical physics, but within statistical modeling, is illustrated in the example of simulated annealing, where increasing the temperature increases the variance of the distribution over configurations of the system, and lowering it collapses the distribution to a small range of states near the ground state. So like when you have DNA, the lowest energy state is when the two strands are annealed all the way together and just resting. And then when you crank up the temperature in that reaction, there's more and more wiggling of the double helix, and then for any given stretch of DNA, there's a temperature where there's so much wiggling that the strands separate. And so then you cool it down and they re-anneal. So that's kind of how PCR works, but it also has a statistical interpretation. As you turn it up, everything is wiggling around more. And then as you cool it down, things converge to small ranges of states near their ground state. And controlling the variance of a distribution is useful in many applications. I'm sure we can all think of many times where there's a need for looseness or tightness or flexibility to tune a given model from loose to tight. But what does that make you think about? So I just want to talk about temperature. There's temperature in this thermodynamic sense that controls so much about the energy of the particles and all these things. And I just read a really interesting model, and so I would have added the reference; I just looked it up really quick. It's a paper called Dynamics of Collective Action to Conserve a Large Common-Pool Resource, a paper that just came out in Scientific Reports by Anderson et al. But in this paper, they use this phi metric, and they use it to talk about the temperature. But it's not really a temperature.
But it's just the influence of all of the external surroundings on the behavior of the agent. So I think about, in terms of active inference, how the political arena, the news, all of these things have influence on our perception and our action. So can we have a temperature, in that sense, in an active inference model? And they did a really nice job baking that into that agent-based model. Cool. Also, thinking about action, it reminds me of a golf swing or chopping with a knife. In the low-temperature action, the variance in the action is very low, and so you might see that reflected by a groove that's very entrenched. But the high-variance, higher-temperature action is kind of all over the place. And so we can imagine that, especially when we're learning, we don't know where the right place to put the average is. So you don't want to have a tight distribution too early. You want to start with a very high temperature and a broad distribution, and then you want to hone in. Maybe there's a few successful end points you might get to, but ironically, if you start there too tightly, then you'll fail. So that's kind of interesting to think about. How about organisms that have to minimize their free energy to stay alive? Well, Alex writes that free energy minimization has been used to explain how organisms manage to keep themselves away from thermodynamic equilibrium, the ultimate equilibrium of dissipation, with respect to the external environment. In other words, you have to maintain that non-equilibrium steady state, because if you descend into total equilibrium, it's equivalent to death. And so using a candle analogy, the flame is out. There's a point where the system is no longer lit; the flame is out. And then it's not just that it's like a zero-to-one with life and death, even though we're not going to the human case or anything like that, just taking it thermodynamically with a candle.
And then it's just like volume. You can get more and more and more dissipation of energy, from a candle to a flame to just a complete inferno, more and more dissipation of energy. And then there's some point where you're not dissipating energy as a system: you're dead. Well, let's map that onto organisms. So there's some point where the organism dies. Not saying there's a moment where it happens or a clean break, but there is a difference between something that is actively maintaining a non-equilibrium steady state and something that no longer is. And here is like a dog that's sick, but it's alive. And this is like a more vital and healthy dog. And so the question would be, do species really minimize their free energy, or, oh wait, that's a little bit more of a generalization. As long as the organism is maintaining its non-equilibrium steady state, it's fair to say that it is acting as if it's minimizing its free energy. But then the question is, for other levels of organization, like a species: do higher levels of organization actually minimize free energy in their own right, or are they just more like a collection of agents that are seeking to stay alive, and then the group stays alive as long as you have individuals that are staying alive thermodynamically? So I think that's partly where the mapping is going with this whole energy and information and life and death. But again, Alex, that'd be an awesome spot: what is life? What is death? These would be great things for you to address. Any thoughts, or we'll continue? All right, so in the free energy principle and metaphysics section, we pulled out a few quotes related to where all of these predictive coding and free energy principle ideas come together. It's a little bit of rehashing, so we're not gonna read everything, but people should check out the paper.
And yeah, sometimes with these dense papers we end up just getting a lot on the slides, just to get it out there and unpack it. And there's many ways we could have probably reordered stuff or thought differently about it. What about this slide, Blue? Okay. Just that, yeah, we're not gonna read it. We're not gonna read it. The paper was really good. Good, good there. Let's consider the thought experiment that was done in the paper, which is this sleeping and waking agent. So we're going to have an agent that has two different phases. There's a wake phase of the algorithm, where the hidden-unit activity is driven bottom-up by the recognition weights. So like when you're awake, it's driven by the visual input coming in. And then there's a sleep phase, where you can imagine the opposite is true: the model updating is being driven by the generative model, in the absence of being driven by stimuli. And then you take those two phases of the model and combine them into this R equation. So it's like you have night and day, sleep and wake, and then the agent is gonna be modeling that. And I think, Dean, you had a nice take on our interpretation of what this sleep-wake meant. What did this example show, or where did you see it in the context of the paper? Well, sort of going back to this being an example of this stochastic channel, I wondered whether or not recognition, a recognition which is then consolidated, is a two-way establishment of what we've taken to be what we now know versus what we don't. So are there two parts to this? It being a two-way establishment, is there a build-up in terms of the generative model as probabilities are confirmed? And is there a way, I guess it would be in the sleep phase or the non-wake phase, is there a way for us to sort of purge the error units and then discard them?
But again, I'm saying this not knowing my negative F from my D sub KL in terms of what this actually means, right? But I do like the idea that you can try to take a shot at it, a guess at it. And if you're completely off, well, fine. But it's not so unreachable that you can't take a good guess at it. Thanks for really doing awesome work on the dot zero and bringing your perspective, because I totally agree with that. And I'm not even sure what the agent is modeling here. It's like, R of what? Let's find out what this agent is doing and what it means. But it is playing a role in the paper, which is using the model as just a minimal experiment to illustrate the identity thesis, which is the claim that the variational free energy, so doing the variational statistical inference, that model fitting, is going to be equal to the thermodynamic free energy of the system. So definitely, Alex, help us understand what the model is showing here. And also, what empirical system could we actually look at? Like, can we collect data from an awake person or a sleeping person or another system to see if this is the case? In equation 3.2, there's more unpacking of the F in terms of a KL divergence and another term. If you wanna learn more of the mathematical details, unpack that and work through it. In the discussion, there is some connection to the work of Lewis and others. How do the Bayesian brain and the position of Lewis relate to folk psychology? How do these terms come together? Section four. Later in section four, there's some formal logic: Ramsey sentences, predicate logic. It'd also be very interesting to learn about what is being done here, because we were talking a lot about the mind and the brain. That sounded statistical and philosophical, and then there's things burning and thermodynamics. And then here's a logical sentence. Yeah, Blue, what did you see here?
So I put a lot of these little comments into the slides because I was like, what is a Ramsey sentence? And what are these weird symbols? Like, I can do regular math, but when you start to get into these tricky symbols, it baffles me. So a Ramsey sentence is a formal-logic reconstruction of a theoretical proposition that attempts to draw a line between science and metaphysics, which I didn't know; I'd never heard of these before. So I was trying to read this equation here down at the bottom. This backwards E is called the existential quantifier: there exists an x. And this upside-down A is the universal quantifier: for all. So, there exists an x, for all y. And then the T, I guess, is the predicate; I don't know what that means. And then this is a material biconditional. So I was really grateful for the triple-equal unpacking, because I had no idea what that meant. And then the identity relation. So I couldn't even get all the way through reading this. If someone would read it to me, that'd be great. Yep. Perhaps: there exists an x such that, for all y. And then the claim of that relationship involves the predicate, which is the part of a sentence or clause with a verb stating something about the subject, like "went home" in "John went home". And just to see the three and the two: the three is the material biconditional, that's the psychophysical part. It's saying there's something, T of y and x, and those are the parts that are materially biconditional. And then there's an equal sign, and those are the identity. Or let's hear Alex read it in different languages. Another topic that came up in this paper, as it did in stream 16, is: what's the difference between a real mind-brain system versus a simulated mind-brain system?
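For readers following along, the general shape of a Ramsey sentence, and the quantifier reading walked through above, looks something like the following. This is a generic sketch of the logical forms, not the exact formula from the paper.

```latex
% A theory T containing theoretical terms t_1, ..., t_n is "Ramsified"
% by replacing each theoretical term with a variable and binding the
% variables with existential quantifiers:
\exists x_1 \cdots \exists x_n \; T(x_1, \ldots, x_n)

% The fragment read aloud in the stream has the shape of a unique
% characterization: "there exists an x such that, for all y,
% y satisfies T if and only if y is identical to x":
\exists x \, \forall y \, \big( T(y) \equiv (y = x) \big)
```

Here \(\exists\) is the existential quantifier, \(\forall\) the universal quantifier, \(\equiv\) the material biconditional, and \(=\) the identity relation.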
Like, is every chess-playing algorithm alive just because it can play chess, just because people play chess? Or does there have to be some other aspect to the information dynamics? It's something that both this paper and stream 16 with Wanja Wiese went into. Any thoughts on real versus simulated? I really thought this was the crux of the paper. So that's my only thought: this identity thesis and the criterion for distinguishing, you know, a neural network versus an actual neural network in my brain, right? Or a person with a neural network, or an organism, somehow. Cool. Plankton or whatever; it doesn't have to be a brain, because Alex did make that distinction in the paper. Awesome. So just a few more last notes for those who stick around to the end to listen to the good stuff. First, how does the cell do more with less? It's kind of like asking how the brain does more with less, from a model-fitting perspective. If you were to train a computer on tracing a baseball, it might use like a hundred hamburgers' worth of energy, but you can trace the baseball for a fraction of that, one centi-hamburger. So that's the brain doing more with less in model fitting. How does the cell do more with less in the biochemistry realm? This helps us get an intuition for how the brain might do more with less in the information realm. And it relates to the hydrolysis, the cutting with water, of ATP, which is a molecule with high internal energy, so you can use some of that internal energy to do other work and also release heat. In other words, split ATP into ADP and a free phosphate, and get extra energy: free energy, energy you can use. And this was one of the most fascinating facts that I learned in my biochem education: exactly how much free energy is released by the hydrolysis of ATP, and how is that free energy used to do cellular work? The calculated delta G for the hydrolysis of one mole of ATP is negative 7.3 kilocalories per mole.
So if you just burn the candle, you get negative 7.3 units of energy; that's under standard conditions. In fact, the delta G for the hydrolysis in a living cell is almost double that: negative 14 kilocalories per mole. How? Is it vitalism? Is it magic? Well, can't rule that out, but there are a lot of tricks that can be played with localizing reactions. So you can have a tiny bubble where the pH is ridiculously acidic, like one proton vibrating around, but the volume is so small that the proton's local acidity is unreal. There are all kinds of tricks with how things are combined, separated, connected, and related to each other that help you get more energy out of the hydrolysis of an ATP. So just think: if you can get two x on burning the candle in a cell, probably you can do a lot more than two x on the information in the environment. Is that physical? Is it just statistical? That's the question, but it certainly is happening physically in a cell, so it seems like it could be happening in the brain as well. Biochemistry, Blue? No, the cell has the demon. All right, so when we were putting these slides together, it's like, where's Maxwell's demon in here? Like, the cell has the demon, I guess. I don't know. Then where does active inference come into play? Hashtag ActInfLab. So we searched for the term, and it's used twice, both times a little bit indirectly, referring to the literature on active inference and the literature on the FEP. It's also related to topics we've talked about, like top-down priors and the umwelt of the organism. So it'd be super interesting to hear from Alex: where does active inference come into play in your agent in the thought experiment, or just in general?
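The gap between the standard and in-vivo ATP numbers above comes from the standard concentration correction: the actual free energy change depends on how far cellular concentrations are from standard (1 M) conditions. A quick sketch, where the cytosolic concentrations are hypothetical round numbers chosen only for illustration:

```python
import math

# Actual free energy of ATP hydrolysis versus the standard value:
#   dG = dG0' + R * T * ln([ADP][Pi] / [ATP])
R_GAS = 1.987e-3   # gas constant, kcal / (mol K)
T = 310.0          # approximate body temperature, K
dG0 = -7.3         # kcal/mol, standard transformed dG of ATP hydrolysis

def delta_g(atp, adp, pi, dg0=dG0, temp=T):
    """Free energy change (kcal/mol) at the given concentrations (mol/L)."""
    return dg0 + R_GAS * temp * math.log((adp * pi) / atp)

# Hypothetical cytosolic concentrations: ATP high, ADP and Pi low.
dg_cell = delta_g(atp=5e-3, adp=0.5e-3, pi=5e-3)
# dg_cell is roughly -12 kcal/mol at these made-up concentrations,
# in the ballpark of the roughly doubled in-vivo value quoted above.
```

Because the cell keeps ATP high relative to ADP and phosphate, the logarithm is negative and the reaction releases substantially more free energy than the standard figure suggests; the local-pH and compartmentalization tricks mentioned above push things further still.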
And then a key test and prediction. Although, in another sort of classic three-sentence paragraph, Alex does take it both ways, which is basically to ask whether the identity thesis can be tested, and he suggests that you could look for a stochastic code. So in encoding, just like we saw earlier with that Sengupta 2013 paper, the code is gonna involve a trade-off: only giving a little information, but making it the right information so that the model can be updated appropriately, and the test is to see whether that is empirically plausible. And this is an interesting philosophical point: the conjunction of the identity thesis with the FEP, the free energy principle, is a falsifiable hypothesis, even if the FEP is not. In ActInfLab 14, 15, and 20, we talked a lot about this: what is falsification, and what is its relationship with the FEP? It's almost like the FEP is a carrier: it is deployed in really clearly falsifiable ways, yet it is more of that carrier wave that isn't directly falsifiable. Or I'm not sure exactly what he was getting at here, but I thought it was really fascinating to think about. Not just, oh, there's a corollary that's falsifiable; it's that we're taking the FEP together with another field, and that conjunction is something we can both formalize and falsify. Here we just had falsification, here we just had formalization, and we had to amalgamate them to make something that takes the best of both worlds. So let's hear about that. And this is where I said he has it both ways, which is not that something is true and false, but that the evidence is gonna run backwards to us. He's saying strong empirical evidence for the right type of encoding, together with a formulation of cognitive dynamics in terms of that variational Bayesian inference, supports the case. So maybe there'd be so much evidence for some kind of encoding. What kind of encoding? How much evidence? How would we know? But we'll find out.
Maybe it would be overwhelming positive evidence for the identity thesis. And then, as always: then what? What then? All these cool learning terms that Dean has been helping us understand. What would happen if the identity thesis is true or not? And what would we be then? So, pretty fun. Thank you both for participating. Any last thoughts? We'll have just a closing section. If anyone wants to ask a question in the live chat, they can; otherwise we'll just close it out. Can I ask one quick question? Oh, yes. So I'm glad Blue brought up the thing that she thought was really important about the paper, because I want to swing all the way back to the beginning; I think identity as proximal formalisms is really important. And I want to just read part of what he wrote, directly under the section on a transparent code, because he never really gets into this too much, but I think it matters. He wrote, under stochastic encoding and variational inference: in order for the thesis of this paper to make sense, it must be kept in view that the generative and recognition densities of variational inference are densities over, and then he puts in brackets, possible external causes of the sensory input. That is to say, in the parlance of most philosophers and nearly all cognitive scientists, they are representations, and it is their representational function that defines them as statistical models. Fixing the encoding of the recognition and generative densities allows us to directly relate the "representational work", that is, the decrease in divergence between the densities, to physical work.
That, to me, would be amazing: for him to give that to me in real terms, with real examples of people doing that on a sort of ad hoc basis. Because I understand and appreciate the down-the-train-track thing, but to be able to translate what he wrote there, in terms of a transparent code, into a "what if" would make my day. And not just my day; it would make my model of my day. Make my model of my day. But that sounds super interesting. And that first part about representations: we didn't even tackle the representation question, because in several other streams we've asked, what is a representation, an internal representation? Where are these representations? What if they're distributed in the environment or among agents? Is that still a representation? But also that adjacent possible. That's kind of like, whoa, what if our perception is actually just the tip of the iceberg of a much deeper generative model? And there are cases where it seems intuitive: like, oh, I guess modeling daytime is contingent on having a model of night, or thinking A is contingent on knowing it could be A or not-A. But even that whole framework for thinking is still just part of a bigger process. And that's what I think a lot of philosophy tries to grasp at from a lot of sides. We're working with a kind of forceps, but there's a bigger picture. Well, it sounds like we're also working with something called the transparent code, which blows my mind, because, oh, okay, if that's all it is, can you just show me what that is? I know it's going to be hard because it's transparent, but hey, let's go. Blue? I'm good. Well, thanks to those who watched this live or in replay. We hope that you'll join us for the discussion when it happens, or after. Thanks. Great times. Talk to you later, Dean.