 Hello, everyone. It is June 2nd, 2022, and we're in week five of textbook group cohort one. We're having our second discussion on chapter two, and we will get right to the questions. First, just wanted to highlight the math learning group, especially for those who are not regularly attending the meetings yet. The meetings are on 19 UTC on Wednesdays, but just with everything, don't let the synchronous time dissuade you. We can set up other synchronous times if people want to co-work. And also, the resources and the modifications are all asynchronous. So we've been doing several things in the math learning group. First, we've been sharing important resources and categorizing them. So if somebody wants to look at a video on something or find a textbook on something or if you come across something useful, just add it here. And then if you feel like leveraging the resources that you or other people have added, then you can categorize it in the table and just add a few pieces of information on it. We have a notation table that we'll be building, especially as we get into the continuous time active inference. We have some questions, less so questions about the textbook and more so just cool math questions that people are raising as they're discussing the material. And then we're building a math-oriented overview. So we've gotten up to equation 2.5 in chapter two, and this is just sort of like point by point, like what are the formalisms doing, where are the figures coming into play, how are some things related. And any and all note-taking can be added. You can always add it to like unsorted notes or errata, and then we can categorize it later. But this is how we develop a shared epistemic resource that's useful for everyone. And we're going to come to this in the questions, so we won't go into detail right now. But equation 2.5 is the variational free energy. 2.6 is about expected free energy, the G. And this is a key equation, a set of equations. 
So yesterday with Blue, Brock, Jessica, and Jacob, we went through line by line with examples, natural language readings, questions, and interpretations, and then Jacob added some derivations, like how are those three lines in 2.5 related to each other? How do we go from the energy and entropy formulation to the complexity accuracy formulation? So this is the essence of understanding the equations. It's an infinite learning journey for everyone. So whatever level, whether you want to be patching up to what has been written here, or whether you see ways to reorganize it and go beyond and add more connections, every single background and level of math familiarity can contribute in this way. But we'll come to more details on 2.5 in the questions. Any preliminary thoughts that people would like to share? You can raise your hand or write in the chat at any time. Is there anything that anybody wants to share before we start going through the questions, starting with the most upvoted ones that we haven't addressed? We'll just pause for a few seconds, tune into the chapter 2 regime of attention, upvote any questions that you'd like to see addressed, post anything in the chat. Yes, Jose?
Yeah, I'd like to ask if there's a recording of the math sessions that you had?
No, we record stigmergically with the traces in the notes, and we had this conversation about whether it should be recorded, etc. If there's enough desire, we could do something like have some unrecorded and some recorded. This session is obviously recorded and rewatchable here, and we do many live streams and there is a lot of recorded material, and so we want to also hold a space for people who don't want to speak on a recorded version, even though everything's recorded in the universe and in each other's minds. But yes, if there's enough interest from people who want to record sessions, we absolutely will do it.
We also just wanted to make sure that there was a totally open math learning space where people wouldn't even have that cross their horizon. But we did discuss that yesterday.
Thank you.
Yeah, any other comments? People can just raise their hand with six in Gather or write in the chat. So we're going to just go to the questions, and there's still time to upvote during the discussions, and we're just going to eye up the ones once we've discussed them. So we'll start with the first question that we haven't discussed yet. Okay, this is from page 32. Policy dependent outcomes are not immediately available, but they can be predicted by chaining together two components of the generative model. The first is our beliefs about how hidden states change over time. Just want to find what the second part was. The second component is the likelihood distribution. What really changes as a result of policy selection? The hidden states or the outcomes? Bringing a jacket doesn't necessarily change the temperature hidden state, but instead changes my observation or perception of whether it's cold or not. Can we actually change hidden states with policy selection, and what would be an example of this? Does anyone want to raise their hand and give a thought on this question? What really changes as a result of policy selection? The hidden states or the outcomes? While people are raising their hand, I'll give one possible thought. The answer is both, or it depends. So bringing a jacket does change the temperature around you. It's not just changing your perception of temperature. So one example of changing a hidden state of temperature with policy selection would be changing the setting on the thermostat. That is intervening into the causal process, the generative process that's giving rise to temperature, which is giving rise to your perceptions of temperature.
And so assuming that your temperature perception is accurate, you are able to change both the temperature in the room and the temperature around you with the jacket. And thus, because temperatures are giving rise to observations of temperature, you also end up changing both. Blue?
So this was my question. And I think Karl kind of possibly started to answer it on the live stream yesterday. I need to think about it more and unpack it a little bit more. But we change the temperature around ourselves, but we're not going to change the thermometer reading by bringing a jacket, right? But anyway, what Karl said, and what might be beneficial for others to hear, is that when you think about what's changing, when you think about a hidden state, there's a causal chain underlying that hidden state. And the same thing when you think about an observation: there's a causal chain underlying that observation. So no, I'm not going to change what my eyes tell me the temperature is, but I'm going to change what my body tells me the temperature is. And so it depends, because it's a multisensory observation, right? So which sense is going to change? Not my visual reading on the thermometer, but my physical perception of cold.
Also yesterday on the live stream, Thomas Parr and Karl Friston joined. So that was really awesome, and a great one to watch: 45.1. And we can continue on this. Ali, and then Mike.
Yeah, I think that's what is meant by the word hidden in the statement, because if it was transparent to any kind of perception or observation, I don't think it would be called a hidden state. I believe the reason it's called such is because it's opaque to any kind of perception or observation. So it's kind of inherent to the universe and probably not amenable to any change or action.
Thanks. Mike?
I'm not sure if I'm thinking about it the right way, but intuitively I was thinking that policy selection would update the expected conditions or the expected state. So maybe adjusting the priors in a Bayesian sense.
Thanks. So we'll come to this later. But the answer, based upon the partially observable Markov decision process in its discrete-time version, is that policies, π, intervene in between hidden states, in how they change over time. Different hidden states, should they be giving rise to different observations, will result in different observations. So changing the thermostat through policy selection is intervening in the causal process of temperature changing through time in the room, the hidden state. And if the temperature in the room is associated through the A matrix with your observations of temperature, then you will be changing your observations. And then, just like Mike said, that is reflected based upon the expectations that the individual has about the observations of temperature. So they're both changing, because the hidden states and the observations are linked. That's what it means to have a generative model, or for there to be a partially observable model with hidden states and observables. In hierarchical models, a given state could be an emission from a higher state and be emitting a lower state. And that's kind of like how the current moment is a consequence of the past, but it's a cause of the future. So this is related to Bayesian modeling, but using this architecture, we can say that policy intervenes only in between hidden states. We don't directly intervene in how observations appear. Any other thoughts or questions on this question?
In figure 2.2, they write X star and X. So that's going to be generative process and generative model. Latent variables, hidden states, do not necessarily live in the same space of measurement.
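Returning to the thermostat example discussed above, here is a minimal sketch of the point that a policy acts on the transitions between hidden states, and that observations only change because they are emitted from those states. The three temperature states, the two heater actions, and every number in the `A` and `B` arrays are hypothetical values chosen for illustration; they are not taken from the textbook.

```python
import numpy as np

# Hypothetical discretization of the hidden state (room temperature):
# index 0 = cold, 1 = comfortable, 2 = warm.

# A[o, s] = P(observation o | hidden state s): a noisy sense of temperature.
A = np.array([
    [0.8, 0.1, 0.0],
    [0.2, 0.8, 0.2],
    [0.0, 0.1, 0.8],
])

# B[action][s_next, s] = P(s_next | s, action): the policy only enters here.
B = {
    "heater_off": np.array([
        [1.0, 0.3, 0.0],
        [0.0, 0.7, 0.3],
        [0.0, 0.0, 0.7],
    ]),
    "heater_on": np.array([
        [0.4, 0.0, 0.0],
        [0.6, 0.5, 0.0],
        [0.0, 0.5, 1.0],
    ]),
}


def predict(policy, initial_state):
    """Chain transitions (B) along a policy, then map states to outcomes (A)."""
    state = initial_state
    for action in policy:
        state = B[action] @ state      # the policy changes hidden-state beliefs
    outcomes = A @ state               # outcomes follow from states via A only
    return state, outcomes


start = np.array([1.0, 0.0, 0.0])      # we begin certain that the room is cold

state_off, outcomes_off = predict(["heater_off", "heater_off"], start)
state_on, outcomes_on = predict(["heater_on", "heater_on"], start)

print("P(states)   heater off:", state_off, " heater on:", state_on)
print("P(outcomes) heater off:", outcomes_off, " heater on:", outcomes_on)
```

Nothing in `predict` lets an action touch `A` directly, which is one way to read the remark that we don't directly intervene in how observations appear.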
It might be the case that hidden states in the external world take on values that lie outside the space of explanations available to the brain. Conversely, it might be that the brain's explanations include variables that do not exist in the outside world. For example, the former could be five-dimensional and the latter two-dimensional, or one could be continuous and the other could be categorical. So what are examples people can think of where the dimensionality of the generative model, the cognitive entity, is the same as that of the generative process, the niche process giving rise to observations and passing them to the generative model? That's one question. Second question: how does this approach or framing speak to the map-territory debate, hashtag instrumentalism and realism? What papers or live streams best characterize how active inference models the action-cognition-perception loop? And what other possibilities or models are there for the action-cognition-perception loop? What would be the different strengths and limitations of these different framings and partitionings? Thanks. Joe, and then Blue, and then anyone else who raises their hand.
I was just thinking, a nice case where you could say dimensionality is pretty obvious is board gaming. If I was trying to anticipate what my opponent is going to play in a board game, they have a limited number of possible moves, and I'm assuming I'm playing the same game. I'm expecting them to make some moves on the same board. To me, that sounds like the same dimensionality of the thing, because you're in this limited world. I don't have to know what they're thinking. I don't have to know what they had for breakfast. I don't have to know a lot of stuff. I'm just thinking, and I might be wrong, but move them like this.
Great. Thanks for that. So with the same dimensionality, it's like if we're playing the same game, then we're playing a game where we're tracking a little cursor on the screen.
The movement space within that is just the X and the Y value, or just two dimensions. The observation is like the location of two dimensions, and then that is the same dimensionality that inference is occurring on. But I'm thinking just to add a little bit more of my comment. If I wanted to enrich it and say, oh, but I'm starting to think strategically and anticipating their next move or anticipating maybe they're bluffing or maybe I'm bluffing and I want to see if they know that I'm bluffing, we add more dimensionality. And I guess in principle, again, you could still add that dimensionality in somehow in parity with parts of the hidden states. You see what I mean that I'm thinking, oh, they think they know I'm bluffing or I think they're bluffing. You're talking again about the same thing. You don't have to know what they had for breakfast. You're just talking about this extra information which isn't in the board itself, but that gets a bit more complicated. Yes. Thanks. So that is a good answer. Other ones that people have written, thermostat is having the same dimensionality like it's a temperature value. It's a dimensionality of one. And so then the person's generative model is also on this one-dimensional temperature axis. And then they mentioned that it could be like categorical versus continuous. So without going too many rabbit holes deep, we could just say temperature in the room is a continuous variable. It's like taking on a number. It could be a decimal point number between any number, zero to infinity. The person's model might be the same dimensionality like one, but then they might be trying to make a categorical estimate. Is this livable or unlivable? Is it hot, neutral or cold? That would be like a three-state categorical model. So the generative process doesn't have to have the same dimensionality or continuous versus discreteness of the generative model. Second answer for same dimensionality. 
Controls for a car have two degrees, physical degrees of freedom. The acceleration, the speed, whether you can hit the gas or the brake, and the direction of the front wheels controlled by the steering wheel. So there's like a speed dimension and then there's a turning angle dimension. This is the model that people use when controlling a car at some level of abstraction. And then the acceleration dimension can be acted upon through policy selections like stepping on the gas and the brakes. And then it's like, well, those are two categorically different affordances. And then there's continuous variables inside of that categorical difference. And we'll talk about like hybrid control models with both categorical and continuous aspects in play. And in this case, like the speed of the car in the speed axis is the hidden state. The observable would be the speedometer. So again, the gas and the brake are policy selections that modify, let's say we're thinking about a train, no turning. Policy selection is influencing how speed changes through time. And if you have a speedometer that's accurate, even if it's noisy, it's giving rise to observations reflecting those changes in speed. But gas and brake do not change the speedometer directly. They're changing in a hidden cause, which is the speed. Different dimensionality. Rain comes from clouds. Simple mental model is that the darker the clouds, the more there will be. That's a two-dimensional generative model. So how much rain, how dark the clouds. And this also is going to speak to this map territory. Someone could say, well, there's the size of the cloud and there's also the humidity. So that's like maps with increasing amounts of variables. So no one's denying the territory of the actual chaos of the cloud. 
It's just a question of how much detail and what data we're actually treating as observables, and what sophistication is being taken with unobserved variables, with hidden states, which are modeled in the computer or on paper but are not directly observed. Then the mental model can be decomposed even more, and so on ad infinitum. On the reality side, the actual generative process for weather is including all the butterflies. So that's a cognitive model that is two-dimensional: in this case, how much rain is expected and how dark the cloud is. One could make increasingly nuanced cognitive models. But whatever cognitive model is being proposed, the generative process is something that's totally different. Cognitive model, generative model, map. Generative process, niche, territory. Plants do better when they have water and fertilizer. Generative mental model: leaf droopiness is a sign of moisture, but leaf size and color are signs for nutrients. So one could imagine a causal graph where moisture level is unobserved, but it is emitting leaf droopiness as a state. We're observing leaf droopiness and we're using that to infer moisture, whereas nutrient level is an unobserved state to the visual gardener, and then they're using two dimensions, leaf size and color. So this is a three-dimensional cognitive model. Leaf droopiness, leaf size and color are observables, and then there are two unobserved cognitive states, nutrients and moisture. Whatever map gets constructed, that's not the plant. Now, the gardener might then want to take an informative experiment, being an optimal Bayesian gardener, and measure something: take something that was unobserved in the initial model and then measure it using LIDAR or some sort of sensor. Then that would make that data point an outcome related, through some A matrix which we're going to get to later, to the hidden state being like the true levels of copper.
And then if we have an accurate test, we can start using the outcomes of the observed levels of copper and start flushing out our model that way. Joe? I chucked in a comment below. I hope it's showing up. The other part of the question was asking about the map territory debate and here's from the person who kind of came up with that dichotomy and the way we usually talk about it. He says that the map and territory should have a similar structure and I think that's really interesting to think about, you know, if I'm making, again, moves in a board game, I might, you know, make a reduction of their mental state. They may not, maybe I'm playing against the computer, for example, so I don't know that they have a mental state or maybe I'm playing against a person I happen to know as a beginner so I know they're likely to make naive mistakes or whatever, something like that. But I was just thinking like with regard to the leaf droopiness thing, you could have all kinds of really badly structured models of plant health, you know, like maybe relate to how recently you watered it or something. Well, that sounds good in general, but if it's a cactus, maybe that's like a defeater to your system. So the point being, it seemed interesting that your map should, maybe if it doesn't have exactly the same dimensionality, should still be a nice reduction of the thing you're trying to model, whatever that means. Thanks. One aspect on this is evolution, natural selection, dissipation in our world, sweeping off the table, entities that are failing to at least be adequate. So in a dissipative situation that we're in, the map has to be at least good enough, otherwise the entity will fail to exist. So that sort of closes the loop and is like, we're seeing the persistent entities that are acting good enough to navigate. 
And it's just interesting how many there are, from the allegory of the cave to the blind people and the elephant, and realism, instrumentalism. We've had many live streams on this; especially check out number 14 on Mel Andrews' paper, "The Math is Not the Territory." So this one sort of explicitly jumps in right there and tackles this issue with the FEP front and center. So this is a great series of live streams. The dot zeros are background and context, where we just go over the paper in a small group, like one person or two or three people. And then for the dot one and the dot two, we usually have more of an open participatory discussion, and the authors sometimes to usually join. So everybody's welcome to participate in contributing to dot zeros as well as enjoying the dot one and the dot two. But just so you know, when you're looking in the live stream table, the dot zero is a good one to watch first, because it has the background and overview on the paper. But then like yesterday in 45.1, we had Friston and Parr join to discuss. That's just so that people can navigate the live streams a little better. So we can continue asynchronously adding more thoughts on map and territory, and what papers and live streams there are, and what the possibilities are for the action perception loop. Those are big fun areas, but we're going to continue with the questions. On the notion of surprise, is the agent's perception affected not only by its environment, but also by the agent's peers? So perhaps to restate: how is an agent's perception affected by its direct perception and its assessment of peers? So the agent's peers are updating their beliefs in a collaborative fashion. What would be the extent of the update to an agent's generative model of perception if the agent witnesses the annihilation of one of its peers, for example? Something like this.
Okay, will to be some bird box not going to open any video links, perhaps we'll observe some annihilation, like some animal attacks one of the wildebeest on the right, but then the other one learns. So this is a great video. Definitely good to watch. So far in the textbook, and indeed for the whole textbook, it's focusing on, just like we saw in Figure 2.2, one entity in the niche. The niche can consist of others like me, other entities that I expect are similar to myself in terms of the coarse-grained cognitive architecture they have, or their preferences, their affordances. They're able to do similar things. They want similar things. They have a similar history to some extent. So that is sometimes called thinking through other minds, TTOM. And just to give one thought, and anyone can raise their hand: if we were to accurately observe an agent, another peer, taking some action and then failing to exist, say we're driving on the highway and then we observe somebody taking a policy to go off the highway and then we see them fail to exist, we could use that information to update our beliefs about the consequences of the action to drive off the highway. So I believe it would be possible to implement every variation of direct perception, direct perception of peers, thinking through other minds, theory of mind, learning by example, imitation, contrarianism. Every phenomenon could be modeled; it just has to be whatever the specific model is actually about. Ali?
I have a related question about 2.5 and 2.6. Well, correct me if I'm wrong, but I understand that these equations describe a kind of rational behavior, the behavior of a rational agent. Where does irrationality come into this? Or let me put it this way: can irrationality be a parameter in order to qualify these two equations, 2.5 and 2.6? Or seen from the other side, can we use these two equations to measure the extent of rationality, as a measurement of rationality?
Good question.
Here's one quote, and we'll come to equations 2.5 and 2.6 as well, because those are important. Distinguishing the generative model and the generative process is really important. That's the difference between the entity and the niche. It's important to distinguish them to contextualize psychological claims about the optimality of inference. To the extent these claims are valid, on a Bayesian view optimality is always contingent on the organism's resources. So I don't want to step out too far, but rationality and irrationality, "it's rational to believe in X," "it is irrational to believe in X," are always subjective from that speaker's perspective. Given the priors and the update rules, all there is is rationality, which is to say this is just the Bayesian updating process. So there are maladaptive priors, ones that trend towards dissolution of the entity, but there aren't irrational priors. There just is what it is and how it updates. And then that could be inadequate. It could be any number of other things. And these are getting into relatively complex cognitive phenomena, where there could be multiple layered models. But I would say rationality in the Bayes process is that the cognition is Bayes optimal, or modeled with a Bayesian process. And then sometimes the interaction of the generative model and the generative process, as appropriately and specifically defined, is going to result in what it results in. So we will come to 2.5 and 2.6. Okay, here suffice to say, yes, social learning and collective behavior and emergent outcomes at the group level are important, but we haven't addressed them so far in the textbook. And multi-agent models are not really addressed in the textbook that much, because there's so much to understand about the kernel of the perception-cognition-action loop that, even though it's so important for real systems, it's not brought up.
Wanting to go to the equations: the dark room problem, for many years, many papers firing back and forth. But let's go to the equations. Let me check the chat here, yes. Okay, variational free energy is minimized through two different possible approaches: minimizing the divergence of Q and P, changing mind; and maximizing evidence, taking action, changing perception. This is done based upon prior and present information. Equation 2.5, variational free energy. To plan best actions, generative models and internal policies produce simulated outcomes that can be used to estimate expected free energy as the average of the log probability of outcomes. Equation 2.6, expected free energy, and there are other variants. In the context of planning, planning as inference, equation 2.6 provides a view of EFE that establishes consistency in measurement units of exploration and exploitation. "The relative balance between these terms determines whether behavior is predominantly explorative or exploitative" (page 34). In other words, dissolving the classic explore-exploit dilemma in behavioral psychology. What causes this balance to change, i.e. what controls this balance? Very important and interesting question. Perhaps we can come to equation 2.5 first and then we can come to equation 2.6. So, where we're going is the future. The future has several sources of in-principle and in-practice uncertainty. Observations in the future have not happened. Actions in the future have not happened. And the causes of future outcomes, the hidden states, actions, as well as the generative process's exogenous changes, haven't occurred. So there are several fundamental and in-practice limitations on the kind of precision that you can get on the future. That's planning as inference; it's prospective. And that's expected free energy, because it's about the free energy of an expected future. Variational free energy is F, and expected free energy is G. Variational free energy, F, is in equation 2.5.
Variational free energy, as the question says, is based upon present and prior information. So variational free energy is like nowcasting, verging into something kind of like memory as inference. We're integrating the past, in the sense of our priors, and the present incoming data point Y. And it's like a snapshot evaluation of optimal perception in the purely sensory case. And if we include action within variational free energy, it's taking one-step optimal actions. But that's not planning. That's just snapshot decision-making. In contrast, 2.6 will explicitly consider policies. It is a function of policy, π, and so it is including planning as inference. But first, let's look at 2.5. Jacob, or anyone else, would you like to just describe one pass? What is equation 2.5 showing?
I guess if you mean describing, say, the first line or the lines in a bit more detail, essentially the energy term is just the expectation of the log of the joint probability of the data and the observation. I don't understand quite why they changed the notation from appendix B, but Y would essentially be the S, I think, and X would be the O?
Yes. Anyone could be taking more notes on this, but Y is the observable data and X is the hidden state. In the POMDP that we're going to be getting to later, X out there, the hidden state, is S, and O are observations. So this is using more of a regression-familiar framework and variable notation, but this could also be written ln P(o, s). That's the joint distribution. Joint distributions are written with a comma, conditional distributions with a vertical pipe.
Yeah, and so the entropy term is only based on the beliefs about our hidden states. So as people were saying before that the hidden states are unobservable, the only thing that we can say about the hidden states is the beliefs Q about the hidden states.
I don't think that the hidden states actually can ever really appear on their own, because we always need to have some kind of belief about them. And this was also related to a discussion we were having yesterday. It's interesting that they switch between the notation of entropy with H as the functional and just the pure expectation with the E, because the actual definition of entropy is the expectation of the negative log of whatever the variable is, which in this case is Q of X. And that's actually how we get to the other two expressions, with complexity and accuracy, and divergence and evidence. And there's also a link, I think, though I might be wrong, with theoretical physics, where free energy is essentially the kind of cognitive statistical equivalent of the Lagrangian, which is defined as kinetic energy minus potential energy. So yesterday, what that means in terms of the energy and entropy in this case, whether the energy is perhaps kind of the immediate cognitive kinetic energy and then the entropy is like the potential energy or vice versa, might be going off on a tangent at this point.
Awesome. Let me just add in a few other notes. F is a function. It's the variational free energy function. The arguments that it is taking in are Q, the variational distribution that we control. That's what makes it a variational free energy. Q is the one that we control. P is the actual probability distribution out there. Y is the data point. So on the left side of the equation, variational free energy is a function of Q and Y. On the right side we have three equal signs, and these are three ways to phrase this one function. At the bottom here, Jacob has shown how this energy minus entropy can result in the second line, complexity minus accuracy, and in the third line, divergence minus evidence.
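For readers following along without the shared notes, the step from energy minus entropy to the other two lines can be sketched as follows. This uses the symbols from the discussion (Q(x) for the approximate posterior over hidden states x, y for the data, P for the generative model) and the entropy definition mentioned above; the intermediate steps are standard identities written out as an aid, not a quotation from the textbook or from the derivation in the notes.

```latex
\begin{align}
F[Q, y]
  &= \underbrace{-\mathbb{E}_{Q(x)}\big[\ln P(y, x)\big]}_{\text{energy}}
     \;-\; \underbrace{H\big[Q(x)\big]}_{\text{entropy}},
     \qquad H\big[Q(x)\big] = -\mathbb{E}_{Q(x)}\big[\ln Q(x)\big] \\
  &= \mathbb{E}_{Q(x)}\big[\ln Q(x) - \ln P(y, x)\big] \\
  % factorize the joint as P(y, x) = P(y \mid x)\, P(x)
  &= \underbrace{D_{KL}\big[Q(x)\,\|\,P(x)\big]}_{\text{complexity}}
     \;-\; \underbrace{\mathbb{E}_{Q(x)}\big[\ln P(y \mid x)\big]}_{\text{accuracy}} \\
  % factorize the joint as P(y, x) = P(x \mid y)\, P(y)
  &= \underbrace{D_{KL}\big[Q(x)\,\|\,P(x \mid y)\big]}_{\text{divergence}}
     \;-\; \underbrace{\ln P(y)}_{\text{evidence}}
\end{align}
```

The second line is the same quantity as the first with the entropy written as an expectation; the last two lines differ only in which way the joint distribution is factorized, and ln P(y) comes out of the expectation because it does not depend on x.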
So that was one really interesting thing: it's not like this is how it is, and this is a transformation, and then this is a transformation. In a sense they are all also true. It is also probably helpful to think about this energy minus entropy formalism for what it is, which is the closest to a physics framing of free energy, and then think about these two more statistical ontology ways of framing it: complexity minus accuracy, which is very commonly brought up in the context of model fitting, and divergence minus evidence also. And yeah, great. So Blue, that sounds like a very helpful norm: in the description, we'll try to have a description of the equation. The first row says this, this, this, this, this. People can type it however they want, and then of course we can replace terms that are in the ontology with the special @ sign, and that will facilitate finding equations that mention those terms, and it will facilitate translating these equations and also the descriptions into different languages and human languages. So then we can work on it in this ontological space and then make sure that it's accessible to different languages, for example. Then we thought about this example of a ball and its location in a bowl. So there's a true location of where it is in the bowl, X, and then there's an observation of where we're observing the ball to be. And this also speaks to the quantum to classical handoff. If the bowl is a swimming pool and the ball is a bowling ball, after enough time it's going to be at the bottom and not moving. If the bowl is like a molecular well and the ball is like an electron or one atom, then repeated measurement is going to be entirely different, and as the ball in the bowl becomes more massive, it approaches a classical limit. Little hints at the continuity between classical, statistical, quantum, and thermal physics. But we thought about this ball being in the bowl.
So we looked at two different cases. One is when the ball is being observed at the bottom of the bowl. Given the generative model of the shape of the bowl and gravity, which is like a potential energy function, observing the ball at the center of the bowl is strictly the most likely thing that could have happened, and then observing it up on a side is a less likely observation. Eric, and then Blue.

Yeah. So you mentioned that F is a function, and as they put it, it's a function of a function, which is a functional. The way I think about this is that the function that it's a function of is the Q. So the Q is a distribution over your beliefs, and I think of that as being a function that you're trying to optimize. How do I adjust my internal belief state, the Q? That's what I'm optimizing in this equation, and the functional says that what I'm trying to optimize is that function, that distribution. And then it breaks down into the two parts. One is your generative model P, which says there's a joint probability. X is the internal state variable that you have a distribution over, that's Q, and so P says how your observables and your internal states go together. And then you have the entropy term, which says you want to have maximum uncertainty, so that if you're not constrained by your observations, you want to be as agnostic as possible about the possible values of your internal states, or make that Q as flat as possible under the constraints of the observations going through your generative model P. So I guess that's the way I talk through this first line.

Awesome. Thank you, Eric.
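To make the ball-in-a-bowl reading concrete, here is a minimal numerical sketch with two hidden states and two observations. All of the numbers (the prior, the likelihood table, the flat belief) and the function name `free_energy_forms` are hypothetical choices for illustration, not values from the book or the session. It computes the variational free energy in the three ways discussed and lets you check that they agree.

```python
import math

# Hypothetical two-state version of the ball-in-a-bowl example.
# Hidden state x: 0 = ball at the center, 1 = ball up on the side.
# Observation y: 0 = ball seen at the center, 1 = ball seen on the side.
prior = [0.9, 0.1]            # P(x): the bowl and gravity favor the center
likelihood = [[0.8, 0.2],     # P(y | x = 0)
              [0.3, 0.7]]     # P(y | x = 1)


def free_energy_forms(q, y):
    """Return the three readings of F[Q, y], plus P(y) and P(x | y).

    q is a belief over the two hidden states with strictly positive entries.
    """
    states = range(len(q))
    joint = [prior[x] * likelihood[x][y] for x in states]    # P(y, x)
    evidence = sum(joint)                                     # P(y)
    posterior = [j / evidence for j in joint]                 # P(x | y)

    energy = -sum(q[x] * math.log(joint[x]) for x in states)
    entropy = -sum(q[x] * math.log(q[x]) for x in states)
    complexity = sum(q[x] * math.log(q[x] / prior[x]) for x in states)
    accuracy = sum(q[x] * math.log(likelihood[x][y]) for x in states)
    divergence = sum(q[x] * math.log(q[x] / posterior[x]) for x in states)

    forms = (energy - entropy,                  # energy minus entropy
             complexity - accuracy,             # complexity minus accuracy
             divergence - math.log(evidence))   # divergence minus evidence
    return forms, evidence, posterior


# A flat, agnostic belief, after observing the ball at the center (y = 0).
flat_q = [0.5, 0.5]
forms, evidence, posterior = free_energy_forms(flat_q, y=0)
surprise = -math.log(evidence)

# The same observation, but with the belief set to the exact posterior.
best_forms, _, _ = free_energy_forms(posterior, y=0)

print("three readings of F:", forms)
print("negative log evidence:", surprise)
print("F at the exact posterior:", best_forms[0])
```

In this toy setup the three readings coincide for any belief with positive entries, and setting the belief to the exact posterior brings the free energy down to the negative log evidence, since the divergence term vanishes there.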
Blue, and then Brock.

So Eric, that was super helpful. Since yesterday I've been trying to figure out what exactly is meant by Q here. I get that later on we're going to talk about Q as representing beliefs, but as it's laid out in chapter 2, they don't actually define Q at all. So I'm having a hard time up here. Right before they get into that box, in section 2.6, they describe Q as an approximate posterior. Approximate posterior of what? And should we denote Q as the function of beliefs here, even though it feels to me like they're skipping ahead and looping back at the same time?

Q is the variational distribution, where the form of the posterior has been chosen. So exact Bayes, using matrix multiplication basically, or sampling approaches to approximate Bayesian computation, like Markov chain Monte Carlo: those try to recapitulate the actual form of P. So if P is truly bimodal, then exact Bayes or sampling-based approaches would try to reconstruct that distribution. In contrast, the variational approach gains tractability because you're choosing an approximate posterior Q of a form that you know is going to have a tractable optimization scheme, and there are message passing algorithms that can do it computationally. So if you choose Q to be a Gaussian, then depending on how you parameterize it, it's either going to be seeking the mode, going to the hump that's bigger and centering around that with a variance, or it'll just spread out a lot and try to cover as much probability density as it can by covering both humps.

Okay. So the variational free energy function F, as a function of the function representing the approximate posterior Q and the data Y, is equal to the negative expectation of Q given the observations X, times the natural log of the probability of the joint distribution of the data and observations. Is that the English translation of that equation? I think I got it
finally, after like two days.

Close, close. One thing I would say slightly differently: it's the expectation over Q, not the expectation of Q. We actually talked about that a little bit; here we had "negative expectation over Q". And yes, Blue?

Okay. Sorry, the ontology terms don't match the terms that are given in the book, and I put in a request in the questions here for whether we can append to the ontology, because they use "data" the same as "observations". Those are different words, but they're similar. Yeah, there's a lot of synonyms or synonymous use. So I would hate to step on them, but also the terms aren't in the ontology at all, so maybe we should update the ontology to reflect the terms.

We will absolutely add terms into the supplemental, or eventually into the core terms if that's required. Just leave a comment or email active inference at gmail to add terms to the ontology if there's a term that you want to tag that's not being tagged.

I feel like you may have indirectly answered this just now, but in the math discussion yesterday we were talking specifically about the Q distribution of beliefs. If X is this uncertainty term, this hidden state, what value could it possibly have? Where is it coming from? So Q is this probability distribution of beliefs. Is X the belief value, and is that being generated in the previous step? In the bowl and marble thing, my question was, or I was just pointing out, I guess, that the hidden state is changing however it wants, perhaps below our threshold of observation. That's not reflected in the equation, for obvious reasons. But then maybe it crosses a threshold and then we update, or whatever. So where is that X coming from, if it's a hidden state? Do you see what I'm saying? There's kind of an ontological problem there that I'm trying to solve. If it's a hidden state, how can it have a value or be a variable, if it's hidden?

Yes, Eric?

I'll take a stab at that.
So, as I showed in the pictures, there's two hidden states. There's a hidden state of the real world, and then there's a hidden state of your generative model. And the generative model is whatever the model is, however the model was built. So once you have the parameters X, then that's what you're working with until you change your model. That was my...

Yes, and over evolutionary or developmental time you get swept off the table if there is an inadequacy, at least. But in a simulation environment you could make the X and the X star be anything. The hidden state is as modeled by the generative model of the cognitive entity. So in the cognitive model of the entity in the bowl example, they're modeling an unobservable, which is the location of the ball, and they're modeling an observable, which is the observation of location. Both of those are in the generative model. And then the generative process would be like there's somebody with a magnet who's moving the ball around or something, or it could be anything.

We're going to have five more minutes before we close this session. So what does anyone see in 2.4? What is free energy minimization doing, and what does it mean that variational free energy is an upper bound on the negative log evidence? Just one quick qualitative read is that this is what we would actually like to be minimizing. This is the floor we truly want to reach: to be unsurprised about observations. However, that is intractable, because we don't have access to the form or the parameterization of the true generative process. What is a tractable approximation? Approximate Bayesian computation, namely variational Bayesian inference. So here's F of Q and Y, and then this is a KL divergence. One vertical pipe means "conditioned on", like X conditioned on Y, X given Y: here, the probability of a hidden state being the case given a data point. Or, flipping it, the other would be the probability of a given data point given a hidden state. Two vertical lines is divergence notation, and
that's the divergence between this and that: between Q of X and P of X given Y. So reducing this divergence, which has very nice properties in terms of its implementation, is bringing us to lower our upper bound on what we really want to be minimizing, in a tractable way. So that's one aspect of free energy minimization. And then here is a reframing of the two ways that variational free energy can be minimized. One is through perception, the updating of beliefs, and we'll talk more about the continuity between perception and learning. When one observes the ball moving across the visual field, is that perception, or is it parameter learning, updating the location of the ball? That depends on the time scale and such. Or action can be taken, so that the expected observations are changed to maximize evidence.

And then in the last minutes, Ali asked: can a hidden state be an unquantifiable quality, like a qualitative quality, or like an unquantifiable quantity?

Well, of course most qualities can be parameterized, but perhaps we can think of a hidden state as a pure quality that is fundamentally unquantifiable and that we cannot parameterize. I mean, can we think of such a hidden state?

Brock?

Probably the quantifiable way to state that question is: are there intractable hidden states? The answer to that is obviously yes. Can we get some estimate on it? Yes. But then is that commensurate with the, you know... and then does that mean it's a quality and not a quantity? I don't know.

And also, to the earlier discussion about the dimensionality of the generative model or the generative process, the unquantifiable part could be like the baby's happiness. But then we're still able to have an inference on the happiness as a parameter X, the underlying state that's giving rise to the different sounds, without necessarily knowing the distributional form of happiness. But we have to have a distributional form for the variational approximation
of happiness. Joe?

I was just going to say, I think that this qualitative versus quantitative distinction is interesting. If you're thinking about a light source, it's a source of electrons. Well, you could count the number of electrons. But as for whether or not there is a light source there, a source of electrons, in this framework, I'm proposing you're not counting that; you're just saying there is a light source. Or take some other dimension, the thing we've been calling dimensions. Yes, you can count the number of dimensions, but given that they're all different, they're not really relatable to each other.

Thanks. Yes, very interesting. So in our final seconds, we'll just point towards 2.6, which is expected free energy, G. It's prospective, which is going to enable inference on action: X, hidden states of the world, conditioned on policy selection. And that is what is going to allow comparison of different policies, which are sequences of actions over a given time horizon. So it's minimizing the divergence between two things, minus some things, and reframing it as a few things. We didn't go as much into 2.6 as we did into 2.5, because 2.5 is an important precursor and simpler cousin of 2.6, but it's super important. And especially to those who are here and those who are not here: over multiple cohorts, when everybody will hopefully be able to join and rejoin as participant or as facilitator, we would really love to be able to annotate all the equations with what they mean. Because what is there to be said about expected free energy without some understanding of what this is actually discussing: what variables go in, what it is doing, how some of them are being related? So we want to learn this, and we can do it collaboratively if people just pick and choose the affordances they see and annotate some things. They can highlight it and say, "I'm not sure about this, but I just wanted to add it," and so on. We're all there to help each other learn and improve our epistemic niche.

2.6 is
expected free energy. There's further discussion on expected free energy, showing how it's composed of these five sections, and that by leaving some out you get different special cases that are quite important. For example, leaving out this, you get this; leaving out just one, you get two, three, four, five, and so on. That's figure 2.6. And then there's a summary of the low road to active inference. That's chapter 2.

So this concludes chapter 2 and our discussion of it in cohort 1. In the coming two weeks we're going to be discussing chapter 3, the high road to active inference. It's going to take another tack, and we will go through it in the coming two weeks, addressing people's questions. But hopefully we can continue answering and addressing questions wherever we see fit. No one's going to do it all, but we do need everybody to do some, hopefully.

We're going to now close the recording. We'll then take a one minute break, and then in this room we're going to continue with dot tools. If you want to continue discussing the textbook or anything else, then head up to a room up and to the left. But in this room, in one minute, we're going to continue with dot tools. So thanks, everybody. See you soon.