Hello. It is September 23, 2022, and we are in meeting four of cohort two of the Active Inference textbook group. We're having our first discussion of chapter two of the textbook. So before we jump into the book and assess where things are at with questions, does anyone wanna provide any overview thoughts or experiences that they had while reading or rereading chapter two? Just overall, anything about chapter two? There's a lot of math. Okay, Blue, thank you. Anyone else? We have this questions table, and no one has added any questions specifically for chapter two; if someone did and put them somewhere else, just let us know. These are really important. We hope and expect and prefer that people do have questions while they're reading the text. This is a really essential learning practice and it helps improve our epistemic niche. It also contributes to future cohorts and future learners when we can have every single kind of question, from the directly related ("what does this word mean here?") to the tangential. Whatever kinds of questions people have, it's a huge service to just add several of them to this table. So we don't have any for today, and that will free us up to give an overview of chapter two, walk through it, and kind of connect a few dots. It'd be awesome in our second discussion of chapter two for people to have noted down a few things that they were curious about, including during this discussion; people can add directly, or write things down somewhere else and add them later. This is a really important way to structure our learning individually and collectively. So we're gonna go into chapter two and walk through it, paying attention to the formalisms and also to a lot of the work that has been added through the math learning group and through various people's efforts to improve the legibility of the math, because it'll be either a first coat of paint or a refresher for truly everyone. 
So everyone's gonna be approaching this and trying to learn the math in different ways. Let's start with figure 1.2. This is the high road coming from the top and the low road coming from the bottom. Chapter two is going to be leading with the low road; chapter three is going to be presenting from the high road. So chapter two is going to begin with an overview of Bayes' theorem and Bayesian statistics and then make its way up through some of these terms described here and reach that kind of common meeting point with active inference. Before we flip to chapter two, does anyone have any general comments on the low road? The chapter begins with this quote from William James: "My thinking is first and last and always for the sake of my doing." What does anyone think that means, or how does it reflect on them? Why is it here? Anyone can just unmute and address it, or raise their hand. I suppose it's making the connection between perception and action there, with the thinking and the doing, which is important in chapter two. As the name active inference would have it, there are going to be many connections between cognition and inference and action, like the embodiment of maneuvers in the real embodied world, as well as ways of thinking about mental or cognitive behaviors as actions, like attention as action. So that will be a thread that continues as we continue to build on active inference. The introduction paragraph is first providing a historical view: even qualitative, pre-1900 views encompassed perception as an unconscious inference, as something that was being generated subliminally and updated and then kind of presented to awareness. 
And then the authors place the Bayesian brain hypothesis as a kind of formal realization of some of those models of perception and action, and explain how active inference is going to be extending those ideas by doing several things: not just considering perception as unconscious inference, but also considering various other cognitive features like action, planning, and learning as Bayesian statistical issues in how we model them, and then by finding approximations and heuristics that help us address problems that might otherwise be intractable. Perception as inference. Here the Bayesian brain perspective is brought up; does anyone wanna add anything that they've heard, or what the Bayesian brain kind of brings up for them? One key feature of the Bayesian brain that they're going to point to is how it contrasts with a kind of outside-in, signal-processing, purely recognition-based model. We can think of that model as being descriptive statistics, and also as compatible with frequentist statistics: we're getting kind of a blurry picture out there, and then we're gonna do descriptive statistics, frequentism, and get some sort of sharpened, realized image. The camera filters a little blurry and then we're just gonna post-process the image, and that results in the final image. So it's outside-in. In contrast, and this is not novel to active inference, this is being described as part of the Bayesian brain hypothesis, perception is not a passive outside-in process. It is a constructive inside-out process in which sensations are used to confirm or disconfirm hypotheses about how they were generated. The way that plays out in Bayesian inference is with kind of a tale of two densities. Density is another name for a statistical distribution, and "a tale of two densities" was a paper title by Ramstead et al. 
It was an early livestream that we did, and the two densities are like the forward and the reverse model, which are used in Bayesian statistics as well as in expectation maximization and other algorithms. Here's what those two densities are. There's the density of observations, that's the observables, the sensory states, and then there's another density of latent states or hidden states that generate those observables. So if we were looking at a red ball, then there'd be the visual sense states of the location of the ball, and then there'd be a hidden state of the location of the ball. Or a thermometer giving us readings of the world, and then the temperature in the room, which is an unobservable. And so we can take thermometer data and use that to fit a temperature hidden state, and also, from a given hidden state, we can use a generative model, or what is sometimes referred to as a forward model, to generate predictions about observations. Contrast that with descriptive statistics, which takes in observables and kind of distills them and then says, well, the mean was 11 and the variance was four. If that's the end of the road for that modeler, that's the end of the road. But the Bayesian approach says, okay, we've got a mean and a variance, and now we're gonna be able to generate data with mean 11 and variance four. So that's the tale of two densities: the way that a generative model and a recognition model are linked. To operationalize that a little bit more: the estimate of the hidden variables a priori, meaning before observation, is called the prior. The prior is then going to be updated by observations. And depending on a few factors, which we're gonna go into, you can imagine one extreme case where, whatever observation comes in, you just instantly update your prior to that. So then it becomes a posterior, like after the fact, but the posterior of that first moment becomes the prior for the next moment. 
So it's kind of this unfolding process. And again, one extreme case would be: whatever comes in from the world, you just instantly update your prior to that. That's like a zero-memory approach, instantly updating to the incoming data point. Like, what's the average height in this classroom? Well, this child is four feet tall; okay, the average is four. Now this one is five feet; okay, now the average is five. The other extreme would be very much denoising the observations by having a really stable, maybe even stubborn, hidden state estimate. This relates also to attention, because when we pay attention to observables, we update our priors more. When we're not paying attention, observables coming in don't update our hidden state estimates. Bayes' rule tells us how to combine different elements, the prior and the likelihood, such that a prior probability becomes transformed into a posterior probability. So you have some a priori estimate of how likely it is to be raining at that moment. Then you're gonna detect wetness on your skin, or not. And then there's gonna be a posterior probability update. Bayesian inference is a broad topic, so there are many resources in place to learn it, but it's also a very important thing to know in active inference. Box 2.1 covers some of the underlying mechanics of Bayes' theorem, which is very simple, almost tautological, yet very powerful in its application. There's an example introduced here that's going to come up several more times in the book. Does anyone wanna explain this example with the frogs and the apples, like what is happening here? So the situation is that there's a person holding something in their hands, and there's only two options for what it could be: either a frog or an apple. This is their prior belief: there's a 10% chance that it's a frog, 0.1, because we scale probability distributions to a total of one, since something must happen; a one for a statistical distribution is like 100%. 
So there's a 10% chance of it being a frog and a 90% chance of what they're holding being an apple, a priori. That's their prior. Then they have some sort of likelihood model. The likelihood model describes, for apples and for frogs, how different actions are emitted by them. So apples jump 1% of the time. That's the probability of action equals jumping; the vertical line means conditioned upon. So the probability of jumping conditioned upon the hidden state being an apple is 1%. The probability of no jump conditioned on an apple being the hidden state is 99%. So apples rarely jump. In contrast, the probability of jumping, of observing a jump, when the hidden state is a frog is 81%, and 19% for not jumping. So a priori, there's a 10% belief that it's a frog, 90% that it's an apple. That's representing the uncertainty around what the object is. Then a jump is observed. Just intuitively, even without any equations: if we know that frogs are more likely to jump than apples, observing a jump should increase our belief in favor of it being a jumping kind of object. And that's exactly what happens, and that's what the Bayesian statistics describe. It's a situation where you have an a priori belief about hidden states that are not directly observable, and you have a likelihood model of how observables (which is y here) map onto different hidden states. And then through the observations you update and realize a posterior belief about hidden states. Does anyone have any thoughts or questions about this? We're gonna continue on the low road. This is a method called exact Bayes. It's kind of a plug-and-chug method where you can take what is described in Box 2.1 and apply it to the values that are presented here, and you get the exact calculation of different variables depending on what you're trying to find. You're gonna be able to calculate basically every variable; every unknown you can kind of isolate for and solve. This is called exact Bayes. 
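The frog-and-apple update can be plugged and chugged in a few lines. A minimal sketch, assuming the values quoted above (0.1/0.9 prior, 81%/1% jump likelihoods); the code itself is our illustration, not from the book:

```python
# Exact Bayes for the frog/apple example.
# Hidden states: frog or apple. Observation: a jump.
prior = {"frog": 0.1, "apple": 0.9}              # a priori beliefs
jump_likelihood = {"frog": 0.81, "apple": 0.01}  # P(jump | state)

# Marginal probability of observing a jump (the model evidence).
evidence = sum(jump_likelihood[s] * prior[s] for s in prior)

# Bayes' rule: posterior = likelihood * prior / evidence.
posterior = {s: jump_likelihood[s] * prior[s] / evidence for s in prior}

print(posterior["frog"])   # ~0.9: seeing a jump flips the belief toward frog
print(posterior["apple"])  # ~0.1
```

With these particular numbers, a single observed jump moves the frog belief from 10% all the way to 90%, which is exactly the intuition described above.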
Exact Bayes has a clear intuition for small state spaces. When there's two options for the hidden state and two options for behavior, it's quite easy to calculate. For large state spaces, or unknown state spaces, it can become intractable. That's gonna motivate some of the heuristics that are brought in later. Table 2.1: again, people can just raise a hand if you have any thoughts or questions or wanna dwell on a part. Table 2.1 is going to be... Brock, yes. I just wanted to say, if anyone is feeling like... so Blue said there's a lot of math in this chapter, and she's one of the co-organizers of the Active Inference Institute, and this is my second time through. So if you're feeling like this is a lot of math, you're not alone, and that's okay. I just wanted to say that I think the way this is presented here is really helpful for people that have a grasp on some of the other, even simpler parts. Like the way that you made this distinction between the generative model being a thing that's producing predictions: that's a really important thing for understanding the Bayesian brain versus this sort of bottom-up predictive machine thing. Part of what is happening there, though, that I guess we're gonna get into a little bit more when he's talking about hidden state spaces, is that what it all depends on, when he's going through this apple and frog example, is that there's a conditional probability here: the reason that you believe that the frog might jump is because it's a frog. And that may seem like, well, obviously, or, so what. But if it's something a little less contrived, like, what is Brock gonna say next? 
Then it helps maybe to have more priors on what I generally say. So I guess what I'm saying is, I'm trying to draw some attention to this conditional probability. It's kind of a simple concept, but it's kind of what makes that whole thing work at all: that there are probabilities that are strongly related, and it may be part of a hidden state or whatever, but if there weren't any conditional probabilities, none of this would work. That's kind of the magic, simple secret sauce that's getting traction on it. Hopefully that's helpful and not more confusing. Thanks for the comments, the personal comments, Brock, and also yes, the conditional probability, this vertical line, does so much. What's the probability that the Giants win against the Yankees? Well, in my view it's 50%. Now, conditional upon it raining? Maybe that doesn't matter in your generative model. Or conditional upon this: what if I told you that it was seven-nothing in the third inning? There are all these situations where information comes in, and some information doesn't update our prior. If you give a piece of irrelevant information, it doesn't update our prior; conditioned upon the irrelevant information, we're not updating our hidden state estimate. Whereas there's other information that comes in that is relevant, and that's gonna have to do with the sparsity of how the variables are connected, and then those conditional estimates help us reduce our uncertainty. What's the overall probability that somebody has a medical degree, in the world? It's just the number of medical degrees in the world divided by the total population of the world. Conditioned upon being in this situation, what is it? That can be a different number. Table 2.1 gives a little bit of statistical detail on several different forms of distributions. 
Now again, even for people who have taken several courses or even years of statistics, these are not trivial concepts. The pieces, though, that we'll just highlight are what support means and what surprise means. Support describes where a distribution is even valid or definable. There are some distributions, like the Gaussian, that have support over all real numbers; that's what the fancy R means. The Gaussian, in the kind of archetypal case, is a bell curve centered at zero, and it goes off to infinity in both directions with vanishingly small values, but it has support everywhere. That's the Gaussian. In contrast, there are other distribution families that, for example, can only exist on the range between zero and infinity. That would be like waiting time between events: it can't be a negative number. So the Gaussian is an inappropriate distribution family to even describe situations that can't be negative. Surprise is a formula that, for any given observation, helps you determine how surprising that data point is. So we have the children in the classroom, and the average height is four feet plus or minus one foot, and we observe a new measurement, and it's exactly four feet. That is the least surprising observation possible. And then as it gets further away from four, as the number of standard deviations increases, it's more surprising. These formulas describe, for each distribution family, how to calculate surprise. So that's just surprise: how surprising a new incoming observation is, given the hidden states as they're estimated. Okay, so that's one notion of surprise. A new data point comes in; how surprising is it? There's a second notion of surprise that's called Bayesian surprise. Bayesian surprise is how much your prior moves following an observation. So you could imagine, again, we have a four-foot estimate for the average in the classroom, and data points are coming in, and we're surprised or not. 
That's regular surprise. Bayesian surprise is the difference between the prior and the posterior. So we could get seven-foot measurements again and again and again, but because we're super stubborn in our prior, we're still keeping that four-foot estimate. Those data points are surprising, surprising, surprising, but the Bayesian surprise is zero, because the prior is not being updated into a new or different posterior. That's a little bit on the difference between vanilla surprise, naked surprise, and Bayesian surprise. This is specifically within the Bayesian statistics updating framework. How do we quantify surprise? Well, we have these formulas. You just plug in the observation and some of the parameters of the distribution. The parameters are the values that make the distribution the way it is. So for a Gaussian, there's two parameters, mean and variance: where's the center of the bell curve, the average, and how wide is the bell curve? Those are our parameters. Surprise can be calculated from the parameters of the distribution and the observation with these equations. How do we calculate Bayesian surprise? This is going to introduce an operation that may be familiar to some or not, but it's all good. It's called the KL divergence, the Kullback-Leibler divergence. And this is what it looks like. Again, these are topics where one can go very, very deep and learn a lot, so just treat it like many coats of paint. You're gonna know it's a KL divergence because it's written D_KL. Just like f(x) is a function of x, D_KL with square brackets means we're talking about a KL divergence of what's in the square brackets. So at this point, I'm gonna go over the equations. I'm gonna find equation 2.3, so we can see it here. I'm also gonna open up the description. So this is really helpful. 
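Before moving on to the KL divergence, the plain-surprise formulas just mentioned can be made concrete for the Gaussian case. This is our own sketch using the classroom numbers from above (heights of four feet plus or minus one foot), not code from the book:

```python
import math

def gaussian_surprise(x, mu, sigma):
    """Surprise: the negative natural log of the Gaussian density
    of observation x, given mean mu and standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

# Classroom heights believed to be 4 ft plus or minus 1 ft.
print(gaussian_surprise(4.0, mu=4.0, sigma=1.0))  # least surprising observation
print(gaussian_surprise(7.0, mu=4.0, sigma=1.0))  # three sigmas out: much more surprising
```

Surprise is minimized exactly at the mean and grows quadratically with the number of standard deviations away from it, which matches the intuition above.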
This is a super important and valuable contribution, for people who can, to create natural language descriptions of equations, because it helps get us all on the same page; otherwise this is inscrutable, even for people who might know the area, because the notation can be slightly different with different letters, like what is Q and so on. So the KL divergence is going to be hinging upon these two vertical lines. One vertical line is used in a lot of different statistical situations to mean conditional upon: X | Y, X conditioned on Y. The two vertical lines mark what the KL divergence is being calculated between. So this is describing the KL divergence between the Q and the P distributions. Here's Q of X; this is a function. Here's P of X; that's a function. Specifically for Bayesian surprise, we're talking about the KL divergence between the prior and the posterior: the KL divergence between the prior probability distribution Q of X and the posterior probability distribution P of X. And then the second part here, what does that mean? An equals sign with a triangle on top means "by definition"; that's what the triangle means. The fancy E means expectation over. So it's the average value, under Q of X, of the natural log of Q minus the natural log of P. This part's a little bit less important, a little bit more detail. But this is where we're gonna see it mostly written out, with two vertical lines, and the KL divergence is measuring the difference between the two distributions. If they're the same distribution, the difference is zero. If they're very different, the value is gonna be high. It's kind of like how much earth you have to move to reshape one distribution to look like the other; it's not exactly that, but that's kind of what a KL divergence is. So here we can think about the Bayesian surprise of that frog jumping. Brock, yes. 
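Equation 2.3 can be written out directly for discrete distributions. A small sketch of our own, reusing the frog example, where observing a jump moved a 10% frog belief to 90%:

```python
import math

def kl_divergence(q, p):
    """D_KL[Q || P] = E_Q[ln Q(x) - ln P(x)], in nats, for discrete
    distributions given as dicts over the same states."""
    return sum(q[x] * (math.log(q[x]) - math.log(p[x])) for x in q)

prior = {"frog": 0.1, "apple": 0.9}
posterior = {"frog": 0.9, "apple": 0.1}

# Bayesian surprise: how far the belief moved after the observation.
# (KL is asymmetric in general; in this two-state mirrored case both
# directions happen to give the same number.)
print(kl_divergence(posterior, prior))  # ~1.76 nats
# Identical distributions diverge by zero.
print(kl_divergence(prior, prior))      # 0.0
```

A stubborn agent whose posterior equals its prior registers exactly zero here, no matter how surprising the individual observations were, which is the surprise-versus-Bayesian-surprise distinction from above.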
I was just gonna say, those details, this natural log of Q of X minus natural log of P of X: another way to say that is that that's information. Those are nats, like a form of bits, kind of. So that's the expected difference of information: is there zero information, or is there some change, in other words, between the P of X and the Q of X? So that divergence he's about to go into, that's what we're talking about: how much space is there, informationally, between those two things. Great. So a nat is a unit of measurement for information. A bit is a binary bit, zero or one; digital computers use bits, so people are pretty familiar with a bit. A nat is a unit that's not based around zero and one; it's based around the natural log, base e, instead of a base-two binary system. So it's monotonically related to bits, which is to say they're both gonna go up together and they're both gonna go down together, but they're not numerically identical. It's like inches and feet. Here, when we were calculating just regular surprise, how surprising the observation is, that could be calculated in nats. Was it a surprising observation or not? Did it surprise me at all? That can be calculated as surprise. Here, nats are also used to calculate the informational divergence between the prior and the posterior, when we're thinking about Bayesian surprise. Box 2.2 introduces the fancy E notation. Fancy E stands for expectation. Now, sometimes conversationally, expectation is talking about the future, like what do you expect the weather to be tomorrow? In the statistical sense, expectation is the average expected value. It doesn't necessarily refer to the future. It could be an expectation about a future time point, but on its own it is only describing basically the average of a distribution. So the expectation of a Gaussian is the center of the bell curve, not what one believes it's going to be in the future. 
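On the nats-versus-bits point above: the two units measure the same information and differ only by a constant factor, the natural log of 2, much like inches and feet. A quick sketch of our own:

```python
import math

p = 0.5  # probability of, say, a fair coin landing heads

bits = -math.log2(p)  # surprise in bits: exactly 1.0
nats = -math.log(p)   # the same surprise in nats: ln(2), about 0.693

# One bit is ln(2) nats, the way one foot is 12 inches.
print(bits, nats, nats / bits)
```

So any quantity reported in nats, like the KL divergences in this chapter, can be converted to bits by dividing by ln(2).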
But if someone said, what is the expectation, in your model, for what the average is going to be in 10 time steps, that can be addressed. So it's compatible with the way that people talk about what they expect to happen tomorrow; just be aware that expectation alone does not necessarily refer to a future time point. It's just referring to a statistical average. And one note, again, maybe familiar to some, maybe the first time hearing it for others, but it's gonna come into play in this book: there are discrete state spaces and continuous state spaces. A discrete state space is one where values can only take certain distinct values, like zero, one, two, three, four, or it could be 0.1, 0.2, 0.3, 0.4, but there are discrete values on a grid that can be achieved, like the two options for what the thing in the hand was. Whether it's two options or two million, it's a finite set of definable values and there are no in-betweens. In contrast, a continuous state space would be like a continuous variable describing the temperature. Now, one could also describe temperature with a discrete state space: in my model, I'm describing temperature as either 20, 21, 22, or 23; only integer values are allowed in my model. There are advantages and there are challenges for both discrete and continuous state spaces, and this book is going to be juxtaposing them. The later chapters are even gonna be kind of alternating between them, because there's active inference in discrete situations, discrete state spaces and discrete time, and then there's also active inference in continuous state spaces and continuous time. Okay, so that was very low down on the low road. None of the above was about active inference, or even many of the ideas that often bring people to the table. That was kind of an overview of some Bayesian statistical ideas. Bronwyn? There I am. This goes back a little bit though. 
I just wanted to maybe ask these questions before we move on. One was in relation to conditional probability, or the role of conditions; I think Brock sort of talked to it a little bit. With the conditional thing: in Melbourne tomorrow there's the grand final. So you've got two teams, and you said there's a 50-50 chance, but it depends on what conditions. Those conditions, are they mostly subjective? I mean, they can be subjective and they can be factual as well. So I suppose the question is, what states of the conditions are there? Can they be both? Are they mostly subjective, or can you have the fact that frogs jump, as that's what frogs do, that's what generally happens? That's not really my subjective opinion of frogs, but does my subjective opinion put some condition on it? Does that make sense? Yeah, great question. These models, in their construction, like what their overall structure is, and then in their parameterization, like what the values are, are subjective; they're modeler-dependent. And so we might have different perspectives on the overall likelihood of one team to win, and then somebody might say it's gonna rain tomorrow, or it is raining out. And in your model, that's gonna update your prior, but maybe it doesn't update my prior. So even when it's frustratingly obvious, it is still also subjective and multi-perspectival. Okay, great. That's what I thought. And then that leads on to Bayesian surprise, the differences between surprise and Bayesian surprise, and how Bayesian surprise actually operates in the world. So in relationship to that: I might say Geelong's gonna win tomorrow, and even when they've lost, I still think they've won, and you can't convince me of anything other than that. Is that where my prior doesn't change? Yeah. So, Bayesian surprise itself: can you give us some examples of how that might operate in the world, in the real world? 
Yeah, it's an awesome question. And to connect: plain surprise by itself is how surprising a single observation is, independent of how it changes your belief. You have a stereotype about some sort of situation; something surprising happens, but then the Bayesian surprise is zero, because it's a recalcitrant belief that the person has. Yeah. In contrast, there might be a surprising event that induces a Bayesian surprise, because someone's like, whoa, I thought about it this way, and then I was surprised by this observation, and that brought me over here. And then neither of those is necessarily the felt experience of surprise, but that's an area of a lot of research and discussion: how do these concepts in statistics, like beliefs, surprises, preferences, map onto our psychology and phenomenology? Is a preference in active inference like "I prefer A over B"? Is expectation related to what people use every day? And that's kind of the fun part with the active inference ontology: some of these terms are not used in day-to-day conversation, and so it's like a blank canvas. But in other situations, these are terms that are used day-to-day: ambiguity, attention, belief. And so, are we using them in a technical sense that's compatible or isomorphic with the way they're used broadly? Is it incompatible? And so on. That's interesting. I just had a different perception of Bayesian surprise then. When you said it, surprise is always surprise, but Bayesian surprise maps onto, I think you said, something like the psychology or the preferences or whatever. So it's not really surprise; it's how we manage surprise. Could it be like that? It's what sort of precision we have on things, so that we recognize something or we don't recognize it. Is it something like that? Excellent. And the connection to precision is also technically correct, which is that when observations are coming in and we're treating them precisely, we do have Bayesian surprise, because we are updating our priors. 
When we have imprecision about observations, we do not update our distributions of beliefs. It's like, oh, that's just a bad thermometer. I believe it's 37 in the room; the thermometer is just weird, it says 65, but don't worry about it. So that's a surprising observation under the belief that it's 37 plus or minus one, but it doesn't induce a Bayesian surprise, because we haven't updated our prior. Prior, yeah. Okay. That's interesting. Yep. Awesome. So just to kind of... I thought it's... Yeah, continue please. It's also interesting in that I watched that podcast with John Vervaeke the other day, and what was interesting out of there was just the language, the use of language, and then you were just talking about the ontology and the different uses, and are we actually considering the words that we're hearing as they actually are? I think that's a really interesting area with all of this. What I've noticed in what I listen to and in my studies is that you have to keep going over and over and over until someone says something like you just said then. Okay. Incredible. Incredible. Thank you. That's the relevance realization, in a way, with language. Yeah. Just to kind of... Oh. Sorry. It is interesting. So we pay attention to it, we update our priors, we learn, and we deal with the surprising observations, and we deal with the Bayesian surprise, to adjust our regime of attention and action along all these ways, and we connect with the Active Inference Ontology rather than spiraling off into uncertainties that aren't relevant. But that's not to banish uncertainty from the realm; it's to have the right coarse-grainings that help us deal with certain features of action and perception. Here are the subjective questions coming up. Bayesian inference is optimal, and this is also the root of a million discussions. How can it be optimal to crash a car? 
How could it be optimal to be wrong? I mean, we were talking about it: how could I be wrong, if betting this way is always optimal? How could I be wrong? Inference is subjective; the results are not necessarily accurate in any objective sense. Bayesian cognition being an optimal Bayesian calculation, again, doesn't mean society accepts it as such; think of neurodiversity. It means that given that person's generative model and the observations coming in, or their body as an embodied generative model and the observations and constraints coming in, that is its own local optimality. And then there are secondary discussions around: is it adequate? Is it best amongst a family of counterfactuals? Does society accept that? Those are discussions to have, but "irrational" will not be found here. And that's a very interesting angle that kind of reframes a lot of, again, rationality versus irrational thinking. Yeah, and this is a second reason for optimality being thought of as subjective. Okay, here we're gonna get to figure 2.2, which is a schematic; very different remixes of this schematic will be found in many places. Here is our model of temperature in the world. Here's the real temperature in the world, the hidden state. Here's our thermometer, observations coming in. We're updating, and then we're taking actions which may in some way influence the state of the temperature in the room. So this is like the cybernetic loop, the control-theoretic loop. This is describing a generalized (again, not unique to active inference) way in which cognitive entities take in observations, juxtapose them with a generative model, and then do action selection to influence the hidden states such that the observations are brought into alignment with their preferences. Here's a short discussion that the state spaces don't have to be the same between X and X star. 
So temperature in the real world might be a continuous variable, whereas we might just have a binary state: is it too hot or not too hot? Or you might have three states: too cold, just right, too hot. Or five states: getting too hot, and so on. And then you might wanna do model selection across two, three, four, five states internally. And none of that hinges on, quote, how it really is. So there's a lot of degrees of freedom in generating cognitive models, and their structure doesn't need to map onto the neuroanatomy of the brain. It doesn't need to map onto a physical mechanism of the cognitive system, which you may or may not even know, and it doesn't have to be isomorphic, the same structure, as the hidden states of the world. Okay. All of this so far has been purely inferential. Brock? I just wanted to, I don't know, leave a bookmark here. Stigmergically. Did you mention something about neurodiversity there in the context of the Bayes optimality rabbit hole? And just wanted to remind that the origins of this come from psychiatry, and the impetus for it was trying to have a causal reasoning tool for addressing certain kinds of mental illnesses. So it really, like you said, doesn't have to map to a specific architecture of the brain or arrangement of things in the world or whatever; it is very much based on the subjective model that's already in there. And seeing the whole thing, you could perhaps reason about how it might make sense from their perspective, with many different kinds of priors or whatever, and why that would be optimal, or seem optimal, for that model. Again, just, yeah, so. Great comments. And to foreshadow chapter nine, this is the meta-Bayesian part. So it's the modeler's subjective model: the psychiatrist's subjective model of their patient's subjective model.
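The x-versus-x-star point can be made concrete with a coarse-graining: the world's temperature is continuous, but the internal model only carries three states. This is a hypothetical sketch; the threshold values are invented for illustration:

```python
def coarse_grain(temp_celsius, thresholds=(18.0, 24.0)):
    """Map a continuous hidden state (real temperature) onto a
    three-state internal model. Thresholds are illustrative."""
    low, high = thresholds
    if temp_celsius < low:
        return "too cold"
    elif temp_celsius <= high:
        return "just right"
    return "too hot"
```

Model selection across two, three, or five internal states would then amount to comparing such mappings; none of them is "how it really is" in the world.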
So that is the genesis of so many discussions around the map and the territory, and the fallacies, and the fallacies upon fallacies, and cause and effect, and all these different discussions. And this is how this modeling framework wraps that up: it's a subjective model of a subjective model. And that double subjectivity opens up an unbelievable number of degrees of freedom in modeling, which then refocuses the imperative towards selecting from these models. But that's a little bit of a later point. Just to, in the last minutes, because again, next week hopefully many people will add simple, tangential, advanced, formal, any kind of questions, let's just in the last three or four minutes see what else is gonna happen. All of this up to this point has been inferential: Bayesian brain, hashtag Livestream 43. They're gonna extend that inference about temperature, so to speak, to include action. How that happens is at the heart of active inference. In the next sections, they're gonna clarify the quantity that active inference agents minimize through perception and action as variational free energy. One note: minimization and maximization are like twins of each other, because with just a negative sign they can be the same. So when you see minimization or maximization, you can just think of finding relatively higher or lower values. Biologists talk about climbing Mount Improbable, like the Dawkins book, and a physicist might talk about finding the bottom of an energy well. But those are going to be seen as not even just sisters or twins of each other; they're actually kind of the same thing through a negative sign. There are two ways to reduce free energy, as we will all come to learn and love. You can update your model; that's learning and perception. And you can act in the world; that's action. Action and inference: active inference. First, you can learn.
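Those two routes can be caricatured in a few lines of code. This is a toy sketch, not the book's formalism: a scalar belief and a scalar world temperature, with squared discrepancy standing in for free energy, perception pulling the belief toward the observation, and action pulling the world toward the belief. The rate and gain values are arbitrary:

```python
def discrepancy(belief, world):
    """Squared prediction error: a toy stand-in for free energy."""
    return 0.5 * (world - belief) ** 2

def perceive(belief, observation, rate=0.5):
    """Route 1: change your mind; move the belief toward the observation."""
    return belief + rate * (observation - belief)

def act(world, belief, gain=0.5):
    """Route 2: change the world; e.g. a heater driven by the discrepancy."""
    return world + gain * (belief - world)

belief, world = 21.0, 15.0   # expect 21 degrees; the room is at 15
for _ in range(20):
    belief = perceive(belief, world)  # the observation here is just the world state
    world = act(world, belief)
# Both routes together shrink the discrepancy toward zero.
```

In a real agent only some mixture of the two routes is available in any given situation, which is the affordances point made in the discussion.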
This is a variant of figure 2.2, where now we're able to see how the prediction and the observation are being put together. There's some discrepancy; in any realistic situation, there will eventually be a discrepancy. And then here are those two roads to reduce free energy. Change beliefs, which can be perception. Perception and learning are much more similar in active inference than in other frameworks. Like when the ball moves across your visual field, are you perceiving it? Or are you learning its position as it moves over rapid time scales? So they're kind of the same. And you can act. Here's how you can resolve that discrepancy: you can change your mind about what you're seeing, that's perception, or you can act. That doesn't mean both of them are equally good, or that they're gonna result in your survival, or that they're plausible in any given situation. So it's always situational what affordances there are for learning and perception and for action. But this is the general framework. How does free energy get quantified and minimized? We'll return to this later, but equations 2.5 and 2.6 are gonna be really key equations. Thankfully, through the work of many people here, we have really excellent natural language descriptions connected to the ontology. We also have a lot of notes and questions people have added, and things like derivations that Yaakov and others have worked on, for those who wanna go that way. So these lines of symbols can be read this way, and with a mouse-over we can see what those are. So yes, it's like words, words, words, but we have a staged development around these equations, because we're gonna be revisiting them and understanding how this F, the free energy that's being minimized (I mean, free energy principle, this is kind of what it's about), can be decomposed into these different sets of natural language terms, and in what sense those terms are being used.
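For reference, equations 2.5 and 2.6 are commonly written with the following decompositions; the notation here may differ slightly from the book's (x hidden states, y observations, Q the approximate posterior, pi a policy):

```latex
% Eq. 2.5: variational free energy (past and present)
F[Q, y]
  = \underbrace{D_{\mathrm{KL}}\!\left[Q(x)\,\Vert\,P(x)\right]}_{\text{complexity}}
  - \underbrace{\mathbb{E}_{Q(x)}\!\left[\ln P(y \mid x)\right]}_{\text{accuracy}}
  = \underbrace{D_{\mathrm{KL}}\!\left[Q(x)\,\Vert\,P(x \mid y)\right]}_{\text{divergence}}
  - \underbrace{\ln P(y)}_{\text{log evidence}}

% Eq. 2.6: expected free energy (future)
G(\pi)
  = \underbrace{D_{\mathrm{KL}}\!\left[Q(y \mid \pi)\,\Vert\,P(y)\right]}_{\text{risk}}
  + \underbrace{\mathbb{E}_{Q(x \mid \pi)}\!\left[\mathrm{H}\!\left[P(y \mid x)\right]\right]}_{\text{ambiguity}}
```

Read this way, minimizing G trades off keeping predicted observations close to preferred ones (risk) against avoiding states whose observations are uninformative (ambiguity).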
Again, we don't have the minutes to describe it fully, but variational free energy F is about the past and the present. That's equation 2.5. And this is what it looks like graphically. Here's variational free energy; it's the same figure as figure 2.2 from before, but now there's an equation here. When we wanna look into the future, there are gonna be several complexities arising: we're talking about the surprise of observations we haven't observed yet, and we're talking about actions whose consequences we aren't sure of. And so variational free energy F, equation 2.5, morphs into expected free energy G, equation 2.6. Similarly, there's excellent work in progress, as everything is, to unpack what this is in natural language. There are some special cases for expected free energy such that when certain terms are ignored or set to zero, we get a lot of special cases that are in quite broad use. And then that's the end of the low road. Equation 2.5: variational free energy, past and present. Equation 2.6: expected free energy G, future. That's the end of the low road chapter. So that has been this first discussion on chapter two. Hope that people have enjoyed it, and it would be extremely appreciated to write any questions that you have on chapter two, as people were adding during this conversation, so that we can have some specifics to follow up on. So good luck with your learning in the next week, and see you all soon.