of these questions. I think we talked about the... oh, this is chapter two. Let's go to chapter four. Okay. So we talked about belief, policy, state. I'm not sure if we talked about this question: in the discussion of active inference in POMDPs, belief updating about policies, we find that the posterior that minimizes the free energy... oh, we did discuss this last time. But this question, about Pi being a policy or a model, I don't think we got to. All right. So we can open it up and start with the most upvoted. The question reads: what is Pi? On page 69, the authors write that at each time step, the current state is conditionally dependent on the state at the previous time, and on the policy Pi currently being pursued. Then on page 71, they write: thus, we can interpret the priors of equation 4.6, combined with the likelihood of equation 4.5, as expressing a model Pi of a behavioral sequence. So which is it, policy or model? And then they suggest a rewrite of the sentence: thus, we can interpret the priors of equation 4.6, combined with the likelihood of equation 4.5 and the transition probabilities B sub tau, Pi, as expressing a model of a behavioral sequence, where the model is a function of the policy Pi. And then there's some discourse here. So in this reframing, can we say that the model simulates the agent in the environment as if it had taken the actions in the policy? Oh, Eric, that was your question. Does anyone want to take a stab at answering it? Or we can continue reading what was written here. What's written says that on page 69, the authors also say policies here may be thought of as indexing alternative trajectories or sequences of actions that could be followed.
On page 71, they define the likelihood in equation 4.5 as a matrix A that expresses the probability of an outcome. They describe the priors of equation 4.6 as the prior over the initial state, vector D, and beliefs about how the state at one time transitions to the state at the next time, matrix B. They also say that the transitions are conditionally dependent on the policy chosen. Because of this conditional dependence, we can see how the policy influences the model, and why the authors may use the terms interchangeably. Does anyone have any comments there? I'd say that answer in the discourse mostly agrees with what I proposed, which is that strictly speaking we should treat Pi as a policy, but once the policy is executed, the model becomes a function of the policy we've seen so far. So that's a compatible interpretation. Strictly speaking, it would be better if the text were consistent, called Pi a policy, and then said the model is a function of Pi. That'd be my interpretation. Yeah, I definitely agree with that. I think the model is policy-specific. But they could do a better job; the variables, as we've seen, are so ambiguous anyway, so it could definitely be a lot better. And by the way, on the topic of consistency: I gave some thought to the issue we were discussing in yesterday's math learning session, and I also consulted some other papers on active inference. In all of them, the notation for matrices and vectors is used consistently, as in almost every linear algebra textbook. So I'm seriously beginning to suspect that every instance of a matrix or a vector not written in boldface is a typo, whether in the chapters or the appendices, because otherwise I really cannot find any justification for using two different types of notation.
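A tiny numerical sketch of the dependence just discussed, with made-up numbers (the two-state example and both B matrices are my own assumptions, not from the book): the transition matrix B is conditioned on the policy, so the model of the state sequence really is a function of Pi.

```python
import numpy as np

# Made-up example: two hidden states, two policies, each with its own B matrix.
# B[pi][i, j] = P(s_t = i | s_{t-1} = j, policy pi); each column sums to 1.
B = {
    0: np.array([[0.9, 0.1],
                 [0.1, 0.9]]),  # policy 0: mostly stay in the current state
    1: np.array([[0.1, 0.9],
                 [0.9, 0.1]]),  # policy 1: mostly switch states
}
D = np.array([1.0, 0.0])        # prior over the initial state (vector D)

def predict_states(pi, T):
    """Roll the policy-conditioned transitions forward for T time steps."""
    s = D
    for _ in range(T):
        s = B[pi] @ s
    return s

# The same prior D yields different beliefs about future states under each policy.
print(predict_states(pi=0, T=3))
print(predict_states(pi=1, T=3))
```

The point of the sketch is only that the "model" here is the whole rollout, and it changes when Pi changes.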
So I'm going to use that assumption as my prior for the rest of the book, unless someone has a belief-updating observation they want to share. Yes, so just to summarize for the people who weren't there yesterday: we had a big debate about what the bold typeface signifies. It wasn't so much a debate as a mutual attempt to come to an understanding. It's this question here: does bold lettering mean this is a matrix (but seemingly many non-bold letters are matrices too), or does bold lettering mean sufficient statistics, as in the examples on these pages? We couldn't come to a single unified answer, because there are many instances where something seems like it should be a matrix but isn't bold, and many instances where sufficient statistics are used but also aren't bold. So we couldn't really conclude anything, but I think that's a good assumption. We also talked about this a little last time, but we didn't really get fully into this question: what is the use of categorical distributions in equation 4.5? The second line is supposed to explain the Cat notation, but I have trouble understanding the advantage of the notation. Could someone explain it in simple words? Does anyone want to try? I can pull up the equation. Well, I think every regular matrix and vector is by definition an array of ordered elements, but in a categorical distribution we don't necessarily assign any specific order to the elements. So the reason for using the categorical distribution might be to jettison any ordering of the elements and treat them as just an unordered set of probabilities. I wrote an answer to this one, if you want to pull that up. I hope this is helpful. The way I see it is, you have the math notation there.
But when you need to actually implement that, you need to carry it out and turn it into operations you can perform. When you say that A is this categorical object, that means you're turning it into an actual matrix with elements. And the ordering is important, actually, because the ordering of the elements in the matrix determines how that matrix applies to the objects it acts on, which are the belief state S and your observation O. So basically, the way you carry out inferring what O will be, the probability of O given a belief state, is a matrix multiply. And because in this example the states are not continuous, they're categorical, you need a matrix of categorical elements to map from the categories of belief states to the categories of observation states. Those are both distributions, and one gets transformed by the matrix. Thanks, Eric. That was super helpful. Does anybody have any further question or comment on this one? Sorry, Eric, but isn't the categorical distribution a special case of the multinomial distribution? I don't know that. Yeah, because that was my understanding. Of course you're right that in any matrix we necessarily have ordered elements in order to be able to compute with those matrices. But on the other hand, for categorical distributions, my understanding was that we don't necessarily treat the elements as ordered, the way we do with ordinary vectors and matrices. Because if that were the case, we could just designate the distribution as a regular matrix, right? So I don't get the reason for using the notation here, Cat(A) or Cat(C), instead of just using the name of the matrix.
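A tiny numerical sketch of Eric's point, with made-up numbers (the 2x2 likelihood A and the belief s below are my own assumptions, not from the book): once Cat(A) is implemented as an ordered array, predicting the outcome distribution from a belief state is just a matrix multiply.

```python
import numpy as np

# Columns of A are categorical distributions P(o | s = j): each column sums to 1.
A = np.array([[0.9, 0.2],   # P(o = 0 | s = 0), P(o = 0 | s = 1)
              [0.1, 0.8]])  # P(o = 1 | s = 0), P(o = 1 | s = 1)

# A belief state: a categorical distribution over the two hidden states.
s = np.array([0.5, 0.5])

# Predicted outcome distribution: a plain matrix multiply, o = A s.
o = A @ s
print(o)  # each entry is P(o = i) under the current belief
```

The ordering of rows and columns is exactly what makes `A @ s` line up the right probabilities; the Cat notation just declares what kind of distribution those arrays encode.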
Steven, can you mute? You're super loud over there. Sorry. Yeah, I guess so. So the mathematical notation doesn't have an ordering; it just says one distribution is related to another distribution through this relation. It's when you get to the mechanics of how you do that that you care about having the matrix and vector ordering. So I don't know what Cat(A) actually means otherwise, other than that it's describing what A is. I don't think it's an operator; I think it's just saying what it is. That makes sense. Steven, if you're trying to talk... I feel like Steven is trying to share an idea with us, but is unable to communicate effectively through the technological affordances he's been given at the moment. The gain on your mic is way up or something, so it's picking up more background noise than you. So just looking at this equation... let's try that again. Can you guys hear me okay? Yeah, okay. So just looking at this equation: the second part is like an ordered matrix A_ij, where the probability of the observation corresponds to index i and the state to index j, something like that. It looks like it is ordered. Yeah, the matrix itself is ordered. And the second line is actually how the authors are defining this Cat(A): Cat(A) represents this ordered matrix A_ij. That's at least what it says in the textbook. So it says the likelihood, expressing the probability of observations at time tau given states s at time tau, is equal to the categorical distribution A. If I may... Sorry, sorry. Go ahead. Sorry, my internet is lagging a bit, so there might be a delay and I might break off. But I just want to comment that I think the Cat notation is really just to reinforce the notion that we're in discrete time.
But I think it's kind of redundant in this case, because since we know we're in discrete time, we could have just written bold A to signify that it's a matrix, and where it's a matrix, it's probably not going to be a continuous distribution anyway. So I think Cat(bold A) and bold A are pretty much interchangeable. I think some of the newer math is using affordances from category theory, and I think going forward, Karl and Thomas and some of the people who really helped arrive at and advance this math are going to be leveraging category-theoretic methods, like the renormalization group and the pullback attractor, more in the future. So that might also be part of the reason they're moving toward this categorical notation. Just a little foreshadowing there, maybe. And may I make one additional note about the categorical distribution? The thing that confused me about this concept is Wikipedia's definition of it, because it says: there is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution. So that was my confusion. I don't think it would be the first time we've seen traditional mathematical notation slightly abused in active inference, and it probably won't be the last. But I definitely think it's a point we can put forward to the authors for clarification. If nobody has anything else to say about categorical distributions, we can move on to the next question. The next question reads: in message passing, is there a decay of information as the distance between the variable X and individual Markov blanket constituents increases? Is implementation of information decay in message passing an option for model implementation?
I think this is a super interesting question, and something I've thought about a lot before. But let me open it up to anyone and see if they want to address it before we get into some of the discourse. Well, doesn't that all depend on the chain of loss between one stage of the message passing and another? If you have no loss, then you don't have any decay. But the more stages you have where you get a little bit of diffusion or loss, the more decay you're going to have; it's just a function of the parameters. Well, are messages passed within the same Markov blanket, or between things that are partitioned with their own Markov blankets? That's another question. I also agree with Eric. As I mentioned yesterday, I believe this information loss is not accounted for, at least not inherently, in the active inference formalism, because it depends on the specific situations in which the information propagates. Otherwise, we couldn't have any general, all-encompassing formalism for accounting for this information loss. So we see in this box they talk about Markov blankets and message passing. Here it's variational message passing. This involves messages from all constituents of the Markov blanket of X, including the parents, via the conditional probability of X given its parents, and the children. The latter depends on the conditional probability of the children of X given all of their parents, which include X. Note the expectation includes the children and the parents of the children. As parents of the children of X, we divide by Q of X to ensure the expectation includes the blanket only. So I think all of the things inside one blanket send a message to something else in another blanket. At least that's how I interpret this in active inference. And so it would make sense that, from within the blanket, does everyone have the same...
Is there a high degree of mutual information shared within the Markov blanket? I think conditional independence is what establishes the Markov blanket. So within the constituents under the blanket, do they all have all of the same information all of the time? That's something that's kind of confusing to me. One thing I'd point out about figure 4.3: in the previous chapters, we have this picture of there being the world and your model of the world, where your model of the world is a Markov blanket itself, and it interfaces with the world through the observation variables and the action variables. This picture doesn't have that breakdown. It's got state there, but it doesn't say what that state is. Is it all internal state? Is it external state, or a mixture of them? It's apparently a mixture, because you've got policy that can affect state, which means action is going to be part of the state transition, and then you have the observations coming out at the bottom. But if we're going to talk about Markov blankets, I would like to see this picture exploded into the internal state of the machine versus the state of the external world, and how those interact with one another, because that's what the Markov blanket does: it compartmentalizes the information. Well, that is the POMDP model in figure 4.3. And figure 4.4 really makes me scratch my head quite a bit. This is the image that is supposed to represent message passing in a Bayesian framework. The caption says: on the right, dependencies between different variables in the belief updating scheme outlined in the main text.
Intuitively, current beliefs about states under each policy at each time are compared with those that would be predicted given beliefs about states at other times (this is one) and current outcomes, to calculate prediction errors. These errors then drive updating in beliefs (this is two). Given beliefs about states under each policy, we can then calculate the gradients of the expected free energy (this is three). These are combined with the outcomes predicted under each policy, omitted from the figure, to compute beliefs about policies (this is four). Then, using Bayesian model averaging, we can compute posterior beliefs about states averaged over policies (this is five). So this kind of leads us into the next question, but I am not able to interpret this figure 4.4 in any better way. If anybody has a plain-English description of how this works, I would love to hear it. We can also talk a little about the question that was raised about this figure, and about this squiggle sigma, because we had a hard time figuring out the equation and what the squiggle sigma actually stands for. We could move to that discussion, because I think it's related to this message passing discussion. But let's read the discourse on the message passing question first. So, from the discourse: what is the mechanism for signal decay? This is information decay, implementation of information decay. Is the message a thing behaving as an active inference agent itself, or a special piece of information not behaving as an active inference system itself? Is there a blanket impedance mismatch? I think Brock contributed this; I'm not sure if he's here today. And then: message passing as described in active inference is usually through a hierarchy, not a lateral transfer within constituents in the blanket. So the key thing here is how you define the Markov blanket.
Either there's a partition between particles, cells, or whatever, or there's no partition in the blanket and everything under the blanket is conditionally dependent. Does conditional dependence preclude message passing? Are messages passed within nodes in the same blanket? I have not seen this and would love it if someone pointed me to some references. That was me; I wrote the last part. I haven't seen message passing under the same Markov blanket; it's always been from one Markov-blanket-partitioned object to a different Markov-blanket-partitioned object. But that moves us on; if anybody has any comments, feel free. Well, I would say message passing is used to do the inference, so it's within a Markov blanket. That's pretty common, I think, in the way message passing is used. So here, as shown in figure 4.4, although this is totally beyond me to describe, this is temporal message passing: from one time point to another time point. And on the left, it depicts a hierarchical expansion, collapsing over time steps, where a higher-level network might predict the states and policies at the lower level, and use these to draw inferences about the context in which they occur. So the way I've always seen message passing used is from one time point to another, or from one layer of a hierarchy to the next. I've not seen it passed within constituents under the same Markov blanket. So Eric, if you have references that depict some kind of message passing in active inference that's not through time steps or hierarchical levels, I would love to see them, because I have not. Well, I'll just go back to box 4.1, where they talk about the variational message passing and how you get information.
I would now say that within a blanket, you've got various nodes, and they have these hierarchical parent-child relations. In order to do inference about, hey, what's our belief about one of these parents, you've got to go up and down and ask: what do the children of that node say, and what do the other parents think? So that's all within a single Markov blanket; the parents and children, the hierarchical relation, and the message passing circulate that information within the blanket. That's how I read it. And I also put a link here to a paper by Champion et al., in which they have explicitly reformulated the active inference formalism in terms of variational message passing. If you just give me a second, I can find exactly the place where they state this in plain English. You can continue with the other questions if you like, but I need a moment to check this paper. Sure, sure. I'll check it out also, and thank you for that. So, when I'm reading this: variational message passing involves messages from all constituents of the Markov blanket of X. So the message is coming from all the constituents in the blanket, but it doesn't say where the message is going. So it does make sense; "within" would be a better word than "from", as in within all constituents of the Markov blanket, because I was reading it as the message coming from everything under that blanket, which I guess is an incorrect interpretation. So thank you, Eric, for pointing that out. Now, this next one is a really hard question that I tried to answer. Actually, a bunch of us looked at it a lot yesterday.
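A heavily simplified sketch of the local update box 4.1 is describing, with made-up numbers (the toy tables, variable names, and the binary node are all my own assumptions, and this collapses variational message passing down to a single exact Bayes-net update): the belief about a node X combines the message from its parent, p(x | parent), with the message from its child, p(child | x).

```python
import numpy as np

# Toy binary node X with one parent message and one observed child (my numbers).
msg_from_parent = np.array([0.7, 0.3])    # p(x | parents), already marginalized
p_child_given_x = np.array([[0.9, 0.4],   # p(child = 0 | x = 0), p(child = 0 | x = 1)
                            [0.1, 0.6]])  # p(child = 1 | x = 0), p(child = 1 | x = 1)
observed_child = 1                        # the child node's observed value

# Message from the child: the likelihood of the observed child under each x.
msg_from_child = p_child_given_x[observed_child, :]

# Combine the blanket's messages and normalize to get the belief q(x).
q_x = msg_from_parent * msg_from_child
q_x /= q_x.sum()
print(q_x)
```

This is the sense in which the messages circulate within one blanket: both factors live in X's own Markov blanket, and neither alone determines q(x).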
So it says: in equation 4.10, the sigma variable, squiggle sigma, we'll call it, I'm not sure how to say it, is used to describe the difference between the natural log of observations conditioned on policy and the preferences. What is the function of this variable? It also comes up in figure 4.4, the message passing figure we were just looking at. And it would be great to have a verbal description of this figure, and of what other papers or equations use this squiggle sigma. We looked at length through the entire textbook: the text says "we'll unpack this later in chapter seven", so we looked through chapter seven, and we looked through the appendices. We saw a variable that almost looks like it, but I think it's a gamma; it looks a bit fancier than this squiggle sigma. We could not find it at all. So we went to unpack equation 4.10. Here is the equation itself; this equation 4.10 is a rewriting of equation 4.7 in linear algebraic form. We tried to unpack it a little. The first line states the prior probability for each policy: pi sub zero is equal to the softmax function sigma applied to the negative expected free energy G. The next line: the expected free energy conditioned on policy, G sub pi, is equal to the entropy (the negative expected log probability), H, dotted with the states conditioned on policy and time, s sub pi, tau, plus the observations conditioned on policy and time, o sub pi, tau, dotted with the beliefs conditioned on policy and time, the squiggle sigma, maybe, sub pi, tau. So we were kind of unsure if "beliefs" is correct here; we pulled that out of the legend for figure 4.4.
And just to unpack that a little more: the "beliefs" conditioned on policy and time, the squiggle sigma sub pi, tau, equals the difference between the log observations conditioned on policy and time, o sub pi, tau, and the log preferences conditioned on time, C sub tau. We were unsure if "belief" is correct here also. So if anybody knows whether "beliefs" is the right name for the difference between log observations and log preferences, it would be great to have some feedback or input, because we couldn't really reach a conclusion. Well, one tiny step toward figuring this out: I would say that funny squiggle there has to do with the C. We've got this probability of observations given C, and C is this mysterious object that expresses preferences, which has also been under-explained. So look at the two terms of the expected free energy G as a function of policy in 4.7, and then look at how it appears in 4.10. That sigma thing is expressed with the dot, which I guess is a multiply, against the observations. So that's like saying your log probability of the sequence of observations given your preference prior C, in matrix-multiply form. Yeah, so they do unpack it a little more in equation 4.7, or maybe we can extrapolate from 4.7, if this is "beliefs", because it is this natural log of observations minus the natural log of preferences. But up at the top, the expected free energy is the dot product of the entropy with the states, plus the dot product of the observations with, maybe, "beliefs" here.
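A numerical sketch of how we were reading equation 4.10, with made-up numbers (the two-outcome vectors and the rival policy's G value below are my own assumptions, not the book's): the prior over policies is a softmax of negative expected free energies, and each G combines an entropy (ambiguity) term with the squiggle-sigma (ln o minus ln C) term.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

# Made-up two-state, two-outcome example for a single policy pi.
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])        # likelihood; columns are P(o | s)
s_pi = np.array([0.5, 0.5])       # predicted states under the policy
o_pi = np.array([0.6, 0.4])       # predicted outcomes under the policy
C = np.array([0.9, 0.1])          # preference prior over outcomes

# H: per-state entropy of the likelihood, H_j = -sum_i A_ij ln A_ij (ambiguity).
H = -np.sum(A * np.log(A), axis=0)

# The "squiggle sigma" term as the question reads it: ln o - ln C (risk).
sigma_pi = np.log(o_pi) - np.log(C)

# Expected free energy for this policy: G_pi = H . s_pi + o_pi . sigma_pi.
G_pi = H @ s_pi + o_pi @ sigma_pi

# Prior over policies: softmax of negative G, here against a rival policy with G = 1.0.
pi_0 = softmax(-np.array([G_pi, 1.0]))
print(G_pi, pi_0)
```

Read this way, the squiggle sigma is the risk-like part (how far predicted outcomes sit from preferred ones), which is one reason "risk" may fit better than "beliefs".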
So I can't recall ever seeing a mathematical representation of belief using this squiggle sigma before, and I looked through the recent Ryan Smith paper and even at the message passing paper by Friston, and I was not able to find any additional references to this squiggle sigma at all. So why do you say belief as opposed to preferences? Well, it says here that the squiggle sigma is equal to the difference between the natural log of the observations conditioned on policy and time and the natural log of the preferences at a certain time. So the sigma is defined by the natural log of the preferences, but it's more than just the preferences. Does that make sense, Eric? Yeah, but that's not the same as belief, right? That's observations versus preferences. Right, so it's the deviation from what was expected, or how much risk is being taken relative to the policy that's applied. Because the dot product is between the observation and that squiggle sigma, right? Which is in turn like: okay, this is what I've observed, this is what I expected. So it's like a divergence between the two. How much more risk am I taking, and how far is this pushing me away from minimizing free energy? So maybe risk is how the squiggle sigma is defined, whereas belief is wrapped up in the s, because that's your model of the world. But if you look at figure 4.4, that's why we used "belief". So here, let me pull it up right now. I'm not sure how easy this is to see, but before we get to number one here, on the right-hand side, the second-to-lowest level is states conditioned on policy and time. I think I'm going to read it off this bigger screen. But yeah: states conditioned on policy and time, at different time steps.
So the current time is in the middle, the forward time step is on the right, and the backward time step is on the left. That's what we start with. "Intuitively, current beliefs about states under each policy at each time are compared with those that would be predicted given beliefs about states at other times." So maybe "beliefs" is this s, which is kind of what Eric was saying. And we know, because it's defined earlier, that this epsilon here is a prediction error. So what we arrive at after step one, "and current outcomes, to calculate prediction errors", is the prediction error. "The errors then drive updating in beliefs": that's number two. So we go from the current state at the present time to a prediction error that drives the belief updating at the forward time step, after number two. And then it says: "given beliefs about states under each policy, we can then calculate the gradients of expected free energy." Three. So what gives us this gradient of expected free energy? Is it this s sub pi... so s sub tau plus one, or is that a tau? I think that's a tau. So is it a future state not conditioned on policy? Or here we get to the squiggle sigma: a squiggle sigma conditioned on policy and time plus one. So it looks like a belief update there, which is why "belief" made sense. Or is that a risk update? Is that a possible interpretation of this message passing figure? Well, they say it's a gradient, which means it's saying how much we have to change our policy, I think, in order to meet our objectives. Yeah, gradient makes sense to me as well. It's how much information you gained, right? So if you look at the change in the state with respect to... say I apply some force. Let's just take a simple example: you have an actuator that just applies force, and all it does is move forward and follow a trajectory.
So the error is, at each time step, if it's a deterministic system, you expect it to follow a straight line, and it's deviating away. That's epsilon. So you have to update: okay, this is how far off I am from the state I'm supposed to be in, so I need to apply slightly less force, or maybe force in a different direction. And it's also how much I have to overcorrect in the future, depending on what I've done currently, right? If I've applied some force right now, I might deviate in the opposite direction and have to come back, so I have to figure out how much force to apply. That's why I said risk. Or yeah, gradient: you would take the change of the state with respect to the error. So that would be one interpretation of it: a gradient of the state change with respect to what you have done. I also heard the term information gain in there. Yeah, I'm just generally saying that in neural networks, for example, we have some sort of loss function, and we calculate the gradient of the loss function with respect to the weights, right? So I have a weight vector, and this particular example gives me a certain gradient, and I need to minimize the loss, so I change the weights in turn. The gradient interfaces between the loss function and all these weights: effectively, the gradient passes on information about how much each weight has to change in order to minimize the loss function. That's why I said information. And as a side note, I think I found this sigma in Ryan Smith's paper too. It is defined as the expected prediction error, as distinct from the actual prediction error; they distinguish it from epsilon in the Ryan Smith paper. I did look at that earlier, but I didn't find it then.
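A toy version of the prediction-error-driven update just described, in the neural-network sense (a single weight, squared-error loss; all numbers are my own, not from the sources being discussed): epsilon says how far off we are, and the gradient says how much to change the weight.

```python
# Toy example: fit y = w * x to one data point by gradient descent.
# Loss L(w) = 0.5 * (w * x - y)^2, so dL/dw = (w * x - y) * x = epsilon * x.
x, y = 2.0, 6.0   # one observation; the weight that fits it exactly is 3.0
w = 0.0           # initial weight
lr = 0.1          # learning rate

for _ in range(100):
    epsilon = w * x - y    # prediction error: predicted minus observed
    w -= lr * epsilon * x  # move the weight against the gradient

print(w)  # converges toward 3.0
```

The same shape of update, error times sensitivity, is what the "errors drive updating in beliefs" arrows in figure 4.4 are gesturing at.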
Yeah, it's in equation 27, under the section on outcome prediction errors. It may be that "expected" means it's an expectation over the Q distribution, which is your distribution over belief states. Yeah, that's great, Ali. Ali and I and some others worked on this a lot yesterday, and it's great that it clearly kept you up late at night, Ali, and that you went digging for more information. That's awesome. Okay, perfect. That resolves that very well. Then let's get into the next question. So: in equation 4.16, the x has a dot over it, changing through time, but not the y. There's some discourse on this. Or maybe we should go to the next question first, actually, because it's maybe more related. So: figure 4.6 uses an epsilon for prediction error as described in equation 4.21. Is this predictive processing framing a part of the active inference model, or is it presented for contrast, to illustrate the similarities and differences between active inference and predictive processing? So this is equation 4.21, and this is the epsilon; but I think we see it way before equation 4.21, it's even there in equation 4.4. The discourse says this predictive coding scheme is part of the active inference model, illustrating the hierarchical structure of predictions and beliefs. The authors say one way to think about this is as if we had equipped a predictive coding scheme with classical reflex arcs at the lowest level of the hierarchy, and they give a reference. In this setting, active inference is just predictive coding plus reflex arcs. Does anybody know what a reflex arc is? Because I'm a little bit lost there. Or have any comments on this? Yeah, a reflex arc is basically a stimulus response that doesn't go through the sensory cortex. I thought it was a mathematical construct. Maybe think of it as a control system. No thinking.
It just... it's like, you know, you bang your knee, your foot goes up. Yeah, that makes sense. And it says we minimize free energy through action, and the only part of the free energy that depends on action is the lowest level of prediction error in this hierarchical scheme: action fulfills descending predictions by minimizing the error between the predicted and observed sensory data. Any additional comments here about predictive coding and active inference? Or we can look at the earlier question: in equation 4.16, the x has a dot over it (change through time) but not the y, the data. Why is this? Here the discourse says the top equation depends on f of x and v, which is a deterministic function describing how a hidden state changes over time, and the bottom equation depends on g of x and v, which describes how data are generated from a single hidden state. The tilde indicates change through time; the dot indicates the first-order time derivative. Any comments on this equation or question? So maybe we can ask one more question, or a couple more. On page 63, it says: specifically, the Bayesian brain helps us frame the problems that an agent engaging in active inference must solve. Broadly, these are the problem of inferring states of the world (perception) and inferring a course of action (planning). Other than perception and action planning, are there other tasks or challenges that the brain or organisms engage in? How would we know? There's no discourse here, so let's open it up and see what you all think. Yeah, people wonder, hey, what's on that star out there? That's neither, I don't know. Is that inferring a state of the world? I guess. Not planning. And it could just be a hallucination of the generative model: you're not really measuring anything, you're just measuring yourself over and over again. That could be one thing. Okay, so here's another question: for each of the graphical models in figure 4.2, what is an intuitive example for each structure?
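For reference, the pair of equations being discussed has roughly this state-space form (a sketch from memory; the book writes it in generalized coordinates with tildes, and the noise terms may be denoted differently there):

```latex
\begin{aligned}
\dot{x} &= f(x, v) + \omega_x && \text{hidden state changing over time}\\
y &= g(x, v) + \omega_y && \text{data generated from the current state}
\end{aligned}
```

The dot on $x$ marks a time derivative, which is why $y$, generated from the state at a single time, carries no dot.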
One would be disease diagnosis: you have a symptom that is caused by, or predictive of, some condition. I think that was the example they gave in the text, right? Or did we talk about this last week, maybe? Yeah, I actually don't get the question, because for each of them we have at least one example in the text. Yeah, that's true, and I think we're going to see more examples as we go through the book. But just in case anyone wants to see more examples: that paper I linked in the chat, Realising Active Inference in Variational Message Passing, contains more examples for each of these diagrams. This is a fun last question, maybe. It says: this reiterates that active inference uses two constructs, variational free energy and expected free energy, which are mathematically related but play distinct and complementary roles. In your own words, what are the definitions of variational free energy and expected free energy, and what role do they play, individually or together, in active inference? My view of expected free energy was: given a certain action that might be taken, it talks about some future reduction in the free energy, or something like that. Variational free energy is more objective; I don't know how to put it. It's like actual kinetic energy versus perceived kinetic energy. So a ball is rolling down a hill: if you put the ball at the top, we can actually measure the kinetic energy at the bottom of the hill, whereas expected free energy would be like using a model to calculate that, and then finding out whether it actually corresponds, if you have the model. That's my view. So I don't know. I'm probably... I think about this in a lot of different ways.
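For comparison with the analogies above, one common way of writing the two quantities (roughly, and not copied from the book) is: variational free energy scores beliefs against observations you already have, while expected free energy averages over the observations a policy is predicted to bring about:

```latex
\begin{aligned}
F &= \mathbb{E}_{Q(s)}\!\left[\ln Q(s) - \ln P(o, s)\right]
  && \text{observed } o \text{ in hand}\\
G(\pi) &= \mathbb{E}_{Q(o, s \mid \pi)}\!\left[\ln Q(s \mid \pi) - \ln P(o, s)\right]
  && \text{predicted } o \text{ under policy } \pi
\end{aligned}
```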
One way to think about variational free energy is as trading off epistemic value and pragmatic value right now, and expected free energy as that trade-off at some point in the future, so you can plan ahead. Like: it might be cold out, so I'm going to take a jacket; versus: it's cold right now, so I'm going to put my jacket on. That's how I think about the difference between the two. And it's 10, so thanks everyone for coming. I hope I didn't muddle this up too much, trying to be substitute teacher for Daniel. We're going to have tools right now in this room, but if anybody wants to continue discussing these ideas and gather, you're welcome to migrate up to one of the different spaces. Yeah, and I think we'll stop the recording now. Alex, if you haven't already.