Hello everyone, it's June 23, 2022, and we are in week eight of the textbook group, cohort one. We're starting chapter four, and we'll be continuing with chapter four next week. We're more than halfway through the time, and through the chapters that make up the first half of the book. So let's go to chapter four, and just raise your hand or write in the chat if you have anything you want to address.

I'm going first to the math overviews page, where I've written overviews for the previous chapters at varying levels of completeness. This is very important as we set off into chapter four. On page 64: "This chapter is more technical than chapters one through three, appealing to linear algebra, differentiation, and a Taylor series expansion. Those readers interested in the details may turn to the appendices... Those who do not want to delve into the theoretical underpinnings may skip this chapter." So keep that in mind as we continue; it can be an area of discussion and learning. Let's approach these themes and formalisms with the authors' forewarning that we can look for more detail in the appendices, or even skip this chapter, even if we might disagree, of course.

Okay, any other overall comments on chapter four? For whoever read it, even a part of it, what was your overall perspective? What was it trying to do? What approach did it take? Yeah, Ali, and then anyone else?

I think if we take the material in chapters one through three as the foundational material on which active inference theory is supposed to build, then chapter four is one of the first steps toward building the actual theory, going beyond just the basics and foundational material. Using the tools and foundations established in the previous chapters, we are now perhaps ready to tackle the problem of actually constructing generative models in two different settings: as discrete-time models and as continuous-time ones.

Awesome, thanks. Yes. On the chapters page, we can recall that chapter one laid out the structure of the book. Chapter two provided the low road to active inference, which began with Bayesian inference and covered a few other prerequisite themes, including introducing variational and expected free energy as imperatives, in the sense that they are able to bound surprise. Chapter three introduced the high road to active inference, starting not from the mechanistic nucleus of the Bayes equation but from the imperative for survival and persistence, and also introducing, in a first pass, the Markov blanket concept and partitioning. Chapter four is indeed where we start to get into many details that were not covered in the earlier sections. It begins by bringing us closer to the connection between Bayesian inference and the free energy evaluation. Then the central idea of a generative model is discussed, described first in discrete time, specifically using the POMDP formalism (partially observable Markov decision process), and then in continuous time.
Then there are going to be some very interesting figures, formalisms, and discussions of which generative models underlie predictive coding and motor reflexes. That moves us toward chapter five, which will cite some empirical work and discuss the plausible neurobiologies that can be modeled as implementing the kinds of generative models that were motivated in chapters two and three and then described in their essence in chapter four.

Also, a reminder about the math group activities, though really we're all in the math group, all on this learning journey together: we've been striving to write natural-language descriptions of the equations, and every annotation people can make toward that is extremely helpful. People shouldn't feel abashed or ashamed to make any kind of contribution. It can always be reordered or edited by others, but this is how we're going to create those natural-language descriptions of equations and ask questions about them. Even just copying and pasting "what does this mean?" is a helpful contribution, because this is very technical, and it's not immediately apparent how, for example, the copious equations in chapter four relate to the broader discussions from the earlier chapters.

But look at that. Okay. If anyone has more questions to add, they can do that, or if they want to upvote questions, feel free. It's motivating for people when they see that their questions are being improved on, or that others are paying attention to them. So that's always something free and helpful to engage in.

"Are beliefs policy or state?" If the person who asked is here, they can feel free to elaborate, because it's a very short, partially formed question. I also wasn't sure whether this was using the cadCAD ontology, which uses "policy" and "state" in a slightly different way. But beliefs in the general sense, coming from the Bayesian ontology, refer to any distributional expression: Bayesian beliefs, for example beliefs over states. Policies, in the active inference ontology (which you can just mouse over to look up), are sequences of actions. Policies are constructed, or enumerated, from the affordances, the E vector, that the generative model has over a given time horizon. So if the agent is able to go left or right over two time steps, the policies are left-left, left-right, right-left, and right-right. Does anyone want to add anything to this short question? Otherwise, it's a fine clarifying question.

Okay. "In the discussion of active inference in POMDPs, the authors write that to update beliefs about policies, we find the posterior that minimizes the free energy. Does a posterior at time t become a prior at time t plus one?" What does anyone think?

I guess it depends on the specific posterior and prior, because those can mean multiple things. But in the context of figure 4.3, I'd say it does, because we are updating our prior, which at the end of a time step becomes the posterior probability, which is then fed back in as the prior.

Thanks, totally agree with that. There needs to be an initiating prior, D, which is shown with a 3 here in the discrete-time graph; that's the initial prior. And then prior and posterior are defined relative to what? Incoming data. After the first data point comes in, the updated posterior plays the functional role of the prior for the next incoming data point.
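Here's a minimal sketch of that loop, with made-up numbers: a two-state hidden variable, a hypothetical likelihood matrix A, and an initial prior D. The posterior computed after each observation is fed straight back in as the prior for the next one.

```python
import numpy as np

def bayes_update(prior, likelihood, obs):
    """One discrete Bayesian update: posterior is proportional to likelihood * prior."""
    unnorm = likelihood[obs] * prior       # elementwise product over states
    return unnorm / unnorm.sum()           # normalize to a probability distribution

# Hypothetical two-state example: the room is hot or cold, and a noisy
# thermometer reads high (obs 0) or low (obs 1).
D = np.array([0.5, 0.5])                   # initial prior over [hot, cold]
A = np.array([[0.8, 0.2],                  # p(high | hot), p(high | cold)
              [0.2, 0.8]])                 # p(low  | hot), p(low  | cold)

belief = D
for obs in [0, 0, 1]:                      # a stream of incoming data
    belief = bayes_update(belief, A, obs)  # today's posterior ...
    print(belief)                          # ... is tomorrow's prior on the next pass
```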
Okay, then here's a related question. We've addressed it in the narrow sense, but it would be very helpful to look through the order in which different topics are addressed in chapter four, because some of these questions invoke things that come much later. We don't assume that people have high or low comprehension of this chapter yet, but for many of these questions it's really important that we understand why we're bringing them up. So let's walk quickly through the chapter to see why things come up in the order they do, in terms of the figures and formalisms.

Figure 4.1 returns to Bayes' theorem, and I think it's a really nice representation. Here's the joint distribution of x and y, the hidden states and the observables, and it gets split up into y conditioned on x, times x: P(x, y) = P(y|x) P(x). So the joint distribution of the coin flip and the die is the distribution of the coin flip conditioned on the die, times the distribution of the die itself. It's just separating out a joint distribution, and that's the heart of Bayes' theorem.

This sets up a key move: turning an integration problem (a sum in the discrete case, an integral in the continuous case) into an optimization problem, which permits incremental solutions of improving quality. That's unlike an integration problem, where you might just be tallying up numbers without necessarily moving closer to knowing when you're done with the integral.

They introduce Jensen's inequality: the log of an average is always greater than or equal to the average of the log. That's true for any concave function, anything whose slope is decreasing. And here's where it gets applied. Above we had the joint distribution; we multiply it by an arbitrary distribution Q divided by itself, so that factor is one, whatever Q is (later Q will play a more specific role). In one expression the expectation is inside the log; in the other, the expectation is taken of the log itself. The quantity we really want is the log of the expectation of the joint distribution divided by Q. Jensen's inequality lets us replace the equals sign with a greater-than-or-equal sign, pull the expectation outside the log, and use that as the negative free energy, which bounds the quantity we truly want, going about it via this nice feature of the natural log. Bayes' theorem in logarithmic form (anyone can totally raise their hand and add details) allows rearrangement into a form that already recalls the variational free energy we saw in equation 2.5. More details to be filled in, and that's why we want to be annotating these equations.
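A quick numerical check of Jensen's inequality for the log; any positive random variable will do, and the lognormal here is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # arbitrary positive samples

lhs = np.log(np.mean(x))     # log of the average
rhs = np.mean(np.log(x))     # average of the log
print(lhs, rhs, lhs >= rhs)  # log E[x] >= E[log x], so this prints True
```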
"Generative Models" is the name of this chapter. To calculate free energy, we need three things: data, a variational distribution family, and a generative model, which at minimum is composed of a prior and a likelihood. They're going to describe two different kinds of generative models. And this is a really nice, intuitive graphic for those with varying familiarity with statistical representations: the circles are random variables, and the squares are probability distributions (factors) describing the relationships among them.

These first Bayes graphs are simpler, with less action involved, than the canonical active inference representation, where pi is the policy.

Jakob, you wrote "3 is the B matrix." Yes, it is; these 3s are the B matrix. But this one is the letter D. I agree, though. Sorry, which one is the D? The first 3? Yeah, because they're playing similar roles. Or what do you think about that?

I was thinking about it as just one snapshot of an infinite factor graph, just the fact that it's assigned p(s_{t+1} | s_t, pi). I understand what you mean by saying it plays the role of the D; I'm not entirely sure, though.

We'll come back to this, because there are a few other interesting notes. So this is the simplest of the graphs they describe: one variable influencing another variable. This one is two factors, z and x, influencing y; factor 2 is like a little factory that outputs y conditioned on the states of x and z. Knowing this notation, and how to read these directed graphs, is relatively essential for interpreting the more complex graphs that involve actions and so on. Here x is an upstream hidden cause that influences two downstream variables that don't influence each other, and here is a hierarchical model where v influences x and x influences y. Does anyone have thoughts or questions about this? It's a probabilistic graphical model: the variables are stochastic, that is, random variables, and it's a graph, consisting of nodes and edges.

Figure 4.3 introduces the two basic forms of the generative model, dynamic through time, as used in active inference, in factor graph form. Does anyone want to describe what they see in this figure?

The top is the discrete-time case. s is the hidden state, say the temperature of the room. o are the observations, the thermometer readings. Factor 2 is the relationship between the temperature of the room and the readings of the thermometer. Factor 3 is how the temperature changes through time. And, just as Jakob said, we can think of this as a snippet from an infinite sequence of temperatures through time, though in practice there has to be an initiating prior.

Pi are the policies. Policies are sequences of actions, sequences of affordances concatenated over some time horizon; that's what we evaluate expected free energy on. And policies represent causal impact in the world. We can look at this graph (which is why this notation is an important prerequisite to understand) and see that the way policies influence the world is not by taking the temperature and just overwriting it with a different value. A policy changes factor 3, which is how the temperature changes through time. Any thoughts on the discrete-time formalism? Yes, Mike?

So, as represented in this figure, the policy is not changing over time. Is that correct?

The set of policies in this figure is not changing. It's not that it can't; this figure is only showing two steps of some policy being applied.

Yeah, just the way it is in this figure. So if we were to extend this figure with two more copies of the bottom section, we could also extend the policy, have it change over time, and feed that in.

Yes, and that's actually one of the amazing and interesting things about this graphical representation: we can say, okay, let's carry this s out five more time steps, and then let's have pi 1 implemented here, and then let's have it do pi 2 here.
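To make that point concrete, that policies act by selecting the transition dynamics (the B matrix) rather than by overwriting the state, here's a toy numpy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical two-state room model: [hot, cold].
# D is the initial prior; each action selects its own B (transition) matrix.
D = np.array([1.0, 0.0])                    # start certain the room is hot

B = {
    "heat_on":  np.array([[0.9, 0.7],       # p(hot'|hot),  p(hot'|cold)
                          [0.1, 0.3]]),     # p(cold'|hot), p(cold'|cold)
    "heat_off": np.array([[0.4, 0.1],
                          [0.6, 0.9]]),
}

# A policy is a sequence of actions; it influences the world by choosing
# which transition dynamics apply at each step:
belief = D
for action in ["heat_off", "heat_off"]:     # a two-step policy
    belief = B[action] @ belief             # push the state distribution forward
    print(action, belief)
```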
And it's kind of like: if you can draw it, you can do it. There's very important work from de Vries, Friston, and Parr from around five years ago demonstrating that if the Bayes graph can be drawn, there's a Forney factor graph representation, and where there's a Forney factor graph representation, there's a tractable variational message passing approximation. I don't know exactly where the guardrails are, whether there are graphs that aren't amenable to message passing and so on, but at a first pass, graphs that look like this can be drawn, more variables can be added, and so on. So it's kind of a visual code for probabilistic graphical models.

So maybe a fun exercise is to think about causal inference in our day. The temperature of my room is being influenced by whether the air conditioner is running in the other room, and by the temperature outside: which of these graph motifs does that map onto? Take scenarios that are familiar to us and separate them out. How are the hidden states changing through time? What are the observations? What policies influence how the hidden states change through time?

But the bottom of the figure is going to be an interestingly different view, with a continuous-time model. Does anyone want to describe the continuous-time variant?

One interesting piece here: in livestream #46, which we just did over the last few weeks, they draw a clear distinction between DAI (decision active inference) and MAI (motor active inference). Beyond just modeling cognitive decision-making versus motor reflex arcs, they also highlight how DAI is a discrete-time model using the POMDP, while the motor models have tended to use continuous time. So the book is laying out, and also generalizing across, these two variants to show their similarities, and we can use the notation concordance table to highlight some of those parallels.

This is where the Taylor series approximation comes in, along with the generalized coordinates of motion (though those can also be applied in discrete time). Here we have x, x', and x''; the prime notation indicates first and second temporal derivatives. If anyone has a thought on this, feel free, because this is not doing time-series prediction in the same way the POMDP is. In the POMDP, the program explicitly holds in memory a value for the previous state, the current state, the next state, and however many other states are in the time horizon. The way a continuous-time Taylor series approximation deals with reducing uncertainty about future data is quite different from the way the discrete-time models do it.

What's happening here is that the value of the function now is estimated or provided, and then the first derivative is calculated. Factor 3 here has the same structure as before, but whereas in discrete time it was the hidden state at the next time point conditioned on the hidden state at this time point and the policy, here it is the derivative of x conditioned on x and the cause v. And v has a slightly different interpretation, because it isn't a sequence of actions either; analogously for the second derivative and the higher derivatives. They're still emitting observations, but these are not observations at future time points in the same way.
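A small sketch of that idea: everything is evaluated at the current moment, the value plus its derivatives, and predictions about distal points fall out of the truncated expansion. The numbers here are hypothetical.

```python
import math

# A snapshot in generalized coordinates of motion: the value of x "now"
# together with its first few temporal derivatives (made-up numbers).
x_tilde = [5.0, 1.2, -0.3, 0.05]    # x, x', x'', x''' at the current moment

def taylor_predict(x_tilde, dt, order):
    """Truncated Taylor expansion about now: x(t + dt) ~ sum of x^(k) dt^k / k!"""
    return sum(x_tilde[k] * dt**k / math.factorial(k) for k in range(order + 1))

# Predicting a distal point uses only quantities evaluated at the present:
for order in range(4):
    print(order, taylor_predict(x_tilde, dt=2.0, order=order))
```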
And that is visualized in box 4.2. Here is x(t): this is the way the time series is going to go, and the value at the expansion point is just this value here, you know, five. The first term added in the Taylor series expansion is the first derivative, the rate of change at that x_0; that gives you a better approximation through time. Then the second derivative adds this quadratic feature. And so Taylor series stay accurate further and further from the expansion point as they include higher and higher derivatives.

But there isn't an explicit calculation in the Taylor series of, let's say this is one, two, three time steps, "what is going to happen three time steps from now?" The Taylor series could be evaluated at three to estimate that; however, it is not calculating x, x', x'' at t equals three. It's calculating them at x_0 and then using the expansion to achieve reduction of uncertainty about more and more distal points. So that's quite interesting, and again, it's an extremely different way of doing time-series prediction in the continuous-time framework than the discrete-time framework's explicit consideration of past, present, and future state values.

Okay, any other thoughts on figure 4.3? Because this is one of the most key figures. But there are things we can explore and ask. Ali?

About this figure: on page 69 it says "the relationship between a state and its temporal derivative here depends on slowly varying causes." I just wanted to make sure, is "slowly varying causes" here used to account for the fluctuations, or does it have a different meaning?

Slowly varying causes. So in the discrete formalization, policies influence the causal unfolding of states. Here in the continuous-time setting, the causes, which vary more slowly than the state changes, play a similar role to the policies above. These causes intervene in how the derivatives are calculated.

No, I mean, here in the discrete time, obviously we can have stable policies that don't change through time. But in the continuous time... well, I don't know; at least my understanding was that the causes can change slightly, but the change can be somehow negligible. And that's because of the additional term for fluctuations in the equations, which doesn't necessarily affect the main components of the equations.

One could set it up so that the policies are ineffectual, so that the states change through time with their own endogenous dynamics, not influenced by policy. Or one could imagine a situation where the states have no dynamics of their own and their changes are entirely driven by policy. And I think analogously, there could be a setting in which the derivative of x is hardly influenced by the slowly varying causes v, or v might entirely determine the derivatives of x. But I'm not sure that at this level of generality we can allocate importance between the endogenous, policy-independent dynamics and the way those states change through policy, or analogously how the derivatives are calculated. But that's a great question.
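Just to make "slowly varying" concrete before the next point, here's a toy Euler-integrated sketch; all the functions and constants are invented. The cause v changes on a much slower timescale than the state x, while omega supplies the fast stochastic fluctuations.

```python
import numpy as np

rng = np.random.default_rng(1)

dt, steps = 0.01, 1000
x, v = 0.0, 1.0
xs = []
for t in range(steps):
    if t == 500:
        v = -v                                  # the slowly varying cause flips once
    omega = rng.normal(0.0, 0.1)                # fast, thermal-like fluctuation
    x += dt * (-x + v) + np.sqrt(dt) * omega    # flow term plus stochastic term
    xs.append(x)

print(xs[499], xs[-1])  # x relaxes toward v; omega only jitters the path
```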
Lyle, and then Mike.

Yeah, this might be slightly off topic, but on the tools: if you're using a multi-mode simulation tool, it can handle both. In some sense it's going to try to solve both the continuous formulation and the discrete-time formulation; one they would typically call agent-based, and the other would be ODE-based, or something like that. The way they do that is a little bit of sleight of hand, because of course they're not solving the equations per se, they're estimating the solutions. If you have a slowly changing set of parameters, like you might model with the discrete case, then you've got an event queue, and those events are however far apart they need to be. Then, to estimate the continuous form, they simply let delta-t go to zero. In the software it never goes exactly to zero, but you compress that delta-t down to very small time slices, and then you have estimated your continuous form. By doing that, you can have a pretty comprehensive way to represent both cases in the same modeling environment.

Yes, great point. Numerical approximations to continuous processes can be built by breaking them down into discrete processes, and then making sure that as your delta-t gets smaller and smaller, you get a convergent estimator. The approach taken analytically here is the Taylor series approximation. Mike?

So, in thinking about how we should build the mental model of the Taylor series approximation, or incorporate it into our model: that's one example of how you might capture the time-series structure. And you could potentially substitute in other approaches, maybe to capture finer detail, or discrete events that might occur in the series. Something like Loess decomposition, where you're taking apart components of the time series: trend, seasonality, discrete events.

Yes. I think the analytical comparisons would be tighter or looser, but Kalman filters, generalized Bayesian filtering, splines, time-series decomposition: these are all in the same category, or genre.

By the way, Lyle, thanks for the clarification; that was really helpful. Well, to ask my question more explicitly: on page 78, equation 4.15, we have some additional omega terms, which are defined as stochastic fluctuations. My question was whether the term "slowly varying" is somehow related to these stochastic fluctuations or not.

Yes. I believe "slowly varying" refers to the cause being included in the flow component, rather than in the stochastic, more thermal, vibration-like term at that timescale. That was explored in livestream #45 on "the free energy principle made simpler but not too simple," with the Langevin equation: how physics and mechanics are predicated on the separation into flow-like dynamics at a given scale plus stochastic, non-flow-aligned changes, with the path of least action recovered in the limit of the stochastic term going to zero. Let's go there.
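On Lyle's point about compressing delta-t: here's a tiny convergence check, discretizing dx/dt = -x, whose exact solution at t = 1 is exp(-1), about 0.3679, with smaller and smaller steps.

```python
def euler_path(f, x0, t_end, dt):
    """Estimate x(t_end) for dx/dt = f(x) with Euler steps of size dt."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x = x + dt * f(x)                # one discrete update
    return x

f = lambda x: -x
for dt in [0.5, 0.1, 0.01, 0.001]:       # compress delta-t toward zero
    print(dt, euler_path(f, x0=1.0, t_end=1.0, dt=dt))
# The discrete estimates converge on the continuous value exp(-1).
```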
Yeah, as people can see, we could totally read this twenty times and it would still be: why is the next word there? Why is the next equation there? But okay. They're introducing these two types of generative models, which have tantalizing isomorphisms but also very interesting differences, and they're going to focus first on discrete time, where the partially observable Markov decision process formalism is used.

Someone asked why the categorical notation is used. It just makes it easy to look at a matrix and interpret it as something like a confusion matrix (not just a matrix that is confusing, like they can be, but one that has to do with something like a coin flip). So: categorical outcomes.

The nodes labeled 3 are the transition function, here in the categorical context, between different states; that's B. And here's where we see D, the prior over the initial state. Together, B and D account for the 3 nodes in figure 4.3; they play functionally similar roles. Here's where they get distinguished. Selecting between models of behavior (that was another question, which maybe we can get to) requires selection among these categorically discrete plans. The softmax then ensures that the probability over those policies is normalized, so that it can be understood as a probability distribution, a categorical probability distribution.

Here's the expected free energy that we've seen before, and then more on expected free energy, specifically on the KL divergence term, which, as we talked about last week, is about how the preferred states are realized. Active inference uses F and G. They're related, but they play different roles: the variational free energy F is described as the primary quantity minimized over time. That was interesting to read, because expected free energy is what is minimized prospectively through time, whereas here the claim is that variational free energy is what is minimized over time in relationship to the generative model. So, again, more could be explored there. They then focus on a rearrangement of equation 2.6, with ambiguity and risk juxtaposed against informational value (information gain, infomax) and pragmatic value.

They then move into the linear algebraic form, where the bolds... I don't know whether bold is used the same way in the appendix as in these equations. Does anyone know what bold means here? It could mean a vector. Ali?

Yes, I also think bold most probably means a vector or matrix.

Yes. It gets pretty subtle, though, because there are places where italic, bold, and regular type are used very close together, and when they're used closely it can be confusing; when they're used far apart, it's hard to juxtapose them. So: more rewriting of expected free energy in the linear algebraic form. Softmax. Bringing back the logs. Variational inference rests upon factorization, and that's related to the sparsity of the Bayes graph.
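Tying those pieces together, the enumerated policies, their expected free energies G, and the softmax, here's a minimal sketch with invented G values:

```python
import numpy as np

def softmax(v):
    """Normalize a vector of scores into a categorical probability distribution."""
    e = np.exp(v - v.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical expected free energy G for four enumerated policies,
# e.g. left-left, left-right, right-left, right-right:
G = np.array([3.1, 2.4, 2.9, 1.0])

q_pi = softmax(-G)            # lower G means higher policy probability
print(q_pi, q_pi.sum())       # sums to 1: a categorical distribution over policies
```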
Hey, so, going back to 4.10: I was wondering what the categorical distribution is used for here? Maybe I missed it, but it's not really explained in the text. There's one sentence saying the fifth line shows that the prior belief about observations is a categorical distribution. What is it for? What does it do? I've never seen it before.

I don't know that it has to be categorical; a preference distribution could be a continuous function. But here it's just simpler to show it as categorical. There are two outcomes, having the food and not having the food, so that's a categorical distribution over preferences and over observations. They're just modeling a situation where there are categorical differences that are being preferred, as opposed to a preference over some continuous distribution. Does that address it?

They're just modeling a situation where there's a categorical difference in observations and in preferences, just because mathematically it's easier to start with that?

Yeah, I think didactically it's a lot clearer, because you can see a 2x2 matrix: did I observe the food or not, did I get the food or not, instead of a distribution of temperature preferences and a distribution of observations. I'd expect the formalism to work out the same, in the sense that you're still minimizing your surprise and still performing distribution matching. But here it's distribution matching in a categorical context, which has an interpretation almost like false positives and false negatives.

All right. Okay. I will go with it. Thank you for the explanation.

Yeah, it's a good question. The matrices help connect to the linear algebra and the MATLAB representations, and also just to match-or-no-match: in livestream #46, the example had to do with wanting ice cream and then observing ice cream or not.

Okay. The POMDP; and where this eta comes in, we'll come back to another time, just to get through it all on a first pass. More details on the POMDP: s and v, an auxiliary variable. I don't think this is the same v as in the continuous-time model, because we're in the POMDP setting; it's just an auxiliary variable used as an analytical convenience.

That concludes their discussion of the discrete-time model; on to continuous time. First they describe Markov blankets again, taking a second pass. The blanket of x comprises the causes of x upstream, which are its parents; its children; and the parents of its children, the co-influencers. Of course there's more to say, but that's the Pearl definition. How that gets mapped to sense and action states, what influences what, what's in the blanket, on the blanket, and so on: those are model-specific.

And then here are two common message-passing schemes used for approximate inference. Unless the Bayes graph is fully connected, there is a Markov partitioning. Here, this is pretty funny: N.B., nota bene. Good note to note. The slightly non-standard use of the expectation operator (perhaps overloading it with a massive subscript; it's unclear exactly what they meant there) is used to show the alignment between these two different schemes. And there are definitely citations to follow for more detail on belief propagation and message passing in active inference.
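Going back to the Pearl definition for a moment, parents plus children plus the other parents of those children, here's a minimal sketch on a toy DAG (the graph itself is made up):

```python
# Pearl's definition on a directed graph: the Markov blanket of a node is
# its parents, its children, and the other parents of those children.
dag = {              # hypothetical DAG, stored as node -> list of parents
    "x": ["v"],
    "y": ["x", "z"],
    "z": [],
    "v": [],
}

def markov_blanket(node, dag):
    parents = set(dag[node])
    children = {c for c, ps in dag.items() if node in ps}
    co_parents = {p for c in children for p in dag[c]} - {node}
    return parents | children | co_parents

print(markov_blanket("x", dag))   # {'v', 'y', 'z'}: parent, child, co-parent
```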
And then we see several of these figures. Personally, I think it's slightly challenging to tell whether they're being used illustratively or whether these specific topologies are as directly interpretable as the earlier ones. But if you blur your eyes: here's the policy at the top, here are the observations, here's s at t-1, t, and t+1. So we can see some resonances with figure 4.3.

But now we're in continuous time. Motor active inference, from livestream #46, justifies the use of continuous time by the continuously unrolling aspect of sensory input and motor output. They start in a different place than they did with the POMDP model: in this case there's much more of a physics grounding (again, check out livestream #45 to see how the Langevin equation connects here), in terms of a flow operator plus a stochastic term at a given timescale. The stochastic term is assumed to have a normal distribution, and that relates to what Mike said about detrending: one can think of the stochastic term as what's left when you've detrended all of the non-Gaussian signal out; you're left with the residuals, which have a Gaussian nature. Precision is capital Pi, and it's the inverse of the covariance of the fluctuations. They're going to connect that to Kalman filtering.

Here we return again to the generalized coordinates of motion, coming via the Taylor series approximation route. If somebody said, I have ten numbers and we need to predict ten years into the future: the discrete-time way would be, well, let's try to predict the value at each of those ten years, and those will be your ten numbers. The generalized-coordinates-of-motion approach, which we explored the most in livestream #26 with the Da Costa et al. paper on Bayesian mechanics, would take the ten numbers to be the value today, the derivative today, the second derivative, the third, the fourth, the fifth, however many. That's the generalized coordinates of motion: a snapshot of the process and all of its higher derivatives. So the generalized coordinates of motion are very similar to Taylor series approximations in that they represent things centered at a certain point x_0, and in how one expects to reduce surprise as movement happens away from that x_0. Here, x with a dot on top is the change in x, and then this is the derivative of that change, and so on. The tilde notation is used for the generalized coordinates of motion. So we see this equation, which was just for one value (like where you are on the freeway), and then this one in the generalized coordinates of motion. The free energy is written down over these generalized coordinates, with precision. The closest we came to exploring that was in livestream #43 on predictive coding.

There are some more details on the relationship with Gaussian processes. And, as Lyle brought up with the multi-mode simulations, this is modeling a single mode; not that the framework can't model a bimodal distribution, but this is where the Laplace approximation comes into play. It's potentially even unnecessarily complex, but it's there. The Laplace approximation fits a Gaussian that tracks the mode: take the highest point on the distribution and fit a parabola to the log-density around it.

Here are more details on the hierarchical nature, and the multi-timescale nature that's implied, ultimately, by the Langevin setup. It would be good to juxtapose figures 4.6 and 4.4 to see how they're similar and different, but here we see message passing happening on the generalized predictive coding architecture. That was explored a lot more, not with this exact figure, but from an analytical and empirical perspective, in livestream #43. Then there's an abrupt ending, but the following chapters are going to appeal to these formalisms and apply them.
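A minimal numerical sketch of the Laplace approximation just described: locate the mode, fit a parabola to the log-density there, and read the variance off the curvature. The log-density here is invented.

```python
import numpy as np

def log_p(x):                       # a made-up unnormalized log-density
    return -0.25 * (x - 2.0)**4 + x

xs = np.linspace(-5.0, 10.0, 200_001)
mode = xs[np.argmax(log_p(xs))]     # highest point on the distribution (here: 3.0)

h = 1e-4                            # curvature by central finite differences
curv = (log_p(mode + h) - 2 * log_p(mode) + log_p(mode - h)) / h**2
var = -1.0 / curv                   # Gaussian variance from the negative curvature

print(mode, var)                    # the mode and variance: the whole approximation
```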
So, any thoughts or ideas on chapter four as we close out? Hopefully next week we can have a lot of questions and things like that, even basic questions. If you read it and you understood it, it would be awesome to contribute a question that explores someone else's understanding, or prompts toward how you thought about it in a way that made sense for you. And if you don't understand it, then just ask the question. Ali, and then anyone else?

I just wanted to briefly mention an additional point related to the question about the reason behind using the categorical distribution. On page 74, it says "the fifth line shows that the prior belief about the observations is the categorical distribution whose sufficient statistics are given in the C vector." I think "sufficient statistics" is the key term for understanding the reason behind using categorical distributions, because with sufficient statistics we don't necessarily use the exact model or distribution. Instead, we use summary statistics that are simple enough to calculate and also close enough to the actual data. So I think that can help in understanding the reason behind that decision.

Good point. And this is definitely a technical note, but is it fair to say that the mean and the variance are sufficient statistics for a Gaussian distribution?

Yes, I think so.

So that's one of the perhaps proximate mechanisms by which the Laplace approximation, and variational inference more generally, are able to deal with arbitrary true distributions: by fitting a family of distributions that has tractable optimization structure and a vastly reduced set of sufficient statistics. If you had even a simple bimodal distribution, there would be the locations of the two peaks, their relative heights, the skewness; you could need many, many parameters to describe it. In contrast, the Laplace approximation only requires the location of the mode and a variance estimate. Where that's adequate, it's simple and fast; where it's inadequate, you'll at least know, because you'll keep being surprised by new observations coming in.

Okay, any final comments on this chapter four, part one? Definitely a challenging though interesting chapter, and it really is at the heart of active inference. As we hopefully explored a little bit today, it includes basic intuition pumps as well as some Rosetta-stone-like representations, and many, many equations, which I hope we can unpack. "Could someone explain it in simple words?" That's what we've been asking for every single equation, and it's something that everyone can contribute to. All right, well, looking forward to people's edits and additions over the coming week, and we'll come back in one week for part two on chapter four.
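A small postscript that could go into the annotations, on the sufficient-statistics point above: a toy check that, for a Gaussian fit, the sample mean and variance are all the model needs to keep from the data.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)   # made-up dataset

# For a Gaussian, the sample mean and variance are the sufficient statistics:
mu, var = data.mean(), data.var()

def gaussian_logpdf(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu)**2 / var)

# Any new point can be scored from (mu, var) alone; the raw 10,000
# observations are no longer needed.
print(mu, var, gaussian_logpdf(0.0, mu, var))
```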