Welcome. It's cohort one meeting 17 already, and it's September 23rd, 2022. We're having our first discussion of chapter seven in the textbook. We have many ways to go: we have questions on chapter seven that people have been adding, we have some summaries and overviews, and we can walk through the text directly. But first, does anyone want to raise their hand or unmute and give any thought they had on chapter seven — their experience reading it, their understanding of where it's situated in the textbook overall, what was in it or what was not in it, et cetera? Okay. Well, I'll open. Again, please raise your hands or write in the chat if you want to address anything at any point. I want to open by acknowledging a lot of the contributions Ali made, and some conversations that he and I had earlier this week that I believe all of you cohort one-ers will find interesting. But first, let's just start with this quotation — the opening quotation of the chapter — then I'm gonna surface some discussions with Ali, and then we're gonna go into some details of the chapter. So the quote is, "What I cannot create, I do not understand," by R.F. — Richard Feynman. So what does anyone think about that quotation, or what does it mean in this context? I think it's more applicable the more complex the thing is that you're trying to build. What priors, or what way, could your generative model have to create an accurate prediction of the operation of something, or the underlying dynamics of something, if you do not step by step generate and take the actions that generate those affordances? I don't see any causal path to doing that. So you would have to step through it, necessarily, if it's computationally complex — and that's most things in the world. Awesome. This is like the low road answer: how can the generative model have anything like understanding without generating?
You can't just have it on the shelf. And then someone has added a milder answer: we can't have understanding just through mental envisioning — the algorithms need to be implemented for a learner's journey. And then there's an even deeper or stronger point, which Friston and others have been working on for a long time, which is that it's sentient artifacts in the world that will be the realization of active inference — not just some nice derivations. Brock, and then anyone else. I was just gonna add something about hidden states there. It's again coming back to computational complexity, but there are things now that we're starting to build where we don't really 100% understand all of the dynamics. In general, we invent things that we don't completely understand first, and then, based on the evidence we observe that it is consistently exhibiting some behavior, that directs our attention — that directs where our generative model pursues more observations. But we're definitely getting towards a point where that's kind of no longer going to be possible in the way that we usually try to use math to shortcut stuff: the dynamics of the things that we're gonna build aren't necessarily going to be provable in advance — their existence will be their own proof. Awesome. Okay, so that's a fun starting quotation. Upload that one for sure though. There are some detailed things about the examples. And some of these I've been working on — everyone is welcome, for every chapter, to be contributing on these pages, kind of trying to overview what these examples are. Because the chapter states it up front, yet I missed it the first several times reading it: these illustrated models of every color in the rainbow are the section titles.
And those are the functionalities that are getting layered in — sometimes building on the same model (there's a T-maze version one, and then there's a T-maze version two), other times switching between models, or showing two different models that illustrate the same cognitive functionality. Other times the simulation outputs are shown; other times they're not. So there's a lot of heterogeneity in the observations in this chapter, but it really wasn't until I saw that these first-paragraph words were the section titles that it made more sense how the chapter was laid out. So that's just one note. Ali, do you wanna raise any thoughts, or describe a little about our discussion on how we're gonna move forward on the equations? Okay, if you unmute, then go for it; otherwise, let me see. Okay, yes, no mic. Thank you Ali, all good. So we were discussing how there are several layers — as with other scriptural traditions, there are levels of reading. There's a level of reading that is even more granular, below where we've been currently targeting: like, here's equation 2.5. There's a reading for accessibility and understanding that would literally be the reading of the notation: "F, open square bracket, q comma y, close bracket, equals, negative sign, expectation over q, open parenthesis," et cetera. Then the notation can be substituted for terms in the ActInf Ontology, which is the level being described here, though it's not shown here. This is the level where notation could be aligned across papers, because their notational readings would be different, but they might be referring to the same exact composition of the ontology. Then, once the ontology definitions have been squared away, the terms can be condensed into meaningful units like accuracy, complexity, risk, expected risk — which gives a symbolic reading in the conversational sense, rather than the notational sense.
And then that's where memes and themes and rhetoric around what active inference formalisms say start to arise. And then there are the consequences of changes in relationships within and outside the formalism. Because there's this larger structure, like a POMDP, and changing one variable — if you're only looking at one view, or one sub-formalism, you know, turning it up to 11 — might influence other variables that aren't part of that formalism itself. So this is a very open question that touches on everything from the implications of changes to relationships that aren't formally specified. For example, in the Bayes graph the edges reflect a certain type of relationship, as we've explored, but changes to one can propagate throughout a whole system, potentially in non-linear ways. So it's always gonna be an open area: how changes, including counterfactuals about the formalism — what if risk were situated this way instead of that way, or how does that relate in this case — play out. Again, people, please update it and modify it. Some of these equations we'll get to — we'll come to why we had that conversation when we get to 7.8 later. But that was one really interesting conversation. And then there was one other — okay. Then, just to jump into a chapter seven topic, and also to reframe our learning journey as we're all working through this textbook for the first time as a group: they wrote about this choice in the T-maze. And then Eric, or anyone else — I know you had some questions on the T-maze, or we can go to the earlier examples — but this choice of what to do: do you seek out the informative cue, and then go to the arm that you've now reduced your uncertainty about, where the reward is? So do you take an epistemic action, and then have a better chance of making the pragmatically good choice?
Or do you just go and make the pragmatic choice? Do you take one step up, or do you take one step back for two steps up? It speaks to the exploration-exploitation dilemma in psychology and machine learning — "a dilemma that's resolved under active inference." So we were just cracking up, because first, a low-familiarity learner might not know what these terms are, or why it's relevant to even talk about this situation. Familiarity might look like knowing metaphors, examples, intuitions, some relevant citations or historical anecdotes about this. Then there's understanding in terms of what the active inference formalism says: if it's "resolved under active inference," then understanding that resolution means understanding the basis for it. And from understanding, there could be further developments and applications. And then — we'll come to this — what was making us laugh was: surely there are parameterizations in which the rat in the T-maze always does one thing, or always does the other. So it's not that the explore-exploit trade-off is simply resolved under active inference, even though people commonly point to things like the ability to rapidly switch between exploratory and exploitative behavioral policies. Because there's a parameterization of that model, and structure learning, including that, which has to be accomplished such that the model sits at a kind of critical point where it behaves adaptively, and thus in a given situation manages the explore-exploit tension under certain constraints. But how could it be said that active inference resolves that dilemma? So, maybe to whomever wrote this one: what is happening? Yes, please, Eric — classic Eric writing here. So what about this? Yeah, I'm kind of the heckler at the garden party here; you can see that in most of my comments and questions.
But yeah, maybe it's just best to — I mean, it's maybe a bit redundant, but I don't see this as a valid claim. Well, first of all, I don't want to call it a dilemma for psychology. It's a trade-off. It's been understood for years, and there are ways of formalizing it, and you solve it through optimization. It might be a dilemma for a creature, because they don't know what they should do, but that's not a dilemma for the field. And then, the other paragraph: I don't see that they offer any resolution. All they do is use common language — free energy. They put exploration and exploitation into the same equation with this information-based term called free energy, which is perfectly valid. That's great, but that doesn't resolve the problem, because, as you pointed out, you've got parameters, and the designer or somebody has to figure out what those relative weights — those parameters — are. And so you really haven't advanced anything fundamentally simply by using this free energy way of expressing what the trade-off is, as opposed to some arbitrary measure called energy or preference or something like that. So I just think it doesn't help them when they over-claim — which is my opinion, and you can convince me otherwise, I'm open — it doesn't help them to overstate their contribution to resolving exploration versus exploitation. Yes, very insightful. I actually totally agree. I purposely tried not to mention it as a dilemma either. I think tensions, trade-offs — these are all valid, but that's kind of micro-linguistic. A more serious issue is the claims that do crystallize out from the text, things that people can copy and paste and quote. If this were a courtroom scenario — did you say that active inference resolves exploration-exploitation? — the answer has to be yes, it was claimed in the book.
Could somebody contextualize that? Like: it provides a first-principles way where there are tunable, learnable, or parameter-sweepable features of models that sit at a really well-positioned intersection of human interpretability and manifolds of relevant model variation. Yes — on an extremely charitable reading, one could bring this back from going over the edge onto the table. That would also make active inference less surprising and less hyperbolic, and make it part of this process of seeing the evidence in the realization — not in the ironic sense of "we have a different partitioning of a statistical variable value." So, nice points. Brock: I guess I wanted to agree with that, and then also maybe play a little devil's advocate. Because it's like, yeah — "it's all information, so, solved" — that's not really it; just because it's been formulated, or renamed into pragmatic value and epistemic value, it doesn't seem like that resolves it. I'm wondering, though, if the reading could be not that there isn't still a trade-off, but that the dilemma they're talking about is: when do you explore, and when do you exploit? That's what this last passage here is about — the resolution stems from the minimization of free energy: whether you seek pragmatic value or epistemic value is conditioned on which your model believes to be the free-energy-minimizing choice. You still have the dilemma, so to speak, of the trade-off, but which side of the trade-off you take is maybe resolved by the generative model here. Is that a...? I think the field is set for a situational resolution that may be as good as, or better than, other ways it's been addressed, but it comes down to the exact numbers that are chosen. So, in the example of the T-maze, we can look at, for example, this.
Zero, six and negative six, huh? What if it was negative 3000 and positive 3000? This came up in livestream 45 with Ryan Smith, on the folk psychology work, where the intensity of the preference was represented. So if the preference is — okay, we prefer having more food — is that gonna be a thousand versus zero? Because if so, even the scantest probability of achieving food will be pursued. In other words, that model is gonna be parameterized way, way, way out on one side, and it's gonna exploit only — it's only gonna take locally greedy behavior. Whereas if my preference for food is 0.0001, then maybe even the most obvious strategy would not be undertaken, because it's so close to a flat preference. Why is 3000 too high and 0.0001 too low? Well, it has to do with the ratios and the interactions of a lot of the parameters. It's not even gonna be like, "well, my preference is (3, 1)" — those are just two parameters drawn out from a potentially massive parameterization of a model. And so preferences — just speaking only of preferences — may be very hard to compare across situations, because a plus six preference for food, versus negative six, doesn't have a semantic meaning in the world. It's literally a model parameterization. So then the question becomes: how can this framework — this way of combining probabilistic inference and energy-based inference for strategic decision-making — have those parameters tuned into a region of subspace, or a manifold, where the behavior is flexible and adaptive? So I think the stage is prepared for a first-principles resolution, in that charitable reading. Yeah — Ali just shared this paper on meta-control of the exploration-exploitation dilemma; it's in the nearby chat link. Nice.
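The point about preference magnitudes (plus/minus 6 versus plus/minus 3000) can be made concrete with a small sketch. This is not the book's T-maze model — just an illustrative softmax over a scaled pragmatic term plus an epistemic term, with hypothetical numbers, to show how the same structure flips between pure exploitation and pure exploration depending on the preference scale:

```python
import numpy as np

def action_probabilities(pragmatic, epistemic, preference_scale, gamma=1.0):
    """Softmax over (scaled pragmatic value + epistemic value) per action."""
    value = preference_scale * np.asarray(pragmatic) + np.asarray(epistemic)
    logits = gamma * value
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Two actions: go straight to an arm (pragmatic) vs. check the cue (epistemic).
pragmatic = [0.5, 0.0]   # expected-reward proxy per action (hypothetical)
epistemic = [0.0, 1.0]   # expected information gain per action (hypothetical)

for scale in [0.0001, 6.0, 3000.0]:
    p = action_probabilities(pragmatic, epistemic, preference_scale=scale)
    print(scale, p)
```

With a near-flat preference (0.0001) the epistemic action dominates; with a 3000-scale preference the agent is effectively deterministic and locally greedy — the "resolution" lives entirely in the parameterization, which is the point being made above.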
But yeah, it's not that it solves it in some grand way — it doesn't eliminate exploration-exploitation, or even completely, currently. The affordance, I guess, seems to be there to explain which one and why: you're going through that, and there's a preference heavy here or heavy there. It's not that it makes sense, or that it's adaptive, or anything like that necessarily — just that there is some causal through-line, maybe. Yes. Eric? Yeah, so I want to first read Ali's comment here: "The critical point to establish is to define action selection as a balance between maximization of expected reward and expected information gain, which are functionals of posterior beliefs about latent states of the world." So that touches on one of the questions I also wrote comments on, which is about the T-maze example in particular — I don't know if we'll have time, but we may get to that one today. The question I specifically want to raise about this comment, or I guess this quote, is about there being a balance between maximization of expected reward versus expected information gain. That's one way of having a trade-off. But you might look for an agent that only wants to maximize reward in the long run — and in order to do that, that entails information gain, and that therefore gives a motivation for why you want to attend to the other term, the information-based, or learning-based, or knowledge-refining-of-state term — but that's in the service of the larger gain. And that's essentially what reinforcement learning tries to do: say, look, we're gonna look ahead so that the moves we take now will bring us advantage in the future, and there's time discounting and all that.
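The "balance between maximization of expected reward and expected information gain" in Ali's quoted comment corresponds to the standard expected free energy decomposition. Written from memory here, so check it term by term against the chapter's own equations:

```latex
% Expected free energy of a policy \pi, decomposed into an epistemic
% (information-gain) term and a pragmatic (preference) term:
G(\pi) =
  -\underbrace{\mathbb{E}_{q(o,s\mid\pi)}\!\left[\ln q(s\mid o,\pi)-\ln q(s\mid\pi)\right]}_{\text{epistemic value (expected information gain)}}
  -\underbrace{\mathbb{E}_{q(o\mid\pi)}\!\left[\ln p(o)\right]}_{\text{pragmatic value (expected log preference)}}
```

Minimizing G(π) trades the two terms against each other, but nothing in the equation itself fixes their relative weight — that lives in the parameterization (e.g., the preference distribution), which is exactly the tension being debated in this discussion.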
So that takes away the idea of information as a value in and of itself — it's only for a purpose — and I've raised that specifically with regard to the T-maze example in a different question. Yeah, thank you. The balance between pragmatic and epistemic is the optimization. So that is the question, that is the critical point, and that's the situational balance that modeling is going to be evolving around. Some situationally optimal or preferable strategy around that is the point — but pointing to the point is not the resolution. Does active inference provide a natural-language-grounded, first-principles approach? It may, but that is also not a dilemma resolution. So, on to some questions — but those were really great points. Okay, this is just an example of the type of question I encourage people to write, especially if they're going back to the earlier chapters, or whenever they feel like making a contribution of this kind — these are super helpful questions. Again, the questions we write — some of them may be included in really important educational materials, not least for further textbook groups, but potentially even far beyond that. So: what would you ask somebody to assess their comprehension of the materials? I just asked: what are the rows, columns, and numbers in these equations? And it even says it in the text — it's the probability of the next state. So the next state is in the rows and the current state is in the columns, and the probability gets distributed so that each column sums to one, as any other transition probability matrix would be expected to. We don't need to discuss it further unless somebody wants more detail, but this is gonna be at the core of that third step of the recipe: you might have the structural form from the second step of the recipe in chapter six.
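A minimal numpy sketch of that transition-matrix convention — next state in the rows, current state in the columns, so each column is a probability distribution over next states. The matrix values here are hypothetical, purely for illustration:

```python
import numpy as np

# B[next_state, current_state]: each column is P(next | current).
B = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.5],
    [0.0, 0.1, 0.5],
])

# Columns sum to one (rows need not):
assert np.allclose(B.sum(axis=0), 1.0)

belief = np.array([1.0, 0.0, 0.0])  # certain we start in state 0
belief_next = B @ belief            # propagate the belief one step
print(belief_next)                  # [0.9 0.1 0.0]
```

Note the check is on `axis=0` (columns), not rows — a row of a transition matrix generally does not sum to one under this convention.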
And then the next step is to specify or instantiate, like in code, what the dimensions of these matrices are. This is just saying that they are matrices: there's some prior vector or tensor, some hidden state, some transition matrix, some emission matrix — and that's the structure, step two of the recipe. Step three: how many rows and columns? And as Yaakov and others have been exploring, there are the rows and columns of the analytical representation in the equations, and then there's how PyMDP does it, and how different computational realizations do it. Yaakov, would you wanna add anything about the dimensionality of these matrices? It could be interesting. Yeah, I can probably speak more to how PyMDP deals with the dimensionality, which currently involves what is probably some unnecessary redundancy. If you're dealing with a more complex — or rather, higher-dimensional — state space, and you have different modalities that you wanna encode in these matrices, then, because you're performing matrix multiplications, you need to keep the dimensions of the matrices in those given modalities aligned — at least in PyMDP; not in other methods, as far as I'm aware. But that means you have dimensions within these matrices that don't contain any information. So you have a grid that you represent for the likelihood mapping of observation and state in the physical world, but then you have another type of observation, like: where is another agent located with respect to you? Are they in a certain direction? That might be represented by five observations, say: they're above, they're below, they're to the right, to the left, or not at all. Then you would need to encode the observation in the nine-by-nine matrix, but replicated five times, because it is the same for each of those relative observations.
But then when you want to encode the likelihood mapping just for the second type of observation — I'm actually not entirely sure what that would be; I'm presuming it's gonna be five by five, but then replicated nine times or something like that — just so the dimensionality matches, because you're doing inference over both modalities, and at least in PyMDP you're then performing matrix multiplications over both of them. Thank you, Yaakov. One way I'm seeing that: there's the sparsity of the graph, but then in PyMDP, or in any given computational implementation, the data structures — including potentially auxiliary data constructs — may have dimensionality that reflects, at worst, a combination of the dimensionality of other aspects, even as part of a temp file that you didn't exactly specify. There might be stated or unstated intermediates whose dimensionality isn't merely the dimensionality of the analytical representation. It's a subtle implementational point, but this is the mountain to climb to implement these kinds of models. Also, while we're on this topic, I think it's one area where cadCAD is gonna be very interesting to explore in the Active Blockference package, because we can also imagine that the execution order of certain operations — even in a single-agent setting, not just the multi-agent setting — is really relevant. So there's that question. All right, figure 7.2. So the first example is about a musician playing music. We are not aware of this example being used anywhere else — in the citations that follow, this musical-note-type representation isn't shown. And previously we discussed how some of these trace black-and-white images are really ambiguous, because of the intersecting black lines. And then of course, the empty square. So, yes — can you see where those black lines come from? You wanna dive into that right now?
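Going back to Yaakov's point about modalities: one common layout is to store a multi-modality likelihood as a list of per-modality A matrices, each with its own observation dimension but a shared state dimension. This is a hedged plain-numpy sketch (the 9-location and 5-direction numbers come from the discussion; this is not PyMDP's exact API, and whether PyMDP internally needs the padding or replication Yaakov describes is a detail worth checking against its documentation):

```python
import numpy as np

num_states = 9     # e.g., locations on a 3x3 grid
num_obs = [9, 5]   # modality 0: observed location; modality 1: relative direction

# One likelihood array per modality; each column is a distribution over
# that modality's observations given the hidden state.
A = [np.zeros((num_obs[m], num_states)) for m in range(len(num_obs))]

A[0] = np.eye(9)          # location is observed exactly (identity mapping)
A[1][:] = 1.0 / 5         # direction uninformative here (flat placeholder)

for m, A_m in enumerate(A):
    assert np.allclose(A_m.sum(axis=0), 1.0), f"modality {m}: columns must sum to 1"

print([A_m.shape for A_m in A])  # [(9, 9), (5, 9)]
```

The per-modality list means the two mappings can have different observation dimensions (9 versus 5) without replicating either one, which is one way implementations avoid dimensions "that don't contain any information."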
Yeah, well, okay. In the upper left, each black line — even though it can be hard to trace them — I think they're just being shown representationally. They're beliefs about each note in the sequence at each time step. The time steps are on the x-axis, and the beliefs, in terms of probability, are on the y-axis. So initially — right, and this gets to another question that was raised, but maybe this is the time to talk about it: there's some sort of engine, some sort of differential equation simulator or something, running underneath here, right? That is not discussed in the chapter, but it's generating these belief curves. Yes, yes — it's related to some functions in MATLAB. And I was just going to mention that, because my sense from the discrete model would be: you'd have, say, five points at time zero, then five points at time one. So I thought, well, what is happening with this little fractal crash right here? How could you have continuously differentiable, oscillating behavior when all you're calculating analytically is just one, two, three, four, five? Then I thought: are the points actually at the midpoints, with a differential equation sort of ribboning an action-perception loop through a discrete-time matrix? But then, if this is a function that's evaluatable at every point, aren't we in a continuous-time setting? So wouldn't that have to imply some machinery for unpacking a continuous-time action-perception cycle from a discrete-time specification? So I agree very much. Yeah — my guess is what they do is: they've got their equations, the free energy equations, and the parameters are these distributions — okay, what's your belief in these different properties? — and those parameters evolve over time through a differential equation, because you turn these things into a differential equation.
And when the state changes — okay, now another note comes in — that's what triggers the differential equation to go in a different direction, essentially. And that's why you've got the five discrete regimes here, or target points that it evolves toward. Then the next note comes in, and now the boundary conditions change, essentially, so your differential equation takes you in a different direction. And I recall there's something like — you all probably discussed this more than I paid attention to — some sort of universal solver that they provide: you just plug in your matrices, your state equations, your transition matrices, and it solves it all for you. That may be what this differential equation solver is about. Do you think that sounds right? Yes, I think so. Variously, in the methods sections and supplemental sections of papers, they'll say: in this paper we're only specifying the generative model and the matrices, and then we use standard routines to address it. Now, that's not the pinnacle of accessibility and reproducibility; however, it does at least use standard routines — though those are not always specified in the paper, which is an issue. So it'll be important to regenerate examples within SPM, and also to use some of the things we were discussing, like the ability to move between different languages. But VB-X — variational Bayes X; I don't know what the X is actually for — is one of the core functions. It is actually doing the variational inference in SPM. And again, SPM is this sort of chimera package, because it arose from the immediacies of needing to do neuroimaging registration and dynamical analysis.
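Eric's picture — discrete target beliefs, with a differential equation relaxing the current belief toward them between observations — can be sketched as a simple gradient flow on the log-space parameters of a belief. This is only a toy (Euler integration of a made-up relaxation rate), not SPM's actual routine, but it shows how smooth curves can arise from a purely discrete-time specification:

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    e = np.exp(v)
    return e / e.sum()

def relax_beliefs(log_target, steps=50, tau=0.2):
    """Euler-integrate d(log q)/dt = tau * (log_target - log q),
    yielding a smooth belief trajectory between discrete updates."""
    v = np.zeros_like(log_target)   # start from a flat belief
    trajectory = []
    for _ in range(steps):
        v = v + tau * (log_target - v)
        trajectory.append(softmax(v))
    return np.array(trajectory)

# Suppose the discrete-time solution says "note 3" (of 5) with high confidence:
target = np.log(np.array([0.02, 0.02, 0.92, 0.02, 0.02]))
traj = relax_beliefs(target)
print(traj[0], traj[-1])  # smooth path from near-uniform toward the target
```

Each new "note" would swap in a new `log_target`, kinking the trajectory in a new direction — which matches the five-regime picture described above, and explains how the plotted curves can look continuously differentiable between the discrete time points.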
That then broached into dynamic causal modeling, random-field-theory-based statistical testing, permutation testing — all areas that are not themselves formally linked per se, but useful as parts of the toolkit of a neuroimaging researcher. Then, as we discussed a bit around chapter two, and of course many times earlier: in that generalized Bayesian inferential framework, hierarchical modeling — including action — is actually not the hardest thing. Yes, implementing action is one thing, but treating action selection as inference — planning as inference — raises challenges relative to just doing time-series anticipation. But then those functions became compiled into SPM in a limited capacity. So again, they go through some of this, but I think it's a key issue — Ali will definitely look forward to what the free energy gradients are — namely, where the continuous nature of these curves comes from, and why the numbers are what they are: one through ten on the axis, and there are five notes, and ten is two times five. There's a little more that needs to be fleshed out there. And then one question that was somewhere between testing our comprehension and being clear about what we're doing here — and in this ambiguity, we can ask it without any bias: "the negative free energy gradients, i.e., prediction errors." What? Is a prediction error a free energy calculation? Well, ignore "gradient" for a second. Free energy calculations are not prediction errors. Prediction errors are observations minus expectations — some finite value in the state space of what is being measured. Free energy is not that. It references potentially those variables, but it involves things we know about, like KL divergences and such. Okay, so free energy is not prediction error.
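For contrast, the two quantities being conflated in that figure caption can be written side by side. These are standard forms, not quotes from the chapter:

```latex
% A prediction error is a difference in observation space:
\varepsilon = y - \hat{y}

% Variational free energy is a functional of a belief q and data y:
F[q, y] =
  \underbrace{D_{\mathrm{KL}}\!\left[q(x)\,\|\,p(x)\right]}_{\text{complexity}}
  - \underbrace{\mathbb{E}_{q(x)}\!\left[\ln p(y\mid x)\right]}_{\text{accuracy}}
```

Under Gaussian assumptions, the gradient of F with respect to expectations works out to be proportional to precision-weighted prediction errors — which is presumably what "negative free energy gradients, i.e., prediction errors" is gesturing at — but the two are not the same object: one is a scalar functional of a distribution, the other a signed difference in the measurement space.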
And that's a huge difference between, for example, a predictive processing framework and a free energy hierarchical predictive architecture. Eric? But they're saying the gradient of free energy is a prediction error. Exactly. So even if it were, then that was the next question: what's the difference between free energy and the free energy gradients? Are we talking about a landscape of derivatives of free energies? Or is it being described as a gradient — as a landscape — and hence gradient-ish, because it can be calculated over some discrete-by-continuous state space? Like five skyscrapers that you could take derivatives over in the y-direction, but you couldn't take the partial in x? Or is it truly a continuous landscape that's just tethered to the discrete values in x? But then again, that's the whole continuous-discrete question. Or are the values themselves gradients, which has an even more complex interpretation? So we don't have the answers on these, but it's hard to understand how these figures could have epistemic or pragmatic value for learners or practitioners without some of these questions being resolved. All right. The first example — this is just from walking through. The first example was just a hidden Markov model: hidden states updating through time, Bayesian filtering, Kalman filtering — standard signal processing in a Bayesian framework. And we're gonna see a lot of echoes of the step-by-step ModelStream #1. They give an even simpler example with static, one-step perception, then go second to the dynamic case, then introduce action and the policy π, and then nuance policy through uncertainty and so on. So that's the first inferential case. They're then going to head into decision-making and planning as inference. So we see policy being introduced as a variable that influences how states change through time. Does policy do that, or does action do that?
In the one-step limit, policies are actions, are affordances — but π is specifically reserved for sequences of actions. So is it that sequences of actions do that, or is it, you know — is this the action-based policy? But nonetheless, that architecture allows the evaluation of alternative policies in terms of their relative variational or expected free energy, depending on whether a one-step policy is accomplishable with variational free energy — that's the instantaneous, most self-consistent maneuver — whereas even one step in advance broaches into the whole expected free energy space. We had him on 130 — so there's a really important discussion there. Buzsáki is also a very interesting researcher: neural rhythms and so on. Factorization is a really important topic — being able to distinguish, neurologically, what we mean when we talk about the what and the where streams, the dorsal and the ventral streams and critiques of them, et cetera; but also what and where in the brain, and then in the factor graphs, and computationally. Because the un-factorized models — if we're doing the all-by-all-by-all-by-all — become intractable very fast, and the specification of the sparsity of the model is one and the same, in the variational Bayes framework, as the factorization of the model. So factorization is kind of focusing your search efforts on manifolds where you've constrained certain things to be linked or unlinked. That is what motivates the structure learning problem, and the need to not be locked into factorization schemes at a given level of analysis. Okay, another question to explore: here we have the top half of figure 7.3, where there's just one A matrix. So why are there two A matrices here? Well, it's not an impossible question, and the symbols go a long way. Here are the four locations the animal can be in: starting position, bottom position, left arm, or right arm.
Then here are the, it's five on the X here, because that bottom position could reveal an R or reveal an L. In this case, it reveals an R and the food is actually there. And then this is a second A matrix that describes how location is associated with the food: none, aversive, or positive. But this ties into the earlier discussion: it might be all one thing to say, well, it's four by five and then four by three. But is this actually a four by five by three? I know it's not. Eric, what do you think? Yeah. And they'd mentioned in there it is a tensor also, which means that you're stacking this one, this image, against the version where the reward is on the left side. So the tensor is gonna be four by eight by two. So that makes sense for being the tensor. So the state is another layer on top of this. The hidden state is another layer on top of that. Yes. Section 7.3. So this would be a fun, you know, kind of PhD-qualifying-level question: just describe the whole T-maze, every row, every column, and every value. Why is it that way? So we're only gonna type it out here, but that is to understand that example and what the B matrices are, and to have agility in just identifying the differences between these matrices numerically, like pattern recognition: oh, here there's two ones here, two zeros here, two zeros here, two ones here, one and zero, these stay the same. And then to be able to transpose that in your understanding to what this means in terms of transition frequencies. Every place where there's a difference, it's a difference that makes a difference. So being able to understand what those are is about understanding this example.
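The two A arrays discussed above, one for cue observations and one for reward observations, stacked over a second context dimension (reward left versus reward right), can be sketched as follows. This is a schematic reconstruction of the kind of structure described, not the figure's exact values.

```python
import numpy as np

# Hidden state factor 1: location (center, cue site, left arm, right arm).
# Hidden state factor 2: context (reward on left, reward on right).
n_loc, n_ctx = 4, 2

# A1: cue observations (see-center, cue-says-L, cue-says-R, see-left, see-right).
A1 = np.zeros((5, n_loc, n_ctx))
A1[0, 0, :] = 1.0                  # at center you observe "center" in either context
A1[1, 1, 0] = 1.0                  # at the cue site, "reward left" shows an L
A1[2, 1, 1] = 1.0                  # at the cue site, "reward right" shows an R
A1[3, 2, :] = 1.0                  # in the left arm you observe "left arm"
A1[4, 3, :] = 1.0                  # in the right arm you observe "right arm"

# A2: reward observations (none, rewarding, aversive).
A2 = np.zeros((3, n_loc, n_ctx))
A2[0, :2, :] = 1.0                 # no reward at the center or the cue site
A2[1, 2, 0] = A2[1, 3, 1] = 1.0   # reward in the arm matching the context
A2[2, 2, 1] = A2[2, 3, 0] = 1.0   # aversive outcome in the mismatching arm

# Each (location, context) column is a proper distribution over outcomes.
print(A1.shape, A2.shape)
```

Slicing either array at a fixed context index recovers the flat matrix shown in the figure; the context axis is exactly the "stacking against the reward-on-the-left version" described in the discussion.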
And one can imagine, especially if they're wanting to make an application of active inference that's more complex than this four by four, that being fluent with how these matrices are constructed, their dimensionality, how differences in the generative process are incorporated, how differences in the generative model are incorporated, all these different features, is important. This is, yes, Eric, this is definitely a very important question, I agree; it looks extremely neural. These traces come up all the time. Similarly here, we have three discrete time points: starting, going to the cue, getting the L, and then going to the left arm. But then what is a step one and a half? Is it that at one we have all of these points, and then by two the uncertainty is resolved, the belief about certain things goes to zero, the belief about other things goes to one, and this is just a pure interpolation? But that is quite a specific interpolation, including what appear to be some dopaminergic spikes or something that are not reflected at either of the time points. My guess is that you've got three different blocks there, so if you take the one as being the width, you know, one third of the way across, the two is another third and the three is another third, then that switch there that you see happens as soon as you transition from the one to the two. Yes, although still there's even the graphical interpolation question, and it's so easy to be, like, reading: is it just a quirk of the dashed line? Or are there two dashed lines? And two dashed lines diverged in a neural trace. So, but it's kind of clear. These ones, the discrete formulations, are clear. Yeah. Are rats prone to useless behavior? What do you think? What useless behavior? I wrote it out. Okay, okay. Yeah, this is good. Maybe we'll have time for it today, but that's my most provocative comment on this one, and maybe we can save that for next week. Okay, perfect.
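The jump in belief between the discrete time points, flat uncertainty about context at the start, then resolution as soon as the cue is observed, can be sketched as a single Bayesian update. The cue reliability of 0.98 is a made-up number for illustration; the figure's values are not reproduced here.

```python
import numpy as np

# Belief over the context factor (reward-left vs reward-right).
p_context = np.array([0.5, 0.5])        # time 1: flat belief at the start

# Time 2: at the cue site the animal observes an "L" cue. With a reliable
# cue, p(obs = L | context) is high for "left" and low for "right".
likelihood_L = np.array([0.98, 0.02])   # hypothetical cue reliability
p_context = likelihood_L * p_context
p_context = p_context / p_context.sum() # belief jumps toward "reward left"

print(p_context)                        # most mass on "left" after one update
```

Nothing in this discrete update defines a value at "step one and a half": any curve drawn between the time points is interpolation added for the plot, which is exactly the graphical question raised above.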
Let's pick up with this one and any other questions. Then we'll glance over the visual saccade example, look at the learning example, and consider the hierarchical example: there's the maze, then there's the hierarchical example, another nested discrete-meets-continuous neural trace, and then we end on seven squared. So, thank you all, have an excellent day.