Hello and welcome everyone to ActInf Lab livestream number 32.3. It's December 16th, 2021, and we are really excited to be here with Conor, one of the authors. Today we'll just jump right into it. Our goal is to be learning and discussing this awesome paper, Stochastic Chaos and Markov Blankets, from 2021, by Karl Friston, Conor Heins, Kai Ueltzhöffer, Lancelot Da Costa, and Thomas Parr. We had a 32.0, a 32.1, and a 32.2 on this paper, and today we're just excited to have a 32.3. So we'll have a presentation by Conor, where we'll all be providing our regime of attention, writing down any questions and comments we have, and writing any questions in the live chat if we would like. And then we will have a discussion to kind of follow up on the presentation. So Conor, thank you for joining, and please take it away. Great, thanks Daniel. Thanks for having me. I'm really happy to be back. This is my second time on the stream, so it's nice to be here. Okay, I'm going to share my whole screen. And now here's the presentation. Looks good. Thank you. Looks good. Okay, cool. So yeah, could you maybe hit "Hide" where it says "Jitsi is sharing your screen"? Yep, and then hit hide. Yeah. The whole thing disappeared, but then, yeah, perfect. We're back, right. Yeah, thank you. Yeah, so as Daniel said, I'll be presenting on this paper that was published this fall in Entropy, called Stochastic Chaos and Markov Blankets. And it's really great that you guys have already gone through the paper in detail, from what I remember from watching some of the past streams. So we don't have to go through every single equation and every figure, because you guys have already covered it very thoroughly, but we can dive in with as much technical detail as anyone wants.
So I'm going to give broad strokes of the paper and also frame it in the context of other literature that's come out on similar topics in the past few months, since a bunch of work has been published on the same sort of themes. That'll help contextualize what I consider to be the main contributions of the paper, and there are both methodological and theoretical ones, so there's a lot to talk about here. As Daniel said, afterwards we'll do the discussion, but if either of you has questions, or anyone in the Jitsi has questions during it, feel free; from my end I'm fine being interrupted, but whatever the format is, that's fine. Okay, so without further ado I'll get started. Just a quick introduction to me: I'm one of the co-authors on the paper, and I'm a PhD student based in Konstanz, Germany, where I'm supervised by Iain Couzin, who is a leading scientist in the fields of complex systems and collective animal behavior. He's the head of the Department of Collective Behaviour at the Max Planck Institute of Animal Behavior. So in Konstanz we're basically building a hub for the study of complex systems, and collective behavior in particular. I'm also co-advised by the lead author on the paper, Karl Friston, who came up with the fundamental premise of a lot of the concepts in the paper, like the free energy principle, Bayesian mechanics, Markov blankets, stuff like that. For the outline: first I'll talk about the motivations for the paper and some of the contextual literature, and then I'll summarize what I consider to be the main takeaways. But there are a lot of different perspectives on the paper, so these aren't by any means the only way to interpret it.
And then, in the interest of getting us all on the same conceptual footing, I'll review stochastic processes, and in particular a special interpretation of stochastic processes called the Helmholtz decomposition, which is something that you guys have already talked about quite a bit in past streams. Then I'll step through what I consider the primary moves of the paper; like I said, I'm not going to go through every single figure. So in this past year a bunch of work has come out on the general theme of Bayesian mechanics. If you want to look at this paper in the context of other work, I would say that the major works are things like Bayesian Mechanics for Stationary Processes, which just last week was finally published in Proceedings A of the Royal Society, so we're very happy about that. That paper established the fundamental mathematical formalism for Bayesian mechanics and the conditions under which you can apply a Bayesian mechanical lens to dynamical systems, specifically random dynamical systems. This is something that's been covered in previous livestreams, a few of which Lance and I attended. There's another great paper, currently a preprint, by Miguel Aguilera and some of his colleagues that covers very similar ground to that first paper, basically studying the necessary conditions for Bayesian mechanics, and in particular uncovering what's known as the synchronization manifold and studying it in the context of simple linear Gaussian systems. And then finally there's a paper, also in the same special issue of Entropy as the present paper, called Memory and Markov Blankets, which again studies how Markov blankets actually evolve through time.
I mention these three papers because they all study Bayesian mechanics, and in particular they're all focused primarily on linear systems: systems that in the stochastic literature are often called linear diffusion processes or Ornstein-Uhlenbeck processes. So that's a really deep common thread to all of them: they leverage the mathematical tractability of these simple linear systems to get really nice analytic proofs about when Bayesian mechanics exists and when it doesn't. I say primarily focused on linear systems because, for instance, in Bayesian Mechanics for Stationary Processes the mathematical derivations actually apply to any process that has a Gaussian steady state, not only linear ones, but the primary examples used in that paper were all linear systems. I mean, there are a few nonlinear ones, but the focus and the tight derivations centered on linear systems. So given that contextual background, now we can talk about the main takeaways of this paper, Stochastic Chaos and Markov Blankets. The first main thing is basically taking a random dynamical system, in this case a stochastic version of the Lorenz attractor, and trying to apply the lens of Bayesian mechanics to it, the same thing these previous papers have done: to look at the system from the perspective of, how does the system look as if it's forming beliefs about other subsets of variables in its environment? That's something we also call the physics of sentience: how can physical systems, not necessarily things we typically think of as sentient, actually exhibit the hallmarks of something like biological computation or representation? The second main takeaway is finding that you can fit the Helmholtz decomposition, this special interpretation of stochastic processes that we'll cover in the next few slides.
How can you learn that decomposition automatically, not from data but from stochastic dynamical systems where the usual tractability doesn't exist? That's a big methodological contribution: the paper actually lays out a set of steps, an algorithm, for doing this automatically for systems that lack the usual analytic tractability you get with linear systems. And finally, I think another big contribution here, which was tucked into one of the later sections of the paper and is pretty interesting, is a new result on the relationship between the Markov blanket and causal coupling between systems. We basically show that a Markov blanket does not always imply causal relationships, and vice versa. So the Markov blanket is fundamentally a statistical relationship that doesn't always go hand in hand with a causal relationship. Those are the main three takeaways that I'll discuss today, but there are a few others that we can also dive into, which may have been covered in other livestreams. So to pursue the first goal, basically deriving a physics of sentience, we need to start with Bayesian mechanics, and from there we'll get to studying stochastic processes. So here's a quick overview of Bayesian mechanics first. The general idea of Bayesian mechanics can be described very non-mathematically: you just have a system that's embedded in some kind of ambient environment. And crucially, that system has an interface with the environment, and can only exchange information with the environment through some kind of sensory or active interface. In the context of the most popular use case for Bayesian mechanics, that would be something like a brain and nervous system that's insulated from the world via things like muscles, effector organs, and sensory modalities.
So the interface comprises the ways we can impact the world around us, as well as the ways we absorb, or are impressed by, information from the world. And then you can zoom in, right, and say that a single cell has the same kind of relationship, where intracellular processes are the equivalent of the brain: you know, genetic things going on with DNA, or intracellular metabolic cascades. And the interface would be something like the cell membrane, which mediates information and energy transfer between the cell's intracellular contents and some extracellular environment. And then from my perspective, what I'm interested in studying personally is zooming out and asking: how can you look at an entire system, let's say a school of fish or a flock of birds, as if the whole thing is a kind of internal system that's insulated from some social or physical environment, where the interface might not be as simple as your skin or a cell membrane, but something embodied by other individuals? Maybe the interface between a group and its surroundings consists of particular sensing or acting individuals. And that's, you know, very catered to my own interests, but as you know, there are tons of people trying to apply the same framework to all kinds of systems, even socio-technical systems and societies and so on. So it's a very generic framework. That's the fundamental partition of the world that we're interested in with Bayesian mechanics. And what Bayesian mechanics is saying is that if you take these three sets of states, which I've now termed mu, the internal states; b, the blanket states, that's the interface; and eta, the external world; if you take that fundamental three-way partition of the world, Bayesian mechanics is just saying that under certain conditions, it looks as if the internal states are encoding the external states.
And the thing that allows that encoding to happen is a special function, or a manifold, called the synchronization map, which maps information from the internal to the external. And the condition for this happening is basically that the internal world is conditionally independent of the external world when that conditioning is done on the interface. So the interface statistically insulates the internal from the external. And with that in play, you can actually have the internal states look as if they're encoding the external states. So that's the gist of Bayesian mechanics, leaving out as much math as we possibly can. Bayesian mechanics research first asked the question: what are the necessary and sufficient conditions for that map to exist? That's what the earlier paper, Bayesian Mechanics for Stationary Processes, explicitly explored: what are the mathematical conditions for us to be justified in saying, yeah, that map actually exists? And once we've answered this first aim of Bayesian mechanics, we can try to look for hallmarks of Bayesian mechanics in real systems, biotic and abiotic alike. So those are the two aims: the first is what Bayesian mechanics is and what allows it to exist, and the second is that once we know those conditions, we can actually look for them in the real world. Okay, so to do this proper analysis of a system, to identify this three-way partition into internal, blanket, and external, you basically need a probability distribution over the states of the system. That's what's denoted with this P: a probability distribution that tells you, for every setting of mu, b, and eta (internal, blanket, external), what's the probability of that particular configuration? So that's kind of a tall order; getting a full probability distribution for any system is not trivial.
Yeah, for some systems it is trivial, but for most of the things we're interested in, we usually don't have access to that. That's why a lot of the work in this field so far has not really been applied to actual empirical data: it's so theoretical at this point, and very in silico. We're still at the stage where we're figuring things out theoretically, and we haven't moved to empirical application yet, mainly because of these issues of finding what that probability distribution is. So that's Bayesian mechanics, and to move from there to stochastic processes, we're motivated by the fact that if we need to find this probability distribution, and the internal, blanket, and external states change over time, i.e. they're dynamic, then we need to move to the realm of stochastic processes. The reason is that stochastic differential equations are the mathematical construct that allows us to write down a probability distribution over variables that change over time. This is the basic abstract formalism for understanding probability distributions over paths, over sequences of states through time. That's really what a stochastic process is, but we like to formally summarize that probability distribution using this kind of differential equation. And all this differential equation is saying is that the rate of change of some state is a deterministic function of the state, where I'm going is a function of where I am, plus some random noise. And the random noise is just there to incorporate the fact that I don't know my position with certainty over time: I have some sense of where I'm going, given by that deterministic flow, but overall there's uncertainty in where I'll be next. So we capture all that uncertainty with this noise term.
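As an aside, the Langevin form just described (deterministic flow plus random noise) is straightforward to simulate. Here is a minimal Euler-Maruyama sketch for a stochastic version of the Lorenz system like the one the paper studies; the noise amplitude, step size, and initial condition are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def stochastic_lorenz(T=10.0, dt=1e-3, noise_amp=1.0, seed=0):
    """Euler-Maruyama integration of dx = f(x) dt + dW for the Lorenz flow.

    f is the classic Lorenz drift (sigma=10, rho=28, beta=8/3); dW is
    Gaussian noise scaled by sqrt(dt). Settings here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x = np.empty((n_steps, 3))
    x[0] = [1.0, 1.0, 1.0]                       # arbitrary initial condition
    for t in range(n_steps - 1):
        X, Y, Z = x[t]
        drift = np.array([10.0 * (Y - X),        # sigma * (Y - X)
                          X * (28.0 - Z) - Y,    # X * (rho - Z) - Y
                          X * Y - (8.0 / 3.0) * Z])  # X*Y - beta*Z
        noise = noise_amp * np.sqrt(dt) * rng.standard_normal(3)
        x[t + 1] = x[t] + drift * dt + noise
    return x

path = stochastic_lorenz()
print(path.shape)  # (10000, 3): one noisy sample path through state space
```

Running many such paths and histogramming where they spend their time is one crude way to approximate the probability distribution over states that the talk keeps referring to.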
And that's the basic structure of any stochastic process or stochastic differential equation. Okay, so now we're going to zoom in on the flow component, which is the deterministic part of the stochastic process. Hereafter, when I say flow or drift, I'm referring to the deterministic part of this stochastic differential equation. We're going to take that and expand it to give it a new interpretation that will allow us to get back to our original goal, which is writing down a probability distribution so that we can do Bayesian mechanics. The interpretation that allows us to make those moves is called the Helmholtz decomposition. It's basically saying that for the flow of any stochastic differential equation (there are a few conditions on the flow, kind of technical conditions, that we can get into), if the flow meets those basic conditions, which are quite general, then you can decompose it into the sum of two parts. One is called the reversible part of the flow, and the other is called the irreversible part of the flow. And we'll get into what that actually means. So the first one, the reversible part of the flow, can basically be seen as a gradient flow towards the maximum of the probability density of the whole system. That probability distribution we talked about at the beginning has peaks and valleys, right? There are areas of higher density and lower density. This reversible part of the flow is just telling the system: bring me to the local maxima, the peaks, of that probability density. This is also referred to as dissipative or time-reversible, because a system that's driven by this gradient flow has time reversibility, or time symmetry, in the sense that if you run one version of the process forwards in time, it'll look the same run backwards. And the important part here is that it's tightly related to the noise, the random fluctuations we talked about earlier.
So the strength of the force that pushes you towards the maxima of your probability density is directly proportional to the amplitude, or the variance, of the noise. That's a very interesting concept that might be a little counterintuitive at first. Basically, the size of those random fluctuations that we're adding to the flow in the Langevin equation determines how quickly, or with what force, we get to the probability density's maxima. You might think that's a little weird: why would more noise actually make us get where we're going faster? Wouldn't that push us away from our path? But if you think about averages, and about the density of paths trying to get somewhere in a probability density, the more noise you have, the more chances the system has to explore state space and then reach its eventual average, steady-state density faster. So that's one part: this gradient flow, this dissipative flow, whose strength is proportional to the amplitude of the random fluctuations. And then secondly, we have this more interesting thing, which is called the conservative or time-irreversible component, also called the solenoidal flow. This is mediated by an antisymmetric operator called Q. It basically points neither in the direction of the minima nor the maxima of the probability density of the system; rather, it points you along the iso-contours of equal probability of that density. So this is the circuitous component of the flow. And it's very interesting from the perspective of thermodynamics, because systems with this time irreversibility exhibit positive entropy production, for instance, and they basically have conservative dynamics. So here's an example, on the left, of a stochastic process that's just driven by gradient flow with some noise on top of it. And you can see that it's kind of ascending the probability density.
And then on the right we have a deterministic system that's just driven by solenoidal flow. It acts like a celestial body, like a planet orbiting some center of gravity. And I should say that the blue background on each plot is showing how high the probability density is at each point in state space; this is just a two-dimensional system. So the Helmholtz decomposition is saying that you can take these two components, and together they form the full dynamic of any stochastic system. The full dynamic has a component of trying to get to the maximum of the probability density, and it also has a component of kind of surfing the iso-contours of the probability density. And all of this is subject to random fluctuations that are knocking it off the path. So you can get rich dynamics with just this very simple decomposition of any system's flow. Here's another way we can visualize these things, in terms of vector fields. On the left I'm showing the solenoidal flow, the strength of that component at each point in state space; x and y are just the two components of some two-dimensional random variable. In the middle are the components of the dissipative flow, and you can see that they're parallel to the gradients that pull you towards or away from the center of the probability density. And if you add these two things together, you get the force that's driving the system at every point in state space, which is a combination of this orbiting fluctuation and this pull towards the center. And just a technical note: the reason the dissipative flow looks like it's pointing away from the center in the middle plot but towards the center in the right plot is that there's a minus sign in front of the dissipative operator in the right plot. That's an important thing that often confuses people. So now we've done this decomposition.
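The two vector-field components just described can be sketched in a few lines for a linear system with a standard Gaussian steady state. The particular gamma and Q values below are made up for illustration; the key structural fact the sketch checks is that the solenoidal component is always orthogonal to the surprisal gradient, so it circulates along iso-contours without changing the state's probability.

```python
import numpy as np

# 2D linear (Ornstein-Uhlenbeck-like) flow with Gaussian steady state N(0, I).
# Surprisal I(x) = -ln p(x), so grad I(x) = x (up to an additive constant).
gamma = 0.5                          # dissipative amplitude (illustrative)
Q = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # antisymmetric solenoidal operator

def grad_surprisal(x):
    return x                         # Sigma^{-1} x with Sigma = identity

def dissipative(x):
    return -gamma * grad_surprisal(x)    # points toward the density's peak

def solenoidal(x):
    return Q @ grad_surprisal(x)         # circulates along iso-contours

x = np.array([1.0, 2.0])
# Antisymmetry of Q makes the solenoidal part orthogonal to the gradient:
print(np.dot(solenoidal(x), grad_surprisal(x)))  # 0.0
# The full linear flow is the sum of the two components:
print(dissipative(x) + solenoidal(x))
```

Evaluating these two functions on a grid and plotting them with a quiver plot reproduces the kind of vector-field pictures the slides show.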
We have this gradient flow, we have the solenoidal flow, and the thing that brings us back to Bayesian mechanics is an important point that I left out: the gradient flow and the solenoidal flow don't just operate on the state of the system, they operate on the gradient of the probability density, which is given by this fancy script-I symbol that we call the surprisal. The surprisal is defined as the negative log of the probability density. It's also known in physics as a scalar potential, and it basically tells you how far away you are, in some sense, from the center of the probability density. It's saying: what is the local curvature right now? How steep is the probability landscape at the point I'm at right now? These gradients are what the dissipative and solenoidal operators act on, and the surprisal in turn can be formally related to the steady-state distribution of the process, this probability density over mu, b, and eta that we were originally interested in. Because it's the negative log of that density, you can exponentiate its negative and you get yourself back to the density, excluding something called a partition function. That density is called the non-equilibrium steady-state (NESS) density, and its existence is fundamental to the whole free energy principle and Bayesian mechanics. If a system has this NESS density, then the system can achieve Bayesian mechanics. Another important point: if Q is non-zero, the system truly is non-equilibrium. If Q is zero, so there are only dissipative currents, the system is known as an equilibrium system and it just has an equilibrium steady state. So Q is basically what mediates all the things we're interested in for biological systems, like the breaking of detailed balance, a positive entropy production rate, and time irreversibility.
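The exponentiation step mentioned above, recovering the steady-state density from the surprisal up to a partition function, can be sketched numerically in one dimension. The double-well surprisal here is an arbitrary illustrative choice, not anything from the paper.

```python
import numpy as np

# Surprisal I(x) = -ln p(x) determines the density up to normalization:
# p(x) = exp(-I(x)) / Z, where Z is the partition function.
def surprisal(x):
    return x**4 - 2 * x**2   # an assumed double-well potential, for illustration

xs = np.linspace(-3.0, 3.0, 2001)
dx = xs[1] - xs[0]
unnorm = np.exp(-surprisal(xs))   # exponentiate the negative surprisal
Z = unnorm.sum() * dx             # partition function, by numerical integration
p = unnorm / Z                    # a proper probability density

print(p.sum() * dx)               # ~1.0: normalized
print(abs(xs[np.argmax(p)]))      # ~1.0: density peaks where surprisal is lowest
```

The same recipe works in any dimension; the expensive part in practice is estimating the surprisal itself, which is exactly the tractability problem the paper's method addresses.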
That's all baked into the system with Q. So the main point of all these slides on stochastic processes and the Helmholtz decomposition is just the fact that the deterministic flow component of the Langevin equation can be rewritten in terms of gradients of its log stationary density. It looks as if any stochastic process is being dragged towards its stationary density, subject to random noise and these solenoidal components that drive it along the iso-contours. So we can now take the steps required to rewrite our full Langevin equation using this new interpretation. First you have this flow operator, which is just a combination of the irreversible and reversible components. Then you can plug that into our expression for f, the flow: it's the flow operator times the gradients of the negative log density, minus this extra term that I'll talk about in a few minutes. And then you can combine all this together, plug it back into the Langevin equation, and now we have an expression for the Langevin equation, namely the rate of change of the state, in terms of a gradient descent on this surprisal. Basically all this is saying is that the system looks like it's trying to minimize the surprisal of its states, i.e. find itself in the parts of state space that are the least surprising, or have the highest probability. As I just mentioned, there's this extra term that we haven't talked about, called the housekeeping or correction term. This, I think, has come up in a few of the past livestreams, if I remember correctly, and it's going to be really important for the current paper: it mediates all the interesting things that we see in things like stochastic chaos. This is what Karl calls the housekeeping term in the paper, and it only arises in the case that those solenoidal and dissipative matrices depend on the state of the system.
So those earlier papers we were talking about, Bayesian Mechanics for Stationary Processes, Miguel's paper, and the Memory and Markov Blankets paper, for the most part assume that these matrices are constant, in the sense that they don't depend on the states. That's what makes a system linear versus nonlinear: if these matrices are constant, the system is linear; if they are not constant in the states, i.e. they depend on the part of state space you're in, then it's a nonlinear system, and this housekeeping term is nonzero; it actually has some influence on the dynamics. So this is something that figures into the current paper on the Lorenz attractor. The Lorenz attractor, which we'll get into later, is a nonlinear system, so this housekeeping term becomes nonzero. And for anyone who's technically interested, that's just what it looks like: it's basically a type of matrix-field divergence, which is a mouthful, but it's a sum of partial derivatives of the matrix field across the different state variables. Given that, in the case of a nonlinear system where the Q matrix actually depends on the state, now we're showing the solenoidal operator as a function of state space, and you can see that the solenoidal flow depends on which part of the x-y plane you're in, here denoted x1, x2. The dissipative operator I've kept constant, just to be simple. And then additionally you have this housekeeping term, which is also a function of state space, and it's actually counteracting, in some sense, the dissipative operator: it's a weird combination of flows that pull you towards the center as well as away from the center. And then the full dynamic is going to be the combination of the housekeeping, solenoidal, and dissipative terms altogether.
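To make that matrix-field divergence concrete, here is a small symbolic sketch of the housekeeping term for a toy system whose solenoidal operator is second order in the states, roughly mirroring the setup just described. The specific Q and gamma below are assumptions for illustration, not the operators from the paper; the formula is the one from the slide, Lambda_i = sum_j d/dx_j (Q - Gamma)_{ij}.

```python
import sympy as sp

# Housekeeping term as the divergence of the flow-operator matrix field.
x1, x2 = sp.symbols("x1 x2")
states = (x1, x2)

Q = sp.Matrix([[0, x1 * x2],
               [-x1 * x2, 0]])               # antisymmetric, state-dependent
Gamma = sp.Matrix([[sp.Rational(1, 2), 0],
                   [0, sp.Rational(1, 2)]])  # constant dissipative operator
flow_op = Q - Gamma

# Lambda_i = sum over j of the partial derivative of (Q - Gamma)_{ij} w.r.t. x_j
Lambda = sp.Matrix([sum(sp.diff(flow_op[i, j], states[j]) for j in range(2))
                    for i in range(2)])
print(Lambda.T)  # Matrix([[x1, -x2]])
```

Because Q is quadratic in the states, the resulting housekeeping term comes out linear in them; a constant Q (the linear-system case) would make every partial derivative vanish and the housekeeping term disappear.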
So these are the kinds of systems that you're more likely to encounter in the real world, so to speak, because most real-world systems are nonlinear: they can't be described with nice Ornstein-Uhlenbeck processes where the housekeeping term disappears. Okay, so that was our big detour into the Helmholtz decomposition. It was a lot of math and theory, but it'll become concrete when we apply it to the Lorenz attractor. So now let's go back to our desired goal of getting that probability distribution. We want a probability distribution over internal states mu, blanket states b, and external states eta, so that we can do Bayesian mechanics. And now we're going to say that our system of interest X is just a 3D system, a 3-vector comprising internal, blanket, and external states at every time. So we want a 3D probability distribution over those variables. Once we have that probability distribution, we can marginalize the density and find conditional densities. We can say: give me the conditional density over internal states given blanket states; give me the conditional density over external states given blanket states. Those two densities are theoretically the things that get mapped to each other through the synchronization map. That's something I didn't really mention in the beginning, with that more semantic description of Bayesian mechanics, but technically Bayesian mechanics has to do with conditional densities. It says: tell me, conditioned on some level of the blanket states, what is the most likely value of the internal states? And if you know that and you have a synchronization map, you can guess the most likely value of the external states conditioned on the blanket states. So the synchronization map is really mapping the two conditional densities to each other. So let's go to questions. Quick question. So at that point, can we, because you've used phrases like zoom in and zoom out.
And when you talk about decomposition, do we know where on that scale, will this tell us whether something is focused versus something that's looking at horizons? Do you know what I'm saying? Like, can we tell what the field is, based on this, that the thing is trying to map? And when you say horizons, do you mean temporal horizons, or do you mean more spatial? I'm saying, like, can we tell whether it's zooming in on a focal point or whether it's looking at a larger picture? Oh, I see, time-wise and map-wise. So yeah, the map technically is very focal, in the sense that, I haven't shown it in the lower right because I didn't want to get too technical, but the synchronization map technically maps the sufficient statistics of the densities to each other. So it maps a particular point on a statistical manifold. And when I say statistical manifold, I mean a space that is spanned not by the values of the system, but by the expected values, the averages, of the system. It says that on average, the internal states will get mapped to the external states. So in that sense it's kind of a point, because you're mapping from a most likely value, an average, to another average on the other side of the blanket. And in that sense you're doing a dimensionality reduction, because the map is not defined for any particular little fluctuation of the internal or the external, some particular instance; it's really defined at this average level. So you're only going to see the inference when you take averages across multiple noisy realizations. Conor, if I could ask one more question, to kind of lock in the gains of this great section: is it fair to say that the synchronization map is literally the internal states' map of the external states' territory? Yeah, exactly. Okay. The synchronization map will take any internal state and it will map it into, like, a space.
And that space is a map of the external states' territory. And it might not be correct, you know, it could be off, but it's basically taking any internal state and projecting it into a representation. And that representation has a metric space: when I move around in that represented space, it's supposed to mirror how the external states are actually moving. And the key point of the synchronization map is that the map and the territory will only be aligned on average. For any given particular instance of the internal states, that guess, the place it gets mapped to on the representation, might be arbitrarily far from the true external state at that time. But on average, they'll be exactly on top of each other. And there's one comment in the chat from Miguel Aguilera, so I'm going to read it to you, Conor, and you can address it now, or if it comes at a better time later, then address it then. Miguel writes: great introduction of the Helmholtz decomposition. How general is the housekeeping lambda term? Does it account for any nonlinear systems with higher-order terms, for example larger than quadratic? That's a great question. Yes, it totally can. So if we go back to it, where was it? I'll just address it now, if that's okay with you, Daniel. Yeah. Yeah, so if you look at this housekeeping term, it's defined as the divergence of the matrix field; it's the sum of the partial derivatives of the flow operator. Now I'm talking about the last equation on the slide: the flow operator, again, is the Q minus the gamma. So those partial derivatives, if the entries are higher order than quadratic, will have nonlinear terms, right? If you're taking the partial derivative of something that's third or fourth order, then the partial derivative itself will be nonlinear. What that means is that the housekeeping term itself will be nonlinear in the states. So the housekeeping term is also a function of the states.
And in this case, with higher-order, more-than-quadratic terms, you'll just get an appropriately nonlinear housekeeping term. So for instance, if we go back to this slide, the third column, the housekeeping term: because I made the q matrix, the solenoidal matrix, second order in the states, i.e. there's a multiplicative interaction between x1 and x2, the housekeeping term becomes linear in one of the two states. So it's just a simple linear function of x1 and x2. If, however, we were to imagine that either q or gamma, where gamma is the dissipative operator, which I made constant for now, had a higher-order term, like an x-cubed term or x1 squared times x2, then the housekeeping term would also have a nonlinear term. And now housekeeping would be nonlinear in the state space. And there's nothing wrong with that. It would just mean the housekeeping term would be nonlinear and the dynamics would be even more complex. Yeah, that's a great question, Miguel. Thanks. And if I could ask some follow-up housekeeping questions. So one, can the housekeeping term be non-differentiable? Or what if there's some function that doesn't have a nice derivative? And the second part is, what does the housekeeping term represent for a given example system? And how is it different from the regular Helmholtz decomposition? That's a good question. So the first question is about differentiability. I mentioned in the beginning that for a system to have these nice properties we were talking about, for the Helmholtz decomposition to be applied to it, etc., one of the conditions is that the flow function, that f of x, is a smooth function of the states. That basically means it's differentiable everywhere. As long as the flow is differentiable, it implies that all the subcomponents are also differentiable. If my memory of calculus serves me well.
So basically it means that q will be differentiable, which means that all its higher-order derivatives are also differentiable. So the very premise of this decomposition assumes that things are differentiable, and that problem shouldn't arise. If there's a non-differentiable flow function, then you're right, all these things will have discontinuities, and I don't think there are any guarantees that the decomposition even applies in that case. The second question, can you remind me of the second one? Oh, sorry. Yeah, the interpretation of the housekeeping term. That's a really good question. This is something we talked a lot about with Carl when we discussed this paper at a T and B meeting: how do you interpret this? Because the solenoidal flow I can imagine: you're going along the isocontours. Dissipative makes sense: you're going up and down the gradients of the density itself. But the housekeeping term is this weird thing that has to do with the derivatives of the state-dependent solenoidal and dissipative flow. I don't know how to interpret it, to be honest with you, in terms of one of these nice kinds of orthogonality, because it's not an orthogonalization of the state space. It's a mixture of both state-dependent solenoidal and dissipative terms. The one thing Carl did say to us, which we'll come to later, is that the housekeeping term in the case of chaotic dynamics is actually what mediates the wandering, seemingly non-steady-state-ish behavior of things like the Lorenz attractor. So the Lorenz attractor seems to go on these itinerant wanderings that are arbitrarily far from its steady state density, whatever that steady state density is, because again, we don't know what it is. And basically the housekeeping term in this case can push the system away from the gradients of the log density and also away from the level sets of the density, the isocontours followed by the solenoidal flow.
So an example of that is actually here, right? Where you see, in the main x, y planes, if you compare the red plot to the purple plot, the red arrows are sometimes actually pointing in the opposite direction of the purple arrows. So one interpretation is that for sufficiently complex systems, there will sometimes be quite strong components of the flow that are driving it away from the probability density's maxima. So if you want to try to interpret this in the context of biological systems: systems sometimes seem to wander very far away from what in a greedy sense would be the best state to be in, i.e. a local mode of the probability density. And this housekeeping term can sometimes drive the system into long wandering sojourns that are far away from those local maxima. And it's the strength of that housekeeping term versus the dissipative term that Carl argues is actually the Helmholtz decomposition's take on chaotic behavior. So chaotic behavior is when things wander arbitrarily far, basically because the housekeeping term dominates and pushes things away from the modes of the density. That was the closest that I got to an interpretation of the housekeeping term. But it's a really good question. I would be curious to hear what other people have to say about that. Okay, so yeah, this is where we were. So we're talking about the synchronization map. That's where we want to get to. We want to get to being able to write down those conditional densities so that we can map them to each other and then imbue our system with some form of sentience, or at least a sentient interpretation. So how do we get this density? This brings us to the first major contribution of the paper, which is a new method for approximating a system where we don't have that density. So for instance, for the system we're about to talk about, the Lorenz attractor, we don't know what the probability density is.
So we're going to use a new method proposed in this paper to approximate that density. And the method relies on a combination of polynomial regression and the Helmholtz decomposition. That's why I just spent so much time going through the Helmholtz decomposition: this new method proposed by the paper for fitting the density relies on the Helmholtz decomposition, as well as some statistical inference techniques, i.e. polynomial regression. So now we're going to actually make things specific and dive into a particular system. The Lorenz attractor, a phase-space portrait of which is shown in the top left, is a famous deterministic dynamical system that exhibits what's known as chaos. And again, I'm going to not go too in depth here to save time. In one of the early streams, Daniel gave a very nice explanation of chaotic behavior, but it basically means that when you very slightly alter the initial conditions of the system, the end resulting trajectories can be arbitrarily far apart in state space. So if I start the system at, say, 0.1 versus 0.101, the resulting places that those two trajectories end up can be super far away from each other. It's quantified most often by analyzing what are known as the eigenvalues of the Jacobian, or the Lyapunov exponents, the sign of which will tell you how fast trajectories on average should diverge, and that's kind of the hallmark of chaos. So that's the Lorenz attractor. It gives these nice itinerant, wandering butterfly lobes when you look at its trajectories over time. An interesting property is that no one trajectory is the same for a slightly different initial condition. That's why we call it chaotic. What we study in this paper is exactly that deterministic chaotic system, but we just add noise onto its flow function. So that's what's represented with the equations here. So we can call it the stochastic Lorenz attractor.
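As an aside, the sensitive dependence on initial conditions just described for the deterministic system is easy to see numerically. A minimal sketch, with made-up initial conditions and a crude Euler integrator (step size and horizon are arbitrary choices for illustration):

```python
import numpy as np

def lorenz_flow(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Deterministic Lorenz flow f(x): the drift of the system."""
    x, y, z = s
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])

def integrate(s0, dt=1e-3, steps=20000):
    """Simple Euler integration of the deterministic Lorenz system."""
    s = np.array(s0, dtype=float)
    traj = [s.copy()]
    for _ in range(steps):
        s = s + dt * lorenz_flow(s)
        traj.append(s.copy())
    return np.array(traj)

# Two initial conditions that differ by only 1e-3 in the first coordinate
a = integrate([1.0, 1.0, 1.0])
b = integrate([1.001, 1.0, 1.0])

# Sensitive dependence: the trajectories end up far apart in state space
final_gap = np.linalg.norm(a[-1] - b[-1])
print(final_gap)
```

The separation grows roughly exponentially at a rate set by the largest Lyapunov exponent until it saturates at the size of the attractor.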
It hasn't been characterized as rigorously as the deterministic Lorenz attractor has been, so it's a kind of interesting experimental stochastic process to investigate here. So on the bottom right, we see the original deterministic Lorenz flow, which is a nonlinear dynamical system. That's the f of x. And what we're doing here is we're just perturbing the system at every time step with a little random Gaussian fluctuation. That's what's represented by omega. And I've bolded everything here to represent that we're now dealing with a vector of states. So we're dealing with x1, x2, x3. It's not just a single variable, but a 3D system. Okay, so now we can get into the major moves of the actual paper. So the first approach is to say, okay, we know about this Helmholtz decomposition thing. Let's try to rewrite the Lorenz attractor as if it's doing a gradient ascent on a log probability density, or equivalently a gradient descent on a surprisal. An important thing to note here is that the surprisal, as well as the flow operators, the q and the gamma, are state dependent. And as we know now from our discussion of the housekeeping term, we're going to get that extra lambda term because the operators are state dependent. And you can just look at the flows of the Lorenz attractor to see that very clearly. So, for instance, f of x is not just first order in the states, which would make it a linear system. The flow of x2 depends on a multiplicative interaction between x1 and x3, and likewise, the flow of x3 depends on a multiplicative interaction between x1 and x2. That state dependence will be exactly recapitulated in the fact that q and gamma will not be constant functions of the states. And because of that, we'll also get this housekeeping term, this big lambda on the side. So that's why the Lorenz attractor was chosen as opposed to an OU process. Exactly because it has this nonlinear state dependence.
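As a sketch of what "adding noise onto the flow" means in practice, here is an Euler-Maruyama discretization of a stochastic Lorenz system. The noise amplitude, step size, and initial condition are my own arbitrary choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_lorenz(s0, dt=1e-3, steps=20000, noise_std=1.0,
                      sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Euler-Maruyama integration of the Lorenz flow plus isotropic
    Gaussian fluctuations omega (amplitude is an assumption)."""
    s = np.array(s0, dtype=float)
    out = np.empty((steps + 1, 3))
    out[0] = s
    for i in range(steps):
        x, y, z = s
        drift = np.array([sigma * (y - x),
                          x * (rho - z) - y,
                          x * y - beta * z])
        # The noise increment scales with sqrt(dt): the hallmark
        # of the Euler-Maruyama scheme for an SDE
        s = s + dt * drift + noise_std * np.sqrt(dt) * rng.standard_normal(3)
        out[i + 1] = s
    return out

traj = stochastic_lorenz([1.0, 1.0, 1.0])
```

The resulting sample path still traces out the two butterfly lobes, but no two realizations are alike even from the same initial condition.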
So if we want to write down an OU process, and this is a decomposition that people like Miguel know very well by now, you can just write it down as this linear matrix function, which is q, which is a matrix, minus gamma, another matrix, multiplied by the gradients of a Gaussian probability density, a log Gaussian. Okay, so the first approach to tackling the decomposition of the Lorenz attractor, as if it's doing a gradient descent, is basically fitting that dependency. So fitting q and gamma using polynomial expansions. Assume we didn't know the true Lorenz flows and we just want to fit them: okay, if I want to express the Lorenz flows in terms of a q and gamma, what would I do? The first approach is to, first of all, take our state space, which is x1, x2, x3, and expand it in terms of polynomial basis functions, which again is something I think was covered in an earlier livestream. It's related to a Taylor expansion, where we're saying, I'm going to express each state of the system not just as x1, x2, x3, but as things like x1 squared, x1 times x2, x1 times x2 times x3, all these interaction terms and higher-order terms. And then I'm going to say, find me a function that maps from that higher-order basis of polynomial expansions to the coefficients of q and gamma respectively. And then I'm also going to model this surprisal function, which is the negative log of the density, as a function of the polynomial states. So the problem of polynomial regression is just finding the coefficients, here q and h, that need to be multiplied by these polynomial expansions in order to fit the observed flow as well as possible. So this is a very common technique used in data science, right? Instead of just doing a linear regression between, let's say, housing prices and the size of the house, you add in higher-order terms like the squared size of the house or the size of the house times the size of the pool or whatever.
So it's a very classic thing to use polynomial expansions to get more expressive functions to explain some observed data. We're doing the same thing here, except that the data we're trying to explain is the actual flow of the system. So the first move of the paper is to actually do this fitting process and show that you can rewrite the deterministic Lorenz system as if it's doing a gradient descent on this surprisal function. But as we'll see later, it's actually not a proper surprisal, so we can just call it a potential function for now. And basically the solution is a second-order function of the states. So there's a multiplicative term between x1 and x2 with coefficient h5, and then there's a linear term in x3. What this means is that if you do this for the deterministic system, so you actually assume the random fluctuations go to zero, then you can rewrite the parameters of the Lorenz attractor, which are given by sigma, beta, and rho. These are the classic ways that people parameterize the Lorenz attractor, and certain settings of rho and beta lead to chaos and different kinds of attractors. You can rewrite them in terms of the coefficients of the Helmholtz decomposition. So that's the first major move: this Helmholtz decomposition can be used to write down the system's flows as if it's doing a gradient descent on a potential function. There's a problem with this, though, which is that if we look at that potential, it's actually not a proper probability distribution. So this first move isn't really Bayesian mechanics. All it's saying is that we can force the Helmholtz decomposition to explain the flow function of the Lorenz attractor. But the resulting surprisal function we're left with does not correspond to a proper probability distribution. And if you just analyze that equation and try to interpret it like a probability function, it won't actually work.
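To make the regression step concrete, here is a minimal sketch: sample states, evaluate the "observed" flow there, and do ordinary least squares onto a polynomial basis. This fits the flow directly rather than the full q, gamma, and surprisal parameterization used in the paper, and the sampling range and basis construction are my own choices for illustration, but the least-squares machinery is the same:

```python
from itertools import combinations_with_replacement

import numpy as np

rng = np.random.default_rng(1)

def poly_features(X, degree=2):
    """Polynomial basis expansion: 1, x_i, x_i*x_j, ... up to `degree`."""
    n, d = X.shape
    cols = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def lorenz_flow(X, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """The 'observed' flow we are trying to explain."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([sigma * (y - x),
                            x * (rho - z) - y,
                            x * y - beta * z])

# Sample states, evaluate the flow, regress onto the polynomial basis
X = rng.uniform(-20, 20, size=(2000, 3))
Phi = poly_features(X, degree=2)
F = lorenz_flow(X)
coef, *_ = np.linalg.lstsq(Phi, F, rcond=None)

# The Lorenz flow is itself second order, so the fit is essentially exact
resid = np.max(np.abs(Phi @ coef - F))
print(resid)
```

Because the true flow lies exactly in the span of the degree-2 basis, the residual is at the level of floating-point noise; for a flow with higher-order terms you would raise the degree.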
So it has to do with whether the gradients are positive definite, essentially. So that motivates the next move, which is where we don't try to approximate the Lorenz attractor exactly, but we try to approximate what Carl refers to as a Laplace approximation to the steady state density. So here, instead of saying, okay, let's rewrite the Lorenz attractor as Helmholtz decomposed, let's assume that the Lorenz attractor is dragged towards a steady state that's Gaussian, so the NESS, the nonequilibrium steady state, is a multivariate Gaussian. And if we force that assumption and then use our polynomial regression, what are the resulting coefficients that we'll get? Because now we're forcing the system to look as if it's doing a gradient descent on a proper surprisal function, or a gradient ascent on a proper probability distribution. So if you impose that constraint and then you fit the Helmholtz decomposition (again, we're fitting the coefficients of the flow operators q and gamma in terms of the states using polynomial expansions, and we're constraining the surprisal function to be quadratic, i.e. the steady state is Gaussian), then we basically create not a perfect recapitulation of the stochastic Lorenz system, but something that looks like the Lorenz system with this weird constraint that it has to occupy, on average, a Gaussian steady state, which is not necessarily true of the real Lorenz system. That's an interesting move here. And then what you get is what is referred to as the Laplace approximation to the stochastic Lorenz system, which can be validated in terms of how close it is to the real Lorenz system using things like the Hausdorff dimension, which is a quantification of the fractal dimensionality of the system. It's very well known that the Lorenz attractor basically lives on a fractal submanifold of its ambient state space, which is quantified by a fractal, i.e. non-integer, Hausdorff dimension.
And that's just a function again of the eigenvalues of the Jacobian of the flow. And then he also validates the approximation quality by just looking at how correlated the approximate flow from this polynomial method is with the true flow, which is on the x-axis of that correlation plot. And then on the right-hand side we have some example trajectories of this Laplace-approximated Lorenz system. This is a very interesting plot, because it's showing that even though the underlying steady state density is a multivariate Gaussian, the actual trajectories of the system don't necessarily look like they're just neatly sitting or wandering around that Gaussian steady state. Again, because of the solenoidal flow and because of that housekeeping term, which is very large in this case, the trajectories of the system look as if they're making very itinerant, wandering excursions. But on average, if you were to take an infinite-time observation of the system, the system would still look as if it sits on a Gaussian steady state. So that's the first, I would say, major move of the paper: doing this Laplace-approximated Gaussian version of the stochastic Lorenz attractor, which anecdotally has some features of the Lorenz attractor, like these two lobes, and all of these quantifications kind of make it correlated, but it's not exactly the Lorenz attractor, because we're putting constraints on it. That was a lot of information. Are there any comments or questions so far? Okay, so I'll just keep going. So importantly, now that we've rewritten the NESS density as if it's Gaussian, we can use it to read off conditional independence structure, i.e., the Markov blanket.
And the reason for that is obvious from earlier papers, like Bayesian mechanics for stationary processes and Miguel's paper on linear diffusion systems, where you can just look at the Hessian, the inverse of the covariance matrix, and use it to read off conditional independence, i.e., to read off the Markov blanket. And that simple relationship between the inverse covariance and the Markov blanket only obtains because of the Gaussian nature of the steady state density. That's why the Laplace approximation works here, or is a nice move to make: it means we can then write down or read off the Markov blanket very easily. And so the lower right part is showing the Hessian and the covariance matrix, where white means very low values. So what we immediately see is that there's basically a Markov blanket between the first and the third states, conditioned on the second state. Or actually, sorry, rather the first and second states are both conditionally independent of the third state. So there's kind of a Markov blanket between those two and the third state. This one isn't as nice as the coupled system we'll see later, because it's actually not clear what the blanket states are here; this seems more like a marginal lack of correlation. But even though the third and second states are also kind of conditionally independent, you can still treat the second state as a blanket state, and you can basically parameterize the density over the first state given the second state. So here we're writing down a synchronization map between the first and third states, and it's symmetric. So the first can infer the third or the third can infer the first, but the conditioning, the blanket state, is the second state.
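Here is a small numerical sketch of that "read it off the Hessian" logic, with a made-up 3x3 precision matrix whose (1,3) entry is zero (the numbers are hypothetical, chosen only so that the zero is there):

```python
import numpy as np

# Hypothetical precision matrix (Hessian of the surprisal) with a zero in
# the (1,3) entry: states 1 and 3 are conditionally independent given
# state 2, i.e. state 2 plays the role of a (minimal) blanket
P = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
Sigma = np.linalg.inv(P)  # the covariance is generally dense

# Conditional covariance of (x1, x3) given x2, via the Schur complement
a, b = [0, 2], [1]
S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]
cond_cov = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ab.T

# Marginally x1 and x3 covary, but conditioned on x2 they decouple
marginal = Sigma[0, 2]       # nonzero
decoupled = cond_cov[0, 1]   # ~0: the blanket read off the Hessian

# The conditional expectation is linear in the conditioning state, which
# is the sense in which a synchronization-style map can be written down
x2 = 1.7
mu_1_given_2 = (Sigma[0, 1] / Sigma[1, 1]) * x2
```

The off-diagonal zero lives in the precision matrix, not the covariance matrix, which is why the blanket is read from the Hessian rather than from raw correlations.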
So you can kind of see that if I just take the instantaneous value of the third state, it's in some way performing a guess, making an estimate about the most likely value of the first state, and vice versa. And the only reason we can write down that function that maps between the first and the third is, again, because we have an expression for the covariance matrix, which follows from the Laplace approximation, the Gaussian approximation, to this stochastic Lorenz attractor. So once that's been done, the Laplace approximation, then the next move, and it's not a full analysis, but it's showing that you can extend this to non-Gaussian steady states and make higher-order approximations, which might be called density learning. So in a case where you don't have a parametric form for the steady state density, you can parametrize the surprisal function and the Helmholtz decomposition in such a way that you know the resulting density is a proper probability distribution, but you don't necessarily know its neat parametric form. The way he does this is basically by parametrizing the potential as quadratic, but the covariance, or the Hessian of that quadratic potential, is state dependent. So it's almost like you're fitting locally linear synchronization maps to different parts of the state space, or you're fitting locally Gaussian approximations. The way he does this is essentially saying: I have an efficient parameterization of the covariance matrix of my steady state, but that covariance is actually changing as a function of state space, whereas for Gaussians the covariance by definition is constant. By doing this, you can basically achieve a nicer approximation to the true stochastic Lorenz system, where all the conditional densities are still Gaussian, but the full multivariate density is not necessarily Gaussian.
So that's basically what's shown in the right plot here: this higher-order, learned stochastic Lorenz system. But it's not clear in this case what the synchronization map would be, because the synchronization map is not just going to be a simple function of the Hessian, now that the Hessian is changing over state space. Connor, could I ask a question here? Yeah, please. So from the chat, Martin Beale has asked: is this right? There are two approximations here, one of the dynamics and one of the stationary state. Yes, though I would say the approximations are intrinsically tied together, because what you're doing here is approximating the potential function, and then that approximation directly factors into the approximation of the flow. So you're fitting in this case these kernel matrices, these K matrices, as being state dependent. The kernel matrix is basically a Cholesky factor of the covariance matrix, or really of the Hessian here, the precision matrix, and the gradients of that will then factor into the parameterization of the flow, because the flow is defined in terms of the Helmholtz decomposition, which is q minus gamma times the gradients of the potential. So however you fit the stationary density, those coefficients will also factor into the parameterization of the flow. So in this case, in the upper left, I have the parameterization of the density, this quadratic function. The gradients of that with respect to the states will essentially look like K transpose K times x minus mu, basically a precision-weighted prediction error, where K transpose K is the precision and x minus mu is the deviation of x from some mean vector.
So you're doing all of that simultaneously: the parameterization of the steady state as well as the parameterization of the flow. You're doing it all just by trying to fit the flow with these coefficients, but for free, some of those coefficients you get out can be used to write down a potential function, which can be converted to a stationary density. Does that make sense? Yes, and just on this slide, what is PD? Oh, sorry, positive definite. Yeah, that's a good point. Because the Hessian is parameterized as these kernel products, K transpose times K, which are state dependent, the result is always symmetric and positive definite. That means the stationary density, evaluated at different points in state space, will always be locally convex, i.e. there's going to be some local mode. But since that PD matrix is changing over state space, you're going to basically have a multimodal density, though locally it'll be unimodal. That's another way of expressing everything on the left side.
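A quick numerical sanity check of that claim, with a hypothetical state-dependent factor K(x) (the particular entries are made up; any smooth full-rank K would do):

```python
import numpy as np

rng = np.random.default_rng(2)

def K(x):
    """Hypothetical state-dependent Cholesky-like factor: upper triangular
    with positive diagonal, so K(x) is full rank at every state."""
    return np.array([[1.0 + x[0]**2, x[1],          0.0],
                     [0.0,           1.0 + x[1]**2, x[2]],
                     [0.0,           0.0,           1.0 + x[2]**2]])

# K(x)^T K(x) is symmetric positive definite at every state, so the
# potential is locally convex, but its curvature varies over state space
for _ in range(100):
    x = rng.uniform(-3, 3, size=3)
    H = K(x).T @ K(x)
    assert np.allclose(H, H.T)
    assert np.min(np.linalg.eigvalsh(H)) > 0.0
```

The positive definiteness comes from the K-transpose-K form itself: for any full-rank K, v'K'Kv = ||Kv||^2 > 0, regardless of how K varies over the states.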
Okay, so here's the more interesting part, where we can really start talking about the interesting Bayesian mechanical, inferential interpretations. Now we have two coupled Lorenz systems. So again, it's a stochastic Lorenz system, but since we have two of them talking to each other, it's really a six-dimensional system: three states of one Lorenz attractor, three of the other. And crucially, we're coupling them together through the first and fourth states, where the fourth state of the full system is really the first state of the second Lorenz attractor, if that makes sense. So basically we're saying that x1, the first state of one Lorenz attractor, is talking to x4, which is the first state of a second Lorenz attractor. And just by coupling them through this one state, we're mediating the flow of information between these two Lorenz attractors. Now the next move is to go back to the Laplace approximation, where we're enforcing the surprisal function to be a quadratic potential, and to apply it to this six-dimensional stochastic Lorenz system. So now we're fitting this whole coupled system with a 6D multivariate Gaussian, and because of that Gaussian constraint, we'll be able to read off the synchronization map from the entries of the Hessian, the inverse covariance matrix. On the right, what we have are just example trajectories from the coupled Lorenz systems. This is what one would call evidence of a synchronization manifold, in a less rigorous sense, although we'll get rigorous with it later. In a more anecdotal sense, you can see that the systems are basically synchronized with each other: even though they're only connected through one state, they're kind of surfing along similar submanifolds of their state spaces; they're kind of hugging each other. And again, these other metrics, like the very bottom, are just to show that the approximated flow, using this Laplace approximation plus polynomial expansions, is qualitatively at least positively
correlated with the true flows of the full nonlinear system that has no Gaussian approximation. And then the Bayesian mechanics interpretation comes in next, where we again read off the Markov blanket that exists between the two Lorenz attractors. So the upper right is what I'm focusing on now. Here, the second two states, state two and state three, are the internal states of one Lorenz attractor, whereas states five and six, that dark blue box, are the internal states of the second Lorenz attractor. And they both share a pair of blanket states, which are x1 and x4. So the synchronization map is a map from x2 and x3 to x5 and x6, where the mapping is done on the conditional densities, and the conditioning is done on the sensory state of either the first system or the second system, which is respectively the active state of the other. That's a little bit confusing, but basically the purple sensory state of the first system is actually the active state of the second system, and vice versa. The bottommost plot is basically showing that at any given time, the internal states of one system are making a best guess, a conditional estimate, about the external states, which are the internal states of the second Lorenz attractor. So you basically have two minds, little two-dimensional minds, that are making inferences about each other, but the inferences are done with respect to these conditional densities, where the conditioning is done on the sensory states. That's basically what all these plots on the lower right are showing. So that's the demonstration of this inferential, or sentience, interpretation. And then, because as we said the full system is multivariate Gaussian, it means all the conditional densities are also Gaussian, and you can literally write down the synchronization map as a linear function of the states, where that linear function is given by appropriate submatrices of the full Hessian, which is the inverse covariance. And again, because we're fitting the
potential function using a polynomial expansion, it means that the Hessian, which is the matrix of second derivatives of the surprisal with respect to the states, will also be a function of those polynomial coefficients. So the end result of all that is basically that, given this polynomially approximated Helmholtz decomposition, I can then write down the synchronization map precisely in terms of those coefficients that I fit. And then you can compute all these nice things, like a free energy function, which is basically a sum of accuracy, or prediction errors, and complexity, which is effectively a prior penalty that penalizes how far the inferred expected state is from its expectation under the full steady state density. This is a crucial result, because it's saying that even if you don't know what the true steady state of the system is, you can use polynomial expansions and a Laplace approximation to get something that still has rich nonlinear dynamics, but by enforcing this Gaussian constraint on it, you can then write down the synchronization map using linear functions, and you know the form of that function because you fit its coefficients by doing this Helmholtz decomposition. Okay, so that's basically the main brunt of the paper. Can we go back and ask a few questions? Sorry, I have one question and there are two in the chat. So the first question is: in this free energy equation on the bottom, pi is representing the particular states, but we've also seen free energy minimization where pi represents a policy, where free energy minimization is involved in the selection of affordances for policy selection. So what's the relationship between free energy minimization of particular states, like the autonomous states, and free energy minimization as planning as inference?
Yeah, that's a great question. So this has nothing to do with the policy version; it's just about pi, which is basically the particular states, which I should have mentioned are the autonomous and the blanket states, or the autonomous and the sensory states. So in this case, the particular states would be the two internal states, the sensory state, and the active state of one of those Lorenz attractors, where its active state is actually the sensory state of the other Lorenz attractor, and they have one overlap in terms of their blanket. So this here is just saying that on average, the system is instantaneously minimizing, at all times, this free energy functional, which penalizes particular states according to how far they are from their conditional expected value. That's the expectation of pi given eta, where the expectation is taken under q, this variational density that's parameterized via the synchronization map. The expected free energy, which is what you're talking about, is about how you write down a functional of trajectories, of paths of future autonomous states, that is minimized such that you get the semblance of planning behavior. There you're penalizing not how far something is instantaneously, but how much the free energy is expected to deviate over a future horizon from the optimal policy or path, which is going to be a sequence of autonomous states. That is a more complex question that basically has to do with free energy minimization over actions, which under certain assumptions are sums of surprisal over time. And that's something that is not treated at all in this paper, but it is explicitly treated in this other paper that I believe is in review right now, where we explicitly make the connection between the instantaneous free energy minimization we see here and this more temporally deep expected free energy minimization. But for the present purposes of this paper, there's no
relationship, and nothing can really be said here about planning or temporally deep free energy minimization. Okay, let's keep it on this slide and I'll ask a few more questions from the chat, from Martin. Is the actual steady state distribution of the Lorenz attractor known? No, basically, yeah. Okay. I think there have been other papers where they're not using the stochastic Lorenz system, but what they're using is the distributions of a bunch of Lorenz attractors. So this is the difference between, I guess, stochastic differential equations and random dynamical systems, where I'm randomizing a bunch of initial conditions and then looking at the resulting densities of the deterministic systems after a bunch of time has passed. That's a different kind of density: it's a density over the deterministic ending states, or deterministic trajectories, of the system from a bunch of initial conditions. And I think there has been work on characterizing those densities, but again, I doubt those densities are parametrically defined; I bet they're more like approximations, like what we did here. But honestly, I'm not an expert on the Lorenz attractor, so that's something I should probably just defer on, since I don't know much about it. But I'm pretty sure the stochastic Lorenz attractor doesn't have a neat solution to the Fokker-Planck equation, which is what you need to solve to get the stationary density. Alright.
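The random-dynamical-systems picture just described can be sketched directly: randomize a cloud of initial conditions, push the whole ensemble through the deterministic flow, and look at the empirical statistics of where it ends up. The integration scheme, horizon, and cloud size here are all arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def lorenz_flow(S, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Deterministic Lorenz flow applied to an ensemble of states (n, 3)."""
    x, y, z = S[:, 0], S[:, 1], S[:, 2]
    return np.column_stack([sigma * (y - x),
                            x * (rho - z) - y,
                            x * y - beta * z])

# Randomize initial conditions: a small Gaussian cloud of 500 points
S = np.array([1.0, 1.0, 1.0]) + 0.5 * rng.standard_normal((500, 3))
dt = 1e-3
for _ in range(10000):  # evolve every member deterministically to t = 10
    S = S + dt * lorenz_flow(S)

# Empirical statistics of the ensemble at time t: a crude stand-in for
# the (unknown) density over ending states
mean_t = S.mean(axis=0)
cov_t = np.cov(S.T)
print(mean_t, np.trace(cov_t))
```

Chaos smears the initially tight cloud across the attractor, so the ensemble covariance at time t is far larger than the initial one; a histogram of the end states is one nonparametric approximation to the density being discussed.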
I'm going to ask a pair of questions, from Martin and from Miguel. Thanks to both, and to everybody in the chat — it's cool to have this kind of real-time feedback. The first question, from Martin: in what way does the Markov blanket emerge here? Is it not there at some point and then later it is? And the second question, from Miguel: it is impressive that you can find a Markov blanket in a non-linear system; have you quantified how precise the approximations of the covariance and Hessian values are, e.g. by comparing against numerical simulations? So question one: in what way does the Markov blanket emerge? Question two: how precise are the approximations of the covariance and Hessian? Okay, the first question, about emergence. The Markov blanket is defined with respect to the stationary density, via the conditional independence relationships expressed in those two upper-right panels, the log covariance and log Hessian. Those things do vary in time in the sense that if you condition on some initial state, the time-varying densities will have different log covariance and log Hessian matrices. This idea of conditioning on an initial state and then looking at the time-evolving densities is exactly the point of the paper by Thomas and a few others of us called "Memory and Markov Blankets", where the Markov blanket can change: if you look at the evolution of conditional densities conditioned on some perturbation, those conditional densities do change. But the stationary density doesn't change its sufficient statistics over time, because it isn't defined with respect to time — you've marginalized out sequences of variables. So in that sense, if you don't condition on a particular state and you're just looking at the unconditional stationary density, the Markov blanket is just always there; it doesn't emerge. And maybe I said "emerge" but I didn't mean it in a temporal sense: there's no sense in which the Markov blanket emerges over time. It's just a property of the stationary covariance matrix, or more specifically its inverse, and that is a constant value that is simply a characteristic of the full six-dimensional multivariate Gaussian. Even if we didn't think about stochastic processes at all — if I just wrote down a six-dimensional multivariate Gaussian with these particular entries of the covariance matrix — that constant Markov blanket just always exists. The second question was about the precision of the covariance and Hessian against numerical simulations. That's a really good question, and I'm assuming that was from one of the comments — from Miguel? Yes. So I'm assuming you mean the overlap between the covariance empirically estimated from the real stochastic Lorenz system and the analytic Hessian from the Laplace-approximated stochastic Lorenz system. I have the paper up here that I can show. One of the reviewers actually asked us something very similar, and we have an extra figure that demonstrates exactly that. Can you still see my screen? Yep, looks good. Okay, perfect. Here we're essentially doing exactly that comparison: we're looking at the partial correlation, which in the Gaussian case — sorry, it's not the same thing as the Hessian, but it is the same measure of conditional independence. It's not the case for non-Gaussians; it's the case for a few classes of exponential-family distributions that if you measure the partial correlation, which is a kind of conditional correlation between two variables, it matches the corresponding entry of the Hessian. So what we're doing here is exactly that: we're basically measuring how
well the empirically measured partial correlation between different subsets of states — here the third and the sixth states, which are the two internal states of the two mini Lorenz attractors within the whole system, so internal state of brain one and internal state of brain two — matches up with the true Hessian value. And it converges to very close to zero, which agrees with the analytically derived Hessian. You can quantify this more coarsely by checking that the structure of the Hessian matches the empirically measured partial correlation. The partial correlation was measured over increasing segments of time: the value here is from the first 50 time steps, then the cumulative partial correlation up to the 500th time step, and so on. By looking at the partial correlation from increasingly long windows of time, it should converge to the true value of the conditional independence, and it approximates what we get from the Laplace-approximated system. What I should say, however, is that this partial correlation was not measured from the full stochastic Lorenz attractor, but from our approximation. So we're just saying that if we make this approximation — including the way we parameterize the flow using the polynomial expansion — and then measure the partial correlation from realizations of that approximated system, the empirical measurements line up with what they should be. What we haven't done, but I think would be very interesting — and maybe this is what was being asked — is to measure this from the real stochastic Lorenz system, to see if there actually is a Markov blanket there. The only issue is that we can't necessarily use partial correlation in the real stochastic Lorenz system because, as we were just discussing — Martin asked whether we know the stationary density of the real stochastic Lorenz system — if we don't know that, we would have to use a less parametric method, like conditional mutual information, to assess the presence or absence of conditional independence. That would be harder just because evaluating conditional mutual information for increasing amounts of time is computationally difficult. But I agree that it is something that should be done; we just didn't do it for the real stochastic Lorenz, only between our approximation and our analytic Hessian. I hope that answers the question and was clear; we can keep talking if there's any lack of clarity. I think continue with the presentation, and if anyone has more questions at the end of the presentation we'll return to them. Okay, cool — but that was a very good point about actually measuring the Markov blanket empirically. Okay, so I'm almost done with the overview of the paper. The last big thing, tucked into one of the results sections, is something called the sparse coupling conjecture, which is important because it dissociates conditional independence from causality. Conditional independence — as we said in the context of a Gaussian, but actually for all systems, not just Gaussian ones — is determined by a zero entry in the Hessian. If the Hessian between two states is zero, which means the conditional density over one variable doesn't change as a function of the other, then it implies the relationship shown in terms of the Jacobian on the left. The Jacobian basically measures causal coupling — we can go into what it means mathematically, but you can just think of the Jacobian as the derivative of the flow function with respect to the states — and entries in the Jacobian that are
non-zero mean that one state is influencing another. What this relationship shows is that if at most one state influences the other, but not vice versa — a non-reciprocal coupling between the two — then the two states are conditionally independent, i.e. there's a Markov blanket. That's a really weird, counterintuitive statement: it means that if I'm affecting you, but you're not affecting me, then we will be conditionally independent at steady state. This is what's called the sparse coupling conjecture. The way we originally wrote it, it was actually a much stronger claim, and then we realized there are some conditions on it; it's called a conjecture because it doesn't always hold, but there are certain conditions under which it should hold, given a high-dimensional enough and non-linear enough system. You can prove it more mathematically by expanding the Jacobian — basically taking the derivative of the Helmholtz decomposition, which is what's shown in the middle — and then assuming that certain products of entries in the Jacobian are zero; that implies an accompanying zero entry in the Hessian, which is shown at the bottom. What's shown at the bottom, more specifically, is what's known as the normal form of the flow, a kind of circular coupling: if you have this particular structure of the flow operator, where certain sub-matrices of the flow operator are zero, then you're guaranteed this kind of sparse coupling, which means one state can influence the other and then you get conditional independence at steady state. That's another big result. I think during the earlier livestream when you talked with Lance, there was an example in his paper where that was also the case: one state was affecting the other, but they were actually conditionally independent. Here we rigorously showed the conditions under which that happens. So just because you see a zero in the Hessian, don't assume that the systems can't affect each other: two systems being conditionally independent doesn't necessarily mean they don't affect each other through a non-zero entry in the Jacobian; it just means those connections have to be asymmetric or non-reciprocal. Here's an example, physically, of what that would look like. Say we had some system with external, sensory, and active states under the normal form of the flow, this special block-diagonal flow operator. What this essentially means is that the sparse coupling conjecture holds as long as sensory states do not affect internal states, and active states do not affect external states directly but only through the sensory states. That's a particular differentiation of the normal form of the flow, where particular entries of the Jacobian and the solenoidal flow operator are zero, and then you get this kind of sparse or circular coupling. One question on that, and then Dean, if you'd like to ask any questions — and this is also related to the work of Miguel, the paper that was brought up earlier on the topology of perception, cognition, and action. Is it just sensory influencing internal, internal influencing active, and active influencing external, which then feeds back to sensory — the simple clock model — or, as here, do we have an edge between sensory and active states, the blanket states? In the free energy principle and active inference, are we committed to a certain topology of action? That's a really good question. Given the sparse coupling conjecture and the normal form of the flow, basically no, we're not committed to anything; there are so many different combinations that will give rise to a conditional synchronization
manifold that can be interpreted as Bayesian mechanics. All we're saying is that the coupling structure we have here is mainly defined by the lack of certain connections: if external states do not affect active states, and internal states do not directly affect sensory states — it's the lack of those two connections. Can you see my mouse, actually? Yes. So if you don't have a connection going from blue to red, and you don't have one going from blue to sensory, but anything else is allowed — all I'm doing is putting constraints on what cannot exist — then the conditions laid out here will hold and the sparse coupling conjecture applies, which means that an asymmetric relationship implies conditional independence. Given that, what it means is that sensory states will be conditionally independent of internal states conditioned on active states, because there's only a unidirectional coupling. That's one of the weird results of this: sensory and internal will be conditionally independent because one affects the other but not vice versa, and likewise for active and external states. But all that aside, external and internal will still be conditionally independent, and therefore there will still be a synchronization map between them. The conditions on the topology that are laid out — the absence of these connections — just have to do with satisfying the normal form of the flow and this weird sparse coupling conjecture about unidirectional, non-reciprocal coupling. Other than that, the active inference story and the Bayesian mechanics story still pertain: you can swap around all these influences, and as long as external is conditionally independent of internal given the blanket states, that synchronization map still exists, but you can have all kinds of weird coupling loops between the two. Thanks, very interesting. Dean, do you want to ask anything here? Well, there are so many things I could ask — I'd probably get a great answer back and only be able to use a small percentage of it. One thing I will ask, and I'm not sure if it's good or not: I've been doing a lot of work trying to see whether the active and sensory states are a kind of decomposition of where a concept might rest, between things like content and context. When you do this math and get a sense of the relationships between all the different components you're working with — instead of always assuming that things just pop out as emergent — do you start getting a sense that sometimes things just exist in between, within those constraints and some of the things you talked about in terms of linearity? We call them in this diagram external and internal states, but would concepts be that sort of fluid, appearing and disappearing, based on some of the things you've looked at in this paper? Yeah, I definitely think so. For me, when I start thinking about concepts, I immediately start thinking about relationships. What we've shown here are very low-dimensional systems, where you have a few internal, a few active, a few sensory, a few external states. Once you have larger, higher-dimensional systems — say we weren't modeling two kind of dumb Lorenz attractors just oscillating around each other, but some 100-dimensional neural network synchronized with another 100-dimensional external state space, like another neural network — once you're in that world, I think you have the computational, representational capacity for things like concepts, because you just have a richer correlational structure. Imagine we had 100 internal states. The internal states, we know, are in some way relating to the external states, but they're also relating to each other, so there might be subgroups — groups of 10 or 20 neurons all correlated in their firing with each other — and rich recurrent dynamics just within the internal states. The particular relationship of sub-manifolds, so to speak, of the internal state space — how those conditionally relate to external states — is where I think you start getting into the world of concepts: under one part of the external state space, this part of the internal manifold gets more or less activated, versus when another part of the external space lights up, a different part of the internal space gets lit up. So I think the key to concepts really relies on having higher-dimensional systems, which then allow you to have richer, lower-dimensional correlation structure that exists in the ambient dimension of, like, a hundred-dimensional state space. That's how I'd interpret your description of concepts existing in between: it really rests in lower-dimensional manifolds of ambient high dimensionality. So yeah, we're pretty much done — this is just a summary of the contributions: the Helmholtz decomposition, fitting it to a stochastic dynamical system where we don't know the steady-state distribution, then going through the steps with this Laplace-approximated one, studying its synchronization manifold, doing this cool coupled-Lorenz thing, and finally the sparse coupling conjecture about circular coupling. You can move to the discussion now, or dive back into the slides or into different parts of the paper — whatever you guys think best. Great, maybe you could unshare and
we'll have some time just to talk and see if anyone else has questions in the chat. One starting question from Martin: the question above was whether it is even known that the Lorenz attractor has a stationary state — did we address that earlier? Oh, sorry, yeah, we addressed it. My initial answer was that I'm pretty sure it's not known, but I'm not positive — I'm not an expert on the stochastic Lorenz attractor. A unique solution to the Fokker-Planck equation would be very difficult for that system, as I imagine it is for any system that exhibits stochastic chaos. Okay, and another question from Martin: does the sparse coupling conjecture require a Gaussian as the stationary distribution? No, it doesn't. The requirements are not about the Gaussianity of the density. For this new paper that's in review now, there's an appendix that fleshes out the conditions on the sparse coupling conjecture more rigorously, and the conditions are not about the form of the stationary density; they're more about how local the coupling is in the solenoidal term. If the solenoidal terms are state-dependent — if solenoidal term i, j depends on the states — then the sparse coupling conjecture only holds if that dependence is only a function of states i and j. But if the coupling between states i and j is actually a state-dependent function of state k, then the sparse coupling conjecture can be more easily violated. I have a proof of that I could share my screen and show you, but it's not really related to your question, which was about whether Gaussianity is related to sparse coupling — it's not. What you described there reminds me a lot of neuromodulation, and how the effective connectivity of two regions could be modulated. Here's a question from Miguel, and then we can go to Dean, and then I have a question as well. From Miguel: in non-linear systems, higher-order correlations might arise — I think this means the Hessian may not be enough to assess conditional independence; do you see a way to explore that with this method? Yeah, that's a great question. My intuition is that if you use the method we talked about in the second part — the higher-order density learning, where you're not assuming a Laplace approximation but enforcing a locally quadratic approximation to the potential function across state space — that means you have a Hessian that changes over state space. Your point is basically that because of higher-order correlations, there will probably be parts of state space where the Hessian is not zero, even if it is zero for, say, three-quarters of state space. I think what that allows you to say is that, insofar as the Hessian does locally approximate the Markov blanket between two states, there are regions of state space where the Markov blanket condition applies, and other regions where it doesn't and the Markov blanket disappears. The claims in this paper about the conditional independence relationships, especially as they relate to sparse coupling, only apply to Hessians — even state-dependent Hessians, which imply a non-Gaussian stationary distribution — whose relevant entry is zero at every point in state space. I don't know if the sparse coupling conjecture applies where there's a state-dependent Hessian entry that's zero in some parts of state space and non-zero in others; I think that could throw a wrench in this formulation. But it would be interesting: if you can still interpret the entry of the Hessian as an on-off signal for whether there's a Markov blanket between two things, what does it mean if that Markov blanket changes as a function of where you are in state space? Does it mean that sometimes there's a synchronization map between the two states, depending on which region of state space they're visiting — or maybe that actually doesn't make sense? I don't know. I think the method can certainly be used to explore those questions, but offhand I don't have an easy answer. That's a really good question. Cool, thank you. Dean, if you'd like to throw one out? Yeah, I'm just curious. We were just doing a recent paper, and it was talking about how you got here, how you came to this place, and I'd like to ask Connor: I don't know the history of how the housekeeping term was introduced to this formalism, so I was wondering if maybe you could walk us through that. I don't think you put a notice up on the door and called it housekeeping; it just seemed to insert itself. Maybe you could tell us how it was introduced — not assumed — and how you dealt with that. Yeah, it's a really good question. From my understanding, the beginnings of this paper were a meeting where I presented at Carl's lab meeting — it must have been late 2020 — where I was trying to do this kind of Bayesian mechanical analysis on a very high-dimensional, non-linear system. I was trying to model schools of fish: can you look at certain groups within a big school of fish and interpret their states as if they're doing inference about something happening on the other side of the school? At the end we had a bunch of conversations — I think three meetings total — where we went through the mathematics of how that's possible, and it drove Carl to say, okay, we're going to write a paper where we deal with a system like that, where things are very state-dependent and non-linear in state space. So that forced Carl, basically, to go back to the drawing board
with the Helmholtz decomposition, and as a result of all the mathematical formulation here — which is like 95% just Carl writing things down and figuring things out — he realized that there's this extra term, the housekeeping term, that pops out of the Helmholtz decomposition. My assumption is he named it the housekeeping term because it was this unintended side effect of doing the Helmholtz decomposition on systems that are highly non-linear, and the housekeeping term came out as: oh, I need this thing to make sure everything is neat and in order. If you look at the mathematical derivation of where that term comes from — I'm not sure whether Carl was familiar with it from the literature or derived it and thought of it himself; I'm really not sure about his own intellectual journey to getting there — it is in the stochastic processes literature. It's not a totally new thing; it's been known for decades now. But I think from his perspective he labeled it the housekeeping term because it was this new thing that pops out that he has to account for to keep the full Fokker-Planck solution in order. You basically need to add it on when you're solving the Fokker-Planck equation: it has to do with the probability current — you need the housekeeping term to make sure the divergence of the probability current is zero. So my interpretation of the word housekeeping is very analogous to a correction term or a cancellation term, but that was vocabulary Carl just started using in the paper. Awesome — point four! 32.4, it's never too late. Yeah, you briefly touched on where I was going to go with my next question, which was about collective behavior. How are you thinking about these kinds of formalisms in collective behavioral systems? Are we thinking about multi-agent simulations of active inference
agents, like we had done with ants, or is there another level of analysis where the collective is this shifting entity in and of itself? Where does this apply to collective behavior? Yeah, that's a great question. I can speak to that personally, because it's essentially the thing I'm doing during my Ph.D. One third of my Ph.D. is doing the first thing, which is very similar to what you've done with active inference: I'm taking a bunch of active inference agents and making them school together. That's a separate story — maybe we can have another livestream and talk about that, because it's a cool thing — but it has nothing to do with writing down the stationary density of the entire system. I tried to do that at first, but it's very hard, because every active inference agent is a mixture of SDEs and ODEs, and everything we talked about today was just SDEs, stochastic systems. When you're doing a single active inference agent, like doing predictive coding, its neural dynamics are actually solved as an ODE — a straight deterministic gradient descent on variational free energy. So in a big collective dynamical system that has some ODEs and some SDEs, writing down the stationary density becomes very difficult. Because of that, we had all these meetings, and this paper was one of the results. The approach I'm taking now is not modeling individuals as active inference agents; I'm just modeling each of them as, for instance, a 2D particle with a position and a velocity, and then there are classic schooling models — the Vicsek model, the Reynolds model, or the Couzin model, which was introduced by my advisor — where you're basically modeling schooling fish or flocking birds as a bunch of little stochastic differential equations coupled to each other. So the approach I'm taking now is applying exactly the formalism described today to higher-dimensional systems. You encounter special challenges when you do that: trying to fit a polynomial expansion with all the higher-order interaction terms to, like, a hundred-dimensional system is just not feasible if you're including every possible interaction — you get a combinatorially explosive state space. So a lot of the work I'm doing now is figuring out ways to fit the Helmholtz decomposition using non-polynomial expansion methods that are more amenable to higher-dimensional data, where you don't have this curse of dimensionality. A lot of that involves, as you could probably guess, using more amortized approximations like deep neural networks to fit the coefficients of the Helmholtz decomposition. That's something I'm literally working on right now, so there aren't too many results yet, but those are the two directions I'm going: the multi-agent active inference, and the high-dimensional Helmholtz decomposition stuff. I'd imagine that nested cognitive systems with shifting boundaries and functional relationships are challenging for any framework, but at least we're quite rapidly approaching the ability to have the grammar and the motifs to actually talk about those questions, and to see them from multiple perspectives: either as cognitive agents interacting, where we don't even need the top level to be anything specific, or from a more top-down perspective, where, as you pointed out, it's hard to then introduce the cognitive elements into those particles. Yeah, exactly. If I could just follow up on that, I think the ultimate goal would be to bridge both. It would be sick to write down a big multi-agent active inference simulation where every single agent is actually doing a gradient descent on free energy to choose their actions, whether they're MDPs or
predictive coding agents, but where you can also write down a stationary density for the whole thing and identify the Markov blanket of the whole system — because then you would truly have the multi-scale perspective: the generative model of the individuals and the generative model of the whole, existing simultaneously. And then you could really ask questions like: if I change the sensory precision of this single agent, how does that affect the actual Markov blanket at the collective level? That's just so technically difficult. It's what I originally planned to do, but I ended up splitting it into two separate things because the mathematics behind it was too hard for me. But yeah, it would be really cool. I hope this isn't a misleading physical intuition, but it's almost like a ball rolling on a surface: there could be particles within the ball minimizing a potential function relative to their local position, and then there's the ball rolling on the landscape; and the housekeeping term is almost like the landscape being soft, so what the landscape actually looks like is state-dependent — you can't just set and forget the landscape, the flow operator. Dean, if you have a question, then Martin — I have a question from him. On that same theme I'll give you a sort of metaphorical story. Say I'm standing on the bunny hill trying to control gravity, and I'm sliding with all my friends — we're flocking down here on the bunny hill, having a good time — and we're looking up at the headwall. The irony, of course, if you've been up there, is that you know mathematically you can't control gravity on the headwall, because the pitch is different from what the person on the bunny hill experiences. All their behavior — they're all swimming around down here on the more gradual surface — conceptually, they would have to let go of their concepts of what they're doing to control gravity down here; those same themes don't apply on the headwall. How do we get our minds wrapped around that letting go? Our first intuition — we can see the headwall, we're not even blind to it — would be, well, how will I control gravity up there? When in fact the first thing you have to do is control your current concept. So mathematically, how would we show that the benefit isn't in a continuation or a combinatorial extension of that thinking, but actually a reconciliation — a letting go of what you know in order to be able to take up something new? That's the part where I'm not sure your paper answers it, but it leaves enough space open that it isn't always linear: I'm down here, I'm going to make my way up the side of that hill and apply exactly the same thinking to that new situation. That's the part I find quite fascinating. I really appreciate the idea of collective behavior, but for me it's: how do I let go of one idea in order to take up a new one that allows me to continue to FEP my way down the hill? Yeah, that's really cool. I think that's a critical insight from all of this: showing that something that seems optimal at your local level, with whatever concepts you use at your local level, is really a different game when you zoom out or look at a different scale. I think that has a lot to do with what Daniel and I were just talking about: how the individuals' language — the terms, the state space they think in — could be fundamentally different from the language the collective thinks in. It's the same thing as me planning now versus planning for the future: planning now is about making sure I'm sitting up straight and drinking enough water, but planning for the future is what
am I going to do in 10 days, and what's optimal from those perspectives can be a totally different state space, a different language essentially. So the nesting of different timescales is important, and it's also one reason the housekeeping term is important: for active systems that can undertake epistemic actions, if action is not recalculated on every single perception-cognition-action loop, then you're going to be navigating based upon a mirage or a fantasy. Let me ask this good question from Martin in the chat: is the construction in this paper of how internal states do inference on external states still the same as in A Free Energy Principle for a Particular Physics, or not? Is Miguel's criticism of the derivation in A Free Energy Principle for a Particular Physics taken into account here or not? That's a really good question. So, my understanding: the big thing that changed between A Free Energy Principle for a Particular Physics and what was done here, as well as what was done in Lance's paper, the biggest change (and I'm not sure if Miguel also made this point; if Miguel is here, he can correct me) is that we went from argmax to expectation. In A Free Energy Principle for a Particular Physics, the synchronization manifold was defined as a map between the modes of two conditional densities, i.e.
a map from the maximum-probability point of one density to the maximum-probability point of the other. The big move that lets us get away with the difference between argmax and expectation is that we make everything Gaussian, so that the two coincide, and, crucially, everything is differentiable: Gaussians are smooth, and the argmax becomes smooth because for a Gaussian it equals the expectation, which is smooth. I don't know how you deal with multimodal densities when you're mapping between the modes of two conditional densities, because the argmax function may not be differentiable anymore. And the differentiability of argmax (or expectation) also factors into the differentiability of the synchronization manifold, which needs to be differentiable by construction. So I think one of the main things Miguel was talking about was the differentiability of sigma, but he was also talking about the flow of the internal states and whether that flow can be thought of as a descent on variational free energy. In that paper they very nicely showed that the average flow is not the same as the flow of the average, and therefore it's not guaranteed that any individual realization will look as if it's doing inference. This paper, as well as Lance's paper on Bayesian mechanics, basically does not say anything about the direction of the flow, so I don't think there's any disagreement there. In the new paper that's coming out, we actually do have a new result that directly addresses the flow of individual trajectories, which will more specifically address the flow-based argument Miguel was making in his paper: we're looking at the flow of an individual trajectory of the system in terms of gradients of the surprisal function, or gradients of the variational free energy, and we're doing that from different marginals of the steady-state density. So the flow of the active states as a gradient descent on the
variational free energy of the particular states, for instance, or of the blanket states. So there are multiple things that can be answered from your question: one has to do with the differentiability of sigma, which relies on the differentiability of things like argmax versus expectation; the other has to do with relying on the differentiability of sigma to look at the flow of the most likely internal state given blanket states. Neither of these papers actually deals with the flow argument, i.e. whether the average flow equals the flow of the average; they just have to do with whether, on average, the most likely internal state given blanket states parameterizes the density. That is mathematically the case, and it also gets around the argmax-versus-expectation issue by making everything Gaussian. Is there something more specific? Because there are several critiques in Miguel's paper, and maybe I didn't get at the one you're talking about, Martin. I think this is a perfect place for either of you to have a last word. Connor, thanks so much for scheduling this, for breaking into the .3 tier, which was unexpected but awesome; we really look forward to your future work and to you joining us for future discussions. So, any last comments if you would like? Yeah, thanks so much for having me on; it's been a real pleasure and a really stimulating conversation. Thanks also to Miguel and Martin, who were in the chat, and whoever else was watching; I really appreciate the comments, that was really nice. And just general apologies on the part of all the authors that they couldn't be here today, and also for the earlier livestreams on this paper, but you did a really great job discussing the paper. So yeah, it's really cool that we put out the paper and then we get this attention and this chance to talk about this stuff with you. So thanks, and thanks, Dean, for your great questions too; it's really always fun
chatting with you on here. I think you were here at the last one I was at as well. Yeah, it's awesome. And just, wow, eight hours is not enough to explore; there could be so many ways to connect what's being said to core terms in the Active Inference Ontology. It's just great. So thank you, Connor, thank you, Dean, and everybody who is participating live and asynchronously.
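A brief numerical aside (my own illustration, not from the paper): the discussion above turns on two checkable facts. First, for a Gaussian, the argmax of the density (its mode) coincides with its expectation (its mean), whereas for a multimodal density the two come apart. Second, for a nonlinear flow, the average of the flow differs from the flow of the average. A minimal pure-Python sketch of both:

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Discretize [-10, 10] with step 0.001.
step = 0.001
grid = [i * step - 10.0 for i in range(20001)]

# 1) Unimodal Gaussian: the mode (argmax of the density) coincides with
#    the mean (expectation), so swapping argmax for expectation is harmless.
dens = [gauss_pdf(x, 1.5, 2.0) for x in grid]
mode = grid[dens.index(max(dens))]                       # argmax, approx 1.5
mean = sum(x * p for x, p in zip(grid, dens)) * step     # expectation, approx 1.5

# 2) Bimodal mixture: argmax and expectation come apart.
mix = [0.7 * gauss_pdf(x, -2.0, 0.5) + 0.3 * gauss_pdf(x, 3.0, 0.5) for x in grid]
mix_mode = grid[mix.index(max(mix))]                     # approx -2.0 (dominant peak)
mix_mean = sum(x * p for x, p in zip(grid, mix)) * step  # approx -0.5

# 3) Average of the flow vs flow of the average: for a nonlinear flow
#    f(x) = x^3 and x ~ N(1, 1), E[f(x)] = 4 while f(E[x]) = 1.
rng = random.Random(0)
xs = [rng.gauss(1.0, 1.0) for _ in range(200000)]
avg_of_flow = sum(x ** 3 for x in xs) / len(xs)          # approx 4 (Monte Carlo)
flow_of_avg = (sum(xs) / len(xs)) ** 3                   # approx 1
```

The bimodal case also illustrates why the argmax is analytically awkward: a small change in the mixture weights can make the argmax jump discontinuously from one peak to the other, while the expectation varies smoothly with the parameters.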