Hello and welcome, everyone, to ActInfLab livestream number 32.0. The paper we're discussing is "Stochastic Chaos and Markov Blankets," and it's November 7th, 2021. Let's play our theme. Welcome to the ActInfLab, everyone. We are a participatory online lab that is communicating, learning, and practicing applied active inference. You can find us at the links on this page. This is a recorded and archived livestream, so please provide us with feedback so that we can improve our work. All backgrounds and perspectives are welcome here, and we'll be following video etiquette for livestreams. At this short link you can see our past and upcoming livestreams, and here on the main tab is the livestream calendar for 2021. We're here in early November. We had our fourth quarterly lab roundtable, summarizing the kinds of projects that you can get involved in, what we had done in the previous year, and what we're looking forward to next year. The two follow-up discussions of the paper introduced tonight will be on November 9th and 16th. In stream number 33 we'll have Avel's paper "Thinking like a state," and we haven't set the papers yet for 34 and 35. The goal of 32.0 is to learn and discuss the paper "Stochastic Chaos and Markov Blankets" (2021) by Karl Friston, Conor Heins, Kai Ueltzhöffer, Lancelot Da Costa, and Thomas Parr. Just like all the dot-zeros, this is only an introduction to some of the ideas. It's not a review or a final word, and if you have experience in any of these areas, or want to learn more about them and help improve our presentation or understanding, that would be very helpful. You can join a livestream or come get involved in other aspects of the lab, because these are things that we want to understand and there's a lot to learn, and in the coming two weeks we're going to discuss this paper. So I'm Daniel, and I'm in California.
The big questions of this paper relate to a general discussion in the active inference and free energy principle literature and beyond: what is a good model of thingness in a chaotic, dynamic, and dissipative world? Do we think of things as they are in a snapshot, or do we have to think of them through time? At what spatial or temporal scale, or with what kind of measuring stick, does it make sense to talk about things? Second, how does the answer to that first question, this model of or approach to thingness, speak to system sentience? That word is used in this paper, and notably it is not defined in terms of feeling or experiencing; it's defined in the paper as a system whose internal states look as if they're inferring external states. So it's perhaps closer to what Dennett would call the intentional stance, a sort of pragmatism meets anti-dissipation. Third: what is a Markov blanket, and how is one modeled and identified or defined statistically? This is going to be the technical bulk and contribution of the paper, so if anyone wants to learn more and help us present some of the technical details, which we're going to go through in this lecture, please do, with the total disclaimer that some of what we say may be inaccurate or an incorrect generalization. A word that wasn't a keyword per se, but is maybe a qualitative entry point, is this idea of flow. Here we have people flowing, like in a long-time exposure of a subway station, and there's water flow, which has several aspects: the water itself is moving, but there's also the flow of energy. There's information flow, and then there's the psychological concept of flow. Different aspects of these could be said to be flowing in different ways, and people jump across these different areas. So wouldn't it be cool if there were a quantitative model that incorporated some of what is meant by all these concepts, including a science of perception, cognition, and action?
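To make the third question a bit more concrete: the paper's abstract says Markov blankets can be identified using the Hessian of the steady-state surprisal (self-information). Here is a minimal sketch, assuming a Gaussian density, where that Hessian is simply the precision matrix, and a made-up four-state precision matrix with hypothetical labels (external, sensory, active, internal). This is not the paper's SPM code; it only illustrates the standard fact that a zero off-diagonal Hessian entry means two states are conditionally independent given all the others.

```python
import numpy as np


def markov_blanket(hessian, i, tol=1e-10):
    """Indices j != i whose Hessian entry H[i, j] is (numerically) nonzero.

    For a Gaussian density p(x) proportional to exp(-x^T H x / 2), the
    surprisal -ln p(x) has constant Hessian H (the precision matrix), and
    H[i, j] == 0 means x_i and x_j are conditionally independent given
    all the other states.
    """
    row = np.asarray(hessian, dtype=float)[i]
    return [j for j in range(len(row)) if j != i and abs(row[j]) > tol]


def conditional_covariance(hessian, subset):
    """Covariance of x[subset] given all remaining states (Gaussian case).

    Conditioning on the other states leaves the precision block
    H[subset, subset], so the conditional covariance is its inverse.
    """
    H = np.asarray(hessian, dtype=float)
    return np.linalg.inv(H[np.ix_(subset, subset)])


# Hypothetical precision matrix over four states, labelled for illustration:
# 0 = external, 1 = sensory, 2 = active, 3 = internal.
# H[0, 3] == 0 encodes "external and internal are independent given 1 and 2".
H = np.array([
    [ 2.0, -0.6, -0.3,  0.0],
    [-0.6,  2.0, -0.5, -0.4],
    [-0.3, -0.5,  2.0, -0.7],
    [ 0.0, -0.4, -0.7,  2.0],
])

print(markov_blanket(H, 3))               # [1, 2]: blanket of the internal state
print(markov_blanket(H, 0))               # [1, 2]: same blanket seen from outside
print(conditional_covariance(H, [0, 3]))  # diagonal: no coupling left
print(np.linalg.inv(H)[0, 3] > 0)         # True: they still covary marginally
```

The last line is the point of the exercise: external and internal states are still correlated overall; it's only once you condition on the blanket states that their coupling vanishes.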
So the big picture is: how are we going to connect physical flow models to potentially other kinds, like macro systems and informational, or even psychological or behavioral, flow? The paper is "Stochastic Chaos and Markov Blankets" (2021), and here are two of its aims and claims; there are definitely more, so this is just a few of them. First, they write that in this work they attempt an end-to-end derivation of a Markov blanket, starting with a normal form of stochastic chaos and ending with a functional form of a synchronization map. The second aim, or a summary of the paper, is that they conclude with a discussion of the implications and generalizations of the following heuristic proof of principle (so not necessarily formal, but more like using empirical simulation or approximation) for understanding, simulating, and characterizing an elemental form of sentience in systems that self-organize to non-equilibria. So that is where sentience comes in, and again, this isn't saying that because we did some matrix math we think the system has phenomenological awareness. Sentience here is going to be more like the intentional stance: the system is resisting dissipation, and perhaps acting strategically given the affordances in its niche. One really relevant paper for "Stochastic Chaos and Markov Blankets" is "Bayesian mechanics for stationary processes," with several of the same authors. That one was put on arXiv on June 25th, 2021, and this one looks like it was received by the journal on July 9th, 2021. So one question is: to what extent are these complementary? Is one a prerequisite for the other, are they somehow related, or are they two independent dimensions of advance? There are some overlapping areas but also some different areas, so that would be good to learn about. The keywords of the paper were Bayesian, thermodynamics, information geometry, variational inference, and Markov blanket. So let's go just one level deeper into the flow example.
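The first aim starts from a normal form of stochastic chaos, which, per the abstract, is the Lorenz system used as the basis of stochastic chaos: a deterministic flow plus random fluctuations, i.e., a Langevin equation. As a minimal sketch, assuming the classic Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), fluctuations with covariance 2Γ per unit time, and a simple Euler-Maruyama integrator (none of which is taken from the paper's DEM/SPM12 code), the following simulates the deterministic and noisy versions and shows sensitivity to initial conditions:

```python
import numpy as np


def lorenz_flow(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Deterministic Lorenz drift f(x) for a state x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return np.array([
        sigma * (x2 - x1),
        x1 * (rho - x3) - x2,
        x1 * x2 - beta * x3,
    ])


def simulate(x0, dt=0.005, steps=10000, gamma=0.0, seed=0):
    """Euler-Maruyama integration of dx = f(x) dt + sqrt(2 * gamma * dt) * N(0, I).

    gamma = 0 recovers the deterministic Lorenz system; gamma > 0 adds
    random fluctuations whose covariance is 2 * gamma per unit time
    (an assumed convention for the amplitude Γ).
    """
    rng = np.random.default_rng(seed)
    xs = np.empty((steps + 1, 3))
    xs[0] = x0
    for t in range(steps):
        noise = rng.standard_normal(3) * np.sqrt(2.0 * gamma * dt)
        xs[t + 1] = xs[t] + lorenz_flow(xs[t]) * dt + noise
    return xs


# Two deterministic runs whose initial conditions differ by 1e-8.
a = simulate([1.0, 1.0, 1.0])
b = simulate([1.0, 1.0, 1.0 + 1e-8])
separation = np.linalg.norm(a - b, axis=1)

# The same flow with random fluctuations reinstated.
noisy = simulate([1.0, 1.0, 1.0], gamma=1.0)

print(separation[0], separation.max())  # tiny at t = 0, attractor-sized later
print(np.abs(noisy).max())              # noisy run stays on a bounded "butterfly"
```

The exponential growth of that tiny initial separation is what a positive Lyapunov exponent measures, while the noisy run stays on a bounded, butterfly-shaped set in state space.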
We have a lot of fields, like math, physics, information, computing, thermodynamics, geometry, and a few others, where there are models of physical flow: everything from heat engines and heat flow to quantum and classical computation, clouds, and everything else. So this is a body of engineering experience and ontology from multiple areas of mathematical formalism, which sometimes map very closely to the real world and sometimes don't. Then there are, again, flow models or process models for mental flow, for example thinking about attentional allocation or information processing. The big idea, proposed in Friston et al. 2006 (at least one phrasing of it), with an image that gets referenced many times, is the winged snowflake. Below the phase boundary, that is, above the melting temperature of ice, the droplets are unable to enact a policy, so they simply fall, change phase, and become rain. But above the solid phase boundary, below the freezing point of water, there are snowflake structures, so there's an organization. One can imagine that if this shape above the phase boundary were able to act on the environment, at first by nothing other than chance, you would find, for example, persistence of the most adaptive action policies. So it is the exchange and interfacing of active systems with their environment that precludes the phase transition. That's the winged snowflake example from early Friston, and it will come back at the end in a pretty interesting way. One of the first keywords is Bayesian. Here's Rev. Bayes, and we've talked a lot about Bayesian things, so just to pick a few new images, here's some overlap between frequentism here and Bayesianism here, which is interesting to look at. What exactly does it mean for a theory or for statistics to be Bayesian? Explicit modeling of priors. And here's the classic Bayes' theorem, which I'm just not going to go into
right now; I'm sure there are awesome resources for learning more about Bayesian statistics. The way it becomes meaningful is embodied in how we want to bring our full knowledge to bear on scientific questions, or any other kind of question: we want to integrate different kinds of data sets and have models that can receive data as well as generate it, and these things are all tractable in modern Bayesian frameworks. Thermodynamics, following the Wikipedia definition here, is the branch of physics that deals with heat, work, and temperature, and their relation to energy, radiation, and the physical properties of matter. So there are thermodynamic patterns or laws, and the way that they relate to statistical mechanics: how the jostling of molecules leads to the ideal gas law, or to the work capacity of heat engines. Looking around, there were some old ideas and some new ideas. This is something we've talked about before with decision-making, but it's also how we think about chemical reactions: the product with the lower activation energy is the kinetic product, the one that is easily jostled to, while the one with a higher activation energy is not favored to occur but can sometimes be the more stable thermodynamic product, releasing more ΔG. If we're going to have some sort of information-thermodynamics synthesis, what does ΔG look like informationally? What is the informational imperative? The title of the paper has "stochastic chaos," so I found a few different citations on it, and it's an idea that goes back to before the 2000s. I don't know exactly when it began, but it's been around. So one question for the authors, or for anyone who's knowledgeable about it: what is the scope of stochastic chaos, and how do we think about the relative importance or meaning of solving one category of dynamical systems? So what is stochastic chaos? Stochastic:
subject to probabilistic variation in one way or another; we'll look at multiple ways that this can come into the picture, through sampling or approximation. Then chaos: sensitivity to small changes, which is summarized, at least in one way, by the Lyapunov exponent, a measure of the exponential rate at which infinitesimally close trajectories converge or diverge, which depends on the form of the matrices that describe the system. Another keyword was information geometry. I'm definitely not the person to give a specific definition, and it would be great to learn more from someone who could link it to specific modern advances. John Baez, a professor in the UC system, wrote on his site, which has a whole curriculum on information geometry (not that I've taken it, but I've heard it's good): "Information geometry is the study of 'statistical manifolds', which are spaces where each point is a hypothesis about some state of affairs. This subject, usually considered a branch of statistics, has important applications to machine learning and somewhat unexpected connections to evolutionary biology." So with this image: if these are dimensions of something abstract, then projected down onto the graph paper it's going to look like a triangle. And this shape, which is maybe kind of like a saddle shape, or, if the two distal parts are connected, like a tetrahedron in a way, projected down from a specific angle is going to look like a square, but from a different angle it would have a different profile. So the same tetrahedron rotated in different ways can project down into a different geometry. It's interesting to think about that with information: if you're measuring several axes of information, you have an informational tetrahedron, as it were, without needing to say that it actually is that way. One of the other keywords was variational Bayesian inference, and this was from the slides of 26
and some good blog posts that others have written. One blog post defined it this way: variational inference methods consist in finding the best approximation of a distribution among a parameterized family. So it's kind of like finding the best linear model for a data set: it's a way to fit more complex data and models, but it keeps the fit constrained within a family, so that you're just tuning knobs that you know interact in a good way, versus all the possible degrees of freedom in modeling, which are basically infinite. Then there are two ways that Bayesian approaches can be deployed (there are probably many more than this, too). There's the sampling-based approach, Monte Carlo, as in gambling, which is like pointillism: it samples, and you're kind of agnostic to the underlying distribution; if you run it properly and the underlying distribution has the right properties, you can get a sample that's totally adequate for making good decisions. The alternative approach is kind of like the SVG approach, variational inference: you have a library of cat, or mammal, subunit vectors, and that vector set gets fit in a way that's more tractable than endless sampling, with its "what if there was a whole body and you just missed it" type of questions. However, if there was a whole body and your library, the family of functions you're trying to fit, didn't include one, then you can have a mismatch there, just like you can fit any other model incorrectly. The Markov blanket is one of the terms that I think many people are curious about, ourselves included. In livestream number 14, we talked about the continuum, or development, of the concept, from the way it was phrased by Markov (by the way, we're still trying to figure out which one, father or son, made exactly which contribution towards its development) to Pearl 1988, in the textbook shown
here, and this is where computational methods start to come in and some of the pure-math assumptions get deployed on real data sets. There, the Markov blanket is defined as the set of insulating nodes in a Bayesian graph, moving from the matrix representation towards the graphical Bayesian representation. Then there are the developments by Karl Friston and others, of which this paper is a part, that do a few different things: separating the blanket states into incoming sensory and outgoing active states, so a partitioning of the blanket, as well as implementing components of the principle of least action, which is not implicit in mere statistical insulation on a Bayesian graph. We'll talk about some of that, but I think there's enough to cover in what this paper adds. Livestreams number 14, 20, and 26, and many other papers that we've discussed, and many that we haven't, contain very insightful uses as well as criticisms of the Markov blanket concept. So what is the free energy principle? Since it's a keyword in many papers, one way to answer is to ask: what are three possible attentional regimes that you might expect to reduce your uncertainty about this question? If you're not worried about this question, that's all good, but if you are, you want to reduce your uncertainty, so what are the actions to do that? One is the third sentence of the paper that we're discussing here: in brief, the free energy principle is a variational principle of stationary action applied to a particular partition of states, where this partition rests upon conditional dependencies, the flip side of which is which things influence each other and which don't. Another is the link to citation 15: if you want to go one more layer into establishing whether this is a meritorious claim or not, citation 15 is Friston's 2019 monograph, A free energy principle for a
particular physics ("particular" being specific, but also referring to particular, or autonomous, states, as they're known), and much discussion has occurred since that monograph, so we look forward to seeing how it develops. A third possible attentional regime would be to come participate and contribute with ActInfLab, because every day we are reducing our uncertainty about these terms, as any participant can attest, and we're always trying to improve our informational niche through projects like educational courses and ontology development. That way, for the questions people ask every day, like what is a Markov blanket and why does it matter, or what is the FEP and why does it matter, there will be awesome resources to share. One other piece before we jump into the paper is a little bit of matrix math vocabulary. I hope I'm quoting the definitions correctly, because I'm not super familiar with a lot of these distinctions, and I think a lot more expertise could be shared here, which would be awesome. The Jacobian matrix (and its determinant) is a construct in vector calculus, and what's relevant here is that it's the matrix of all first-order partial derivatives of a vector-valued function. If you have a lump in space, you can take a partial derivative there: it's like putting a ruler down and finding the slope along a certain axis, for a certain variable, at that spot. And you can go on to take higher-order partial derivatives. The Hessian matrix is defined as the square matrix of second-order partial derivatives of a scalar-valued function or field; it describes the local curvature of a function of many variables. So it's the same idea: the Jacobian gives the linear slope and the Hessian gives the curvature, and that can tell you a lot, especially in a certain case that's going to come into play in how this paper defines its analysis approach, specifically the Laplace approximation. So, the abstract: in this treatment of random dynamical systems, we consider the existence and
identification of conditional dependencies at nonequilibrium steady state. These dependencies underwrite a particular partition of states, in which internal states are statistically secluded from external states by blanket states. The existence of such partitions has interesting implications for the information geometry of internal states. In brief, this geometry can be read as a physics of sentience, where internal states look as if they are inferring external states. However, the existence of such partitions, and the functional form of the underlying densities, have yet to be established. Here, using the Lorenz system as the basis of stochastic chaos, we leverage the Helmholtz decomposition, and polynomial expansions, to parameterize the steady-state density in terms of surprise or self-information. We then show how Markov blankets can be identified, using the accompanying Hessian to characterize the coupling between internal and external states in terms of a generalized synchrony or synchronization of chaos. We conclude by suggesting that this kind of synchronization may provide a mathematical basis for an elemental form of (autonomous or active) sentience in biology. Okay, cool: how will that happen? Here's the roadmap. The first section is the introduction. The second section, "From Dynamics to Densities," makes the connection between dynamical models and densities, which takes one towards flow: if you have water in the ocean at different temperatures and densities, you can do the flow modeling. Section three is the Helmholtz decomposition, which we've also revisited elsewhere; it splits a vector field into two components, one that's like hill climbing (the gradient part) and one that's more like circulation (the curl part). The Lorenz system, a classic complexity model, is brought up and used as an initial case for this dynamics-to-densities plus Helmholtz decomposition approach; then systems other than the Lorenz system are analyzed, and then Laplace systems and systems other than the
Laplace are analyzed. Section four, "Markov Blankets and the Free Energy Principle," talks about sparsely coupled systems, specifically where the different partitions engage in some type of synchrony, which is going to be like correlated action conditioned upon the blanket states. That leads to a discussion of particular partitions, boundaries, and blankets, and a revisiting of Markov blankets using a simulation analysis. The last section is about the free energy principle and mentions active inference. It's interesting to start with the data availability statement: they wrote that the code used to run the simulations and create the figures for the paper is freely available in the DEM toolbox of the MATLAB package SPM12. So if anyone who knows how to use SPM wants to do an SPM livestream with us, that'd be awesome. Let's start slowly with the introduction, because there are going to be some more technical parts later. The last sentence of the first paragraph of the introduction is: "From this, one can elaborate a physics of sentience or Bayesian mechanics that would be recognized in theoretical neuroscience and biology." So imagine if the top half of this slide were redacted: what do you think would allow for, quote, "the elaboration of a physics of sentience or a Bayesian mechanics"? Is it not possible? Has it already been done, and who did it? Or would it have to have some attribute: be able to run on this type of computer, or have this many variables or fewer? So what are we looking for here, what would justify this sentence that they start the paper with, and why would it matter? That's tied up with what the motivation would be for developing it, however much effort it involved. Why would it matter to have a physics of sentience? Well, why did it matter to have a physics of heat? Let's pull back one more sentence: "The internal states can then be cast as representing, in a probabilistic fashion, external states." So
that's what they're going to elaborate a physics of sentience on: this partitioning of internal and external states. But not just any partitioning; a particular partitioning, where internal states can be cast as representing external states. This is a minimal representation: we have internal and external states, and we're not even going to define action or sense partitions in the blanket yet, just that there's some partition between internal and external states. The question is: is this how you think the world is partitioned, or how our statistical models are predicated? What is this partition? We've talked about that in a few different papers, and it's going to be a big part of this paper. Then we can pull back one more sentence in the first paragraph, where they define the free energy principle, which we read previously, and then specifically: if the states of a system whose dynamics can be described with a random or stochastic differential equation, e.g.,
the Langevin equation, possess a Markov blanket, then an interesting interpretation of their dynamics emerges. The conditional independence in question means that a set of internal states is independent of another (external) set when conditioned upon blanket states. So the part in blue here is the precondition for the claim in red, which is going to be the basis for, in the authors' words, elaborating a physics of sentience. That's where we're going in this paper, agree or disagree; there are going to be a lot of specific points where one can raise a qualm, and that's totally okay. I hope a lot of the discussion will be very interesting, with people seeing which parts of this presentation resonate with them or their understanding, and which don't. Those are the big questions: what kinds of blankets line up with different aspects or partitionings of the world (the map is not the territory, etc.)? What makes some partitionings in the world act informationally, like inferentially, anticipatorily, predictively, etc.? Why are some partitions able to be CPUs, while other partitions are just divided halves of a cup of water? How do partitions of variables that model the world lead to a physics of sentience and action? Is there another way to get to a physics of sentience and action? Is there another way to do physics? They describe their approach in part one as applying the Helmholtz decomposition, which we're going to return to, so this is just a preamble, and we'll talk more about some of the keywords. They're going to apply the Helmholtz decomposition to the solution of the Fokker-Planck equation describing the density dynamics of any random system. Later on, still summarizing part one, they say they approximate the flow (the yellow part) with a quadratic expansion, which means the steady-state density reduces to a Gaussian form. So it's kind of like if you fit a quadratic model, ax² + bx + c, to some data
set. That's not a statement that the data set has a derivative that's linear and a second derivative that's constant; that's not part of the data set or of the world. It's about the quadratic model, which is a very constrained model family that you chose to fit. Maybe other models would have fit better, or had more or fewer parameters, but you knew something about the form of the solution because you constrained it to a certain set of parameterized equations, which is why they talk about variational inference. Then they show that for a Lorenz system subject to appropriate random fluctuations, there's a third state that is independent of the first two states (or dimensions), and this means that the third state has no blanket states, and therefore there is no particular partition of interest. So one question for anyone who understands the formalism: why does this claim specifically matter? In section two, they write that the aim of the section is to get from the specification of a random dynamical system in terms of its equations of motion to the probability density over its states in the long-term future, from any initial conditions. They are only interested in systems that have a limit set, in other words, systems that possess an attractor. They want to describe systems that do not necessarily visit all possible states, are not necessarily time reversible, are stochastic in nature, and are potentially chaotic; in short, they are interested in stochastic chaos in systems with a pullback attractor. Then there are several citations from the '90s, and maybe those are some of the earlier models of it. Then they show this formalism, the Langevin equation, where the change in the state x (the dot, which is kind of like a derivative sign) is equal to a flow plus faster fluctuations, kind of like a wave and a ripple, and that's going to describe the overall change of some location, and
then that is equivalent to the Fokker-Planck density dynamics; they're the same thing, that's the equals sign. I can't say whether this is a total tautology that's super obvious, or a conclusion that was only understood more recently; that would be helpful to know. But it's one of the baseline equivalences in the paper: you can take a specification of a random dynamical system, like a set of equations that describe how this sensor is returning data, and turn it into a flow in a density-estimation framework, the Fokker-Planck equation. They write that the dot notation indicates a partial temporal derivative, and that in the absence of random fluctuations, i.e., Γ = 0, the attractor corresponds to a limit set, namely the set of states onto which the solutions of Equation 1 converge. So there's some steady-state attractor; for example, when there's a heat differential, like it's warmer outside and cooler inside, there's some point where the net flow goes to zero. Next: when the flow shows exponential divergence of trajectories, with positive Lyapunov exponents, i.e.,
real positive eigenvalues of the Jacobian of the flow, the system can be said to be chaotic. So there are systems that converge nicely when they're pulled away a little bit. Say it stayed at the same temperature inside and outside in that previous example; that equilibrium would be pretty stable, because the system would return to it if it were moved slightly away. But there are also systems where nearby points diverge, and that's measured, or approximated, by the Lyapunov exponent. Then the question is: for systems that do have positive leading Lyapunov exponents, what happens to those flow models, and what happens when random fluctuations are reinstated? How does a system that would be chaotic if it were allowed to evolve deterministically behave? It's like you're fitting the underlying model, and the data are a little noisy around it, and then you swap out the actual scatter of real-world deer and rabbit populations for the partial derivatives that you were estimating, and maybe you see that there's some bifurcation in the population density; but of course any real population is going to experience one branch or the other of that bifurcation. That's the kind of modeling they're working on. And when you go from that dynamical-systems approximation to the flow, if the flow shows exponential divergence of trajectories, we can impute stochastic chaos. So they take one of the classic systems of deterministic, dynamical chaos, the Lorenz system, which has a butterfly-looking attractor, and they first show the trajectory of the Lorenz attractor without the reinstated random fluctuations. We know that there's a Lyapunov divergence that makes the system chaotic when there are no random fluctuations, so what happens when they're reintroduced? The middle panels, they write, show the corresponding solutions in the three-dimensional state space, illustrating the
butterfly shape of the limit set and the random attractor. The deterministic one is the clean butterfly, and in the one with random fluctuations the points go outside of the butterfly; it's a slightly different shape, but it's still not, you know, an X in the state space. It still roughly follows the butterfly, though of course not the exact tracks, because there's a random fluctuation term. But it still has a certain geometry, because this is a model, hence information geometry, and that geometry is going to undergo some transformations and approximations. The lower left panel plots the fluctuations in the potential evaluated using the Laplacian form. This formalism is going to come later, but it expresses the self-information as an analytic function of the states. (I wasn't exactly sure why they were called states rather than dimensions, so if anyone has a thought on that, that'd be interesting.) In the lower right, the potential function of the first two states, or dimensions, is shown as an image with a trajectory superimposed. So in this smoothly varying global potential plot, with lighter and darker shades representing different values, there's this dynamical butterfly that's fit from the resampling of the true Lorenz system. All right, so the key insight from Figure 1, a key complexity insight, is that the underlying system can have very simple and well-defined rules yet unpredictable dynamics. In the case of dynamical equations, it's systems like the Lorenz attractor and many others that are very simple and have only a few parts, yet result in incredibly rich behavior. There's also the "simple rules, complex outcomes" idea, you know, ants and humans: a simple rule can underlie something that gives an extraordinarily complex pattern, and
yet if we measure these systems empirically, systems that have simple rules underlying them but also random fluctuations, or that we don't sample completely, so there is some but not overwhelming noise, there will still be dynamical patterns to pull out. So we can still reduce our uncertainty about the location of future points by using a statistical model that we define, one that doesn't even have to be similar to the generative process. Say someone paid you for getting predictions closer to the actual sampled positions: even if the model you're fitting is a neural network, or the kind of model we're going to discuss here, rather than the differential equation that is actually the true generative process, you can still do better than just guessing within this cube. So section three goes to the Helmholtz decomposition. The previous section said that we can regard the deterministic Lorenz system as describing the expected flow of a random dynamical system that is subject to random fluctuations; that's what we just talked about. They write that the flow of such systems, which possess a non-equilibrium steady-state density, can be expressed using a generalization of the Helmholtz decomposition into dissipative (irrotational, curl-free) and conservative (rotational, divergence-free) components, where the latter are referred to as solenoidal flow. For an introduction to this generalized Helmholtz decomposition, see Appendix B of reference 16, which is the Bayesian mechanics paper we streamed in number 26. So we're looking for where p dot, the rate of change of the density, is zero: where is that flow at steady state? That is going to be a function f with a few terms, including one that I think is introduced in this paper, the housekeeping term, but it has two terms that are the classic Helmholtz decomposition: the solenoidal part, the tangential current that just circulates along the isocontours, and the gradient part, which is
kind of like the up and down. That's how a total current in electromagnetics gets decomposed, and it's related to fundamental vector field math. So let's go and uncover what was hidden by this gray box. We have the Fokker-Planck steady state, where p dot of x is zero. There's some function that's going to be a decomposition of that flow into a solenoidal and a gradient flow component, the Helmholtz decomposition, plus this housekeeping term, which is in an appendix but which I didn't go into in super depth, so I'm curious about what that is. It results in this whole flow term. Here is the fancy I, ℑ. The first, dissipative part performs a Riemannian gradient descent on the negative logarithm of the steady-state density, which can be interpreted as the self-information, the negative natural log of p of any given state, or as some potential function; so that's kind of like how surprising that state is. The second part of the flow is the solenoidal circulation on the isocontours of the steady-state density, and the third term is this housekeeping term that accounts for how the flow operator changes over state space. So it's an extension of the Helmholtz decomposition for thinking about how to solve dynamical equations. This is from livestream 26, the synchronization map; I'm not going to go into the whole thing, because you can check out 26, but the key question is about this sigma, the synchronization manifold between the internal and the external states conditioned on the blanket states: how do you insulate, yet have anticipation across that insulation? And also from 26, this is what it looks like to see a stochastic trajectory decomposed into a gradient and a rotational component. All right, 3.1: our objective is to identify the functional form of the self-information, or potential function, that describes the non-equilibrium steady-state density. So why? That's how surprised you should be; that's the potential function, that's sort of the
fundamental frequency of both the solenoidal and the gradient flow, so it's an important variable to know. But how is self-information measured or calculated? How does the self-information compare across different non-equilibrium or equilibrium steady states? What does it mean for self-information to be high or low, that is, to be surprised or not? First, in formalism four, they address this set of questions for one simpler case; I'm not super familiar with the matrix math, so we'll leave it to others to evaluate some of the technical details. Then they follow up and say, basically, we could solve that easily without the correction term that arises when flow operators are a function of the states. Indeed, they write, it is this state dependency that underwrites stochastic chaos. That's like the double pendulum: the reason it's a chaotic system is that the state at one time depends on the previous state in a way that generates chaos, so there's so much sensitivity over the short term that things can diverge really fast. This presents us with a more difficult problem, which can be finessed by using polynomial expansions of the flow operator and the potential, for n states up to polynomial order m. So they're going to use a polynomial expansion to approximate a complex underlying flow. Polynomial expansions restrict and scaffold our model selection approach to a manageable size and tractable computation. It's kind of like the Taylor series, the Volterra series, and other polynomial expansions, which can often fit very complex functions well. This figure shows the Taylor approximation to a sine wave: the first approximation, the red line, might be totally enough between negative one and one, but with only a few more terms added, all of a sudden it starts fitting really well further and further out. That doesn't mean it can go on forever, but polynomial
expansions can be very powerful in the numerical case, just doing good enough, quick and dirty, as well as in the analytical case. Here's a figure from the 2007 SPM textbook, which is a great textbook by the way, talking about the Volterra series as a general nonlinear input-state-output characterization. So, NB, a good note to you: Volterra kernels are synonymous with effective connectivity. Then they write, and this is from the SPM12 write-up, so later than this: from the point of view of regression models, modulatory effects can be modeled with nonlinear input-output models, and in particular the Volterra formulation described above. Because the kernels are high-order, they embody interactions over time and among inputs, and can be thought of as explicit measures of effective connectivity. An important thing about the Volterra formulation is that it has high face validity and biological plausibility. The only thing it assumes is that the response of a region is some analytic nonlinear function of the inputs over the recent past. This function exists even for many complicated dynamical systems with many unobservable state variables. So cool. Why does it matter to do a polynomial expansion of the functional form of the solution to this flow problem? The parameterization in formalism five allows for state-dependent changes in the amplitude of random fluctuations, encoded by the leading diagonal of the flow operator. With that polynomial functional form, it is straightforward to solve for the polynomial coefficients of the flow operator by solving a set of simultaneous equations for a series of sample points. Some keywords from SPM and elsewhere that might come into play if you want to learn more about these topics are expectation maximization, optimization, parametric empirical Bayes, dynamical systems identification, and generalized linearization schemes for fitting GLMs to nonlinear systems. And then something that's not as much in SPM but comes
into play later with active inference, and hopefully we'll be able to demonstrate this in the literature as we sift through it, is that action, planning amidst uncertainty and feedback with the environment, radically changes the nature of that kind of flow estimation problem. You can't necessarily use the same model for the leaf on the stream as for something like the winged snowflake that's actively resisting. So how good is this approximation, using a polynomial expansion to fit a chaotic random dynamical system with random fluctuations? Inspection of the Lorenz system suggests that a second-order polynomial approximation is sufficient, given the flow is second order in the states. This ansatz (vocab: an ansatz is an assumption about the form of an unknown function, made in order to facilitate solution of an equation or other problem) has an interesting implication: if the self-information can be approximated with a second-order polynomial, the non-equilibrium steady state is approximately Gaussian. This is known as the Laplace approximation in statistics. Here they generalize the notion of a Laplace approximation to cover not just the quadratic form of the log density but also the solenoidal flow that underwrites non-equilibrium dynamics. Big if true. So here's the Helmholtz decomposition: we're taking the solution we had earlier and thinking about it in terms of fitting only up to a second-order polynomial expansion, which, as they say, is what's known as the Laplace approximation in statistics, and it's in the SPM textbook too. Here's the actual flow decomposition of the Lorenz system: the solenoidal flow is red, the gradient flow is blue, and the correction term is gold. The flow is shown as a quiver plot at equally spaced points, so it's vectors pointing in different directions at different points. The right panel is the same but for the Laplacian approximation based on the system on the left. Here the key
difference is that the dissipative part of the flow operator and the Hessian are positive definite in the case of the Laplacian, which means the gradient flows converge to the maximum of the non-equilibrium steady-state density. This is reflected in the blue arrows that point to the center of the space. So even though the gradient part of the real flow points in all different directions, because it's kind of a mixing flow, in the Laplacian second-order polynomial expansion the blue arrows all point to the center, or to some much more compact manifold. So it's a converging solution, even though we defined the actual system to have stochastic chaotic properties. Here's SPM looking at the Lorenz attractor. Alas, poor Lorenz! I knew it, Horatio: a model of infinite citations, of most excellent fancy; it had to return to the pullback attractor a thousand times, and now how abhorred in my imagination it is. That's modified from Hamlet, because right after that seemingly very cool solution, they say: indeed, the equations of motion we have written down are deterministic, implying that there is no stochasticity in the sample paths of the process, assuming a fixed initial condition. This speaks to the fact that the Helmholtz decomposition of the deterministic Lorenz system is not a description of a dynamical system with random fluctuations, i.e.
systems in which the dissipative part of a flow operator is positive definite. We therefore need to look beyond the Lorenz attractor. So that takes us, appropriately, to section 3.3, Beyond the Lorenz system. Okay, cool: beyond the Lorenz system we go to the chaotic Laplacian. Figure three shows a trajectory of points in three states, kind of three dimensions, comprising a Laplacian approximation to a stochastic Lorenz system. Here we have the second by third, first by third, and first by second projections of the NESS density, so it's showing three projections onto two dimensions, like the cover of GEB. Notice how the light is in the center: it's like they're getting pulled back to the peak of the Gaussian on each dimension. That's the non-equilibrium steady state, which by construction in the Laplacian is a multivariate Gaussian; that's super key and extremely interesting. The middle panels show the deterministic and stochastic solutions as a function of time, while the right panel plots the same trajectories in state space. The shape of the attractor retains a butterfly-like form but is clearly different from the Lorenz attractor. So what do you see when you look at this attractor? Be honest. It has a butterfly-like shape, and the generated trajectory has some similarity with the underlying generator function, but it has such a nicer form to solve. The lower left is the self-information, the potential, as a function of time, based on the analytic form of the equations of motion and the deterministic trajectories of the previous panel. Then the flow of the Laplacian, the approximate expansion that's nicely solvable, is plotted against the Lorenz, the true flow, evaluated at 64 spaced points. It can be seen that although there is a high correlation between the flows of the Laplacian and Lorenz systems, they are not identical. It's not super visible; they could have provided an R squared or some other statistic, but clearly they are drawing
from the same manifold. So if this is good enough to stay alive, then c'est la vie. Upper panels, just focus on those: this is again the projection onto two by three, one by three, and one by two of this three-dimensional multivariate Gaussian Laplacian approximation, which makes it really nicely solvable. One can then estimate the Lyapunov dimension, which approximates the Hausdorff dimension, and in this example was 2.48. So the Laplacian approximation was used to sample and then estimate the Lyapunov dimension, and it turns out that the two compare very closely, 2.43 and 2.48, which on the grand scale of things both sit between two and three dimensions. It means that the Laplacian expansion has some of the nicely solvable characteristics of the generalized Laplace approximation, like the multivariate Gaussian and easy computation, yet it still identifies this system as chaotic, because it still gives a positive Lyapunov exponent from this sampling. In short, the flows of the Lorenz system and its Laplace approximation have an attracting set with between two and three dimensions. For our purposes, the Laplace approximation is easier to handle than the Lorenz system, because the functional forms of the flow and potential are immediately at hand. So, pretty interesting. Figure four recalls our earlier discussion of the Jacobian and the Hessian. Let us recall: the Jacobian is the matrix of first-order partial derivatives of the flow, and the Hessian is the matrix of second-order partial derivatives of the self-information. So we have the Jacobian, the Hessian, and the covariance among these three dimensions. The Lorenz attractor was in three dimensions, states one, two, and three; that's kind of why they're called states, like thermodynamic states, though they are dimensions. The Lorenz attractor was kind of like, you know, an ant flying around in space in three dimensions, and we fit a polynomial expansion specifically to those three dimensions, to measurements of them. Now for those three dimensions we can
talk about their correlations in the first partial derivatives, the second partial derivatives, and the covariance; that's in the top here. In this example, the third state is independent of the first pair, where the independence rests on the directed coupling from the third to the first state, and it's plotted on a log scale to show the sparsity. So do you see where we're going with the Markov blanket? Notice that this independence of the third state is not part of the Lorenz system's underlying differential equations; it's actually just a figment of the Laplace approximation model, but it's quite an interesting pattern. The middle panel shows slices through the steady-state density over two states at increasing values of the remaining state, so the only correlation in play is between the first and second states. This is related to sphericity, and to error correction and evaluation, a topic discussed in the SPM textbook; basically, spherical errors are ones where the two axes are not correlated, and when there are non-spherical errors you get a different shape. In the lower panel, the correlation is illustrated in terms of the conditional density over the first state given the second, and this is showing that by and large it's doing well, largely confined to the 90% credible intervals. It would be good to see some simulation statistics and raw data on that. So again, alas, I knew him well: the preceding treatment leverages the simplicity of the Laplace approximation to stochastic chaos, in which sparsity constraints on the Hessian are easy to identify or implement. Now, just like Lorenz was dropped, this next part goes beyond the Laplace system. We're not going to go into too much detail here, but that's a section people could explain more about, or we could look at more in the .1 and the .2. So, 3.5, Summary: it is important not to conflate the simplicity of a NESS density with the complexity of the underlying density dynamics. In other words, when prepared
or observed in some initial state, the probability density can evolve in a complicated and itinerant fashion on various submanifolds of the pullback attractor. So the map is not the territory, something that has come up many times: the Laplacian is not the Lorenz, but the Laplacian is a good approximation to the Lorenz in their simulations. In figure six, the chaotic Laplacian system is shown as a time series, where this is the true relationship between the first and second states. Let's go to here: we see the first and second states, and here's where there's this non-spherical error, an interesting coupling in the model that's independent of the random fluctuations, and that represents something very deep about the structure of the Laplacian approximation to the Lorenz. Now, using the Laplacian in a generative capacity, there's an initial misattunement in their setup: it starts somewhere far from the pullback attractor, and the correlation evolves through time. They say the density converges to the steady-state density after about 16 seconds; however, it takes a rather circuitous route from this particular set of states. That's interesting: the states were differently correlated at first because they started in a weird place, and then they found their true correlation structure. Note that the average density over short periods of time can be highly non-Gaussian, even though the density at any point is by construction Gaussian. Something to think about. So then we finally get to Markov blankets. They repeat the analysis, but now they approximate two Lorenz systems that are coupled to each other through their respective first states, so the first state of each system is where the coupling happens; it's almost like one of our states is now going to couple out to the other system. This induces a richer conditional independence structure, from which one can identify internal and external states that are
independent when conditioned upon blanket states. So what does it mean to sparsely couple systems, and what are these quote first states? The first dimension. Figure seven: now there's generalized synchrony in a Laplacian system. This is the same kind of figure as figure three, except now we're looking at generalized synchrony: there's a synchronization manifold in the first state when those two got linked up. There are more lines now because there are two coupled chaotic systems, and the middle is showing that their state spaces are coupled, illustrating the degree of synchronization, and then again showing that their flows are correlated; no stats. Figure eight: instead of a three by three matrix for the Jacobian, Hessian, and covariance, it's now six by six. Here is the identification of different partitions using the log of the Jacobian and the Hessian. The partitioning between states one and four, the first state of each system, is the one that we knew about, the one that was wired into the system, but the other entries show other correlation patterns. I don't know why the connection between four and five is not included in a box, but these are the partitions, and the sparsity structure of the covariance supports a particular partition into internal states (the fifth and sixth states, dark blue), an active state (the fourth state, the one that connects here), a sensory state (the first state), and external states (the second and third states). This partition is illustrated here, and the remarkable thing is that despite their conditional independence there are correlations between internal and external states, here between the second and fifth states. So, interesting patterns, showing these evolving through time. Figure nine: now they look at the partial correlations of states, showing that the Hessian is recovered. Here is that Hessian on the top right, and now they're going to
show that it's recovered. It's a little bit light, but these squares show the partial correlation coefficients, hopefully pulling out a similar structure to the one identified here, and then the strong correlation, well, we'll investigate that more in the .1 and the .2. It also shows how, for example, the third and sixth states, which are not supposed to be coupled, initially start off with a high correlation, maybe because they start in the same neighborhood, but then converge towards zero correlation. So that's sparse coupling. It does not depend upon Gaussian assumptions about the non-equilibrium steady-state density; it implies that dynamical influence graphs with absent or directed edges admit a Markov blanket, which may or may not be empty. These dependencies can be used to build a particular partition using the following rules. Capture the flag! Figure 10: here we see the classic bacteria example, but now it's two bacteria communicating (I'm not sure if it was always that way), and then a little bit more of a coupled system, with two times three, which is pretty interesting. So here we see the Markov blanket as we've seen it before, but also in a new way. How great. Here are some technical definitions of the Markov blanket, not for tonight but for another night. Here's a question they asked themselves: why, one might ask, does a particular partition comprise four sets of states? I don't know; why is the tetrahedron the smallest polyhedron? Who knows. In other words, why does a particular partition consider two Markov boundaries, sensory and active states? So why can we skip straight to the Friston blanket, rather than just identifying the Pearl (1988) blanket or the earlier Markov model? The reason is that the particular partition is the minimal partition that allows for directed coupling with blanket states. That again is from figure 10, and the
key thing to note from figure eight is that there are profound covariances between some internal and external states, despite the fact that they are conditionally independent. We're defining nodes that are not connected as conditionally independent; that's how the Bayesian graph works. Here, states one and four get a connection with each other: that's this connection right here, right there, and right there, so now one and four are coupled. Now it's like the external states are at minimum two, and there's Bucky Fuller's famous quote, unity is plural and, at minimum, two: the internal states have to be at minimum two, and the external states have to be at minimum two. That's pretty interesting, because, yeah, the particular partition is the minimal partition that allows for directed coupling with blanket states. And then the second and fifth states are highly correlated, yet are twice removed, so this is leading to generalized synchrony through partial coupling of dynamical systems. Note, and these are their words: there is no claim that either the original Lorenz system or the coupled Lorenz system possesses a Markov blanket; the claim here is that there exists a Laplace approximation to these kinds of systems that, in virtue of the zero elements of the Hessian, features Markov blankets. So: having a Markov blanket, being a Markov blanket, modeling with a Markov blanket? No claim. A Laplace approximation approach to systems identification can add constraints and make the problem easier to solve. The Markov blanket arises literally within the Laplace approximation, which is just an approximation; it's not a description of the system. So on to the free energy principle: the existence of any particular partition means that one can provide a stipulative definition of the conditional density over external states, parameterized by the conditional expectation of internal states given sensory states. We call this a variational
density, parameterized by expected internal states mu. So going from the minimal coupled systems, there are still some persistent covariances, and these covariances partition out into a Markov blanket. What if the expectation of the internal states were about external states? In other words, for every sensory state there's a conditional density, but it has to be conditioned on the blanket state, which is how the whole thing is set up: for every sensory state there's a conditional density over external states and a conditional density over internal states, where internal and external states are conditionally independent. This admits the possibility of a diffeomorphic map between the sufficient statistics of the respective densities. The existence of this mapping rests upon a continuously differentiable and invertible map, which is linear under the Laplace approximation. Diffeomorphic is like stretchable, and that allows some of these formalisms to arise. Again, it would be awesome to hear what other people can bring to the table about this, but this is like the energy minus entropy form, the self-information plus the divergence term, and then the accuracy minus the complexity. Okay, the font got a little out of control here, but this functional, the big one, is going to be expressed in a few different forms. It's an expected energy minus the entropy; it's self-information plus KL divergence, that is, the negative log likelihood of particular states plus the KL divergence; and it's complexity minus accuracy, which with the sign flipped, accuracy minus complexity, is what the machine learning context calls the evidence lower bound, or ELBO. This is the basis of the free energy principle. Put simply, it means that the expected internal states of a particular partition at non-equilibrium steady state can be cast as encoding conditional or Bayesian beliefs about external states. Equivalently, the flow in the internal manifold
can be expressed as a gradient flow on a variational free energy that can be read as self-information. This licenses a somewhat teleological and directed description of self-organization as self-evidencing, in the sense that the surprise or self-information that constitutes the potential is known as (negative) log model evidence, or marginal likelihood, in Bayesian statistics. So, wow, great FEP work. What does it mean to talk about the physiological perspective here, though? What does self-evidencing mean in the biological context: the blood vessels want to observe themselves working? Now in section 5.1, hopefully getting towards the end here, later in section 5.1: an alternative and deflationary perspective rests on noting that free energy gradients are also the gradients of self-information. So the organism wants to be minimally surprised about some chaotic pullback attractor; whether or not things are wacky out there, having an updating flow model that settles onto a pullback attractor, the approximation that the model controls, is very important, and the organism wants to be minimally surprised relative to the expectations and preferences that it has over observations, which is defined with some technical details here. This formulation of gradient flows is simpler and shows that they're effectively minimizing a different sort of prediction error, namely the difference between particular states and their expected values at non-equilibrium steady state. This leads to exactly the same gradient flows but a complementary interpretation, in which autonomous states are drawn towards their steady-state expectation. Heuristically, one could imagine this kind of stochastic chaos as apt to describe the motion of a moth attracted towards a flame but constantly thwarted by turbulent, i.e. solenoidal, air currents. Because active states influence sensory states, and possibly external states, this would look as if the particle, e.g.
the moth, was trying to attain its most likely state in the face of random fluctuations and solenoidal dynamics. This perspective emphasizes the active part of self-evidencing, sometimes referred to as active inference. So how cool: first, in 2006, the winged snowflake; in '21, like a moth to the flame. In 5.2 they give more of a summary of the sequence they took: constructing the Laplacian approximation with a quadratic form; looking at the first and second partial derivative matrices, the Jacobian and the Hessian; looking at the conditional dependencies; then the generalized synchrony given the partitions; and then suggesting that there's some diffeomorphic, stretchable, connection between the internal and the external states conditioned upon all of that. That's the Bayesian mechanics, in which internal states on average parameterize beliefs of a Bayesian sort about external states. Okay, so some of the implications in the last few slides here. Clearly, the worked example based upon sparsely coupled Lorenz systems does not mean that a conditional synchronization map exists, or is indeed invertible, in any given system, which is what would allow the tale of two densities, the recognition density and the generative model. So you can't just specify any system and expect to find a conditional synchronization map. However, the above derivations can be taken as an existence proof that such manifolds, and the accompanying variational free energy formulation, can emerge from sufficiently sparse coupling. So this is one classic system of complexity analysis, and it worked here; maybe there are other systems where it will or won't. A particular partition is necessary to talk about states that are internal to some particle or person and which can be distinguished from external states. One way of understanding the ensuing coupling between internal and external states is in terms of generalized synchrony, that synchronization manifold. This
synchronization can be expressed as a variational principle of least action, using the notion of variational free energy. This is from another Da Costa paper, the synthesis on active inference in discrete state spaces, and it's talking about how variational free energy minimization, perception and planning as inference about observations, can be seen as a generalization. Here's active inference: if you have no prior preferences, you have pure surprise, optimal surprise, intrinsic motivation, info gain, the infomax principle, because it's just about what's surprising given no priors. When there's no ambiguity, that is, when there is a model that abstracts away from, for example, measurement error, you have risk-sensitive policies, KL divergence based control, and Occam's principle. And there are other cases, like Bayesian decision theory and expected utility theory, as well as maximum entropy and Jaynes. So there are a lot of pieces coming into play when we talk about active inference, probably some that we are still yet to learn about, but there are some pretty interesting patterns arising from this partitioning. Here's another implication: the polynomial form of the Helmholtz decomposition from formalism five may provide a generic model for observed random dynamical systems. In other words, it could be the basis of a forward or generative model that explains some empirically observed flow, estimated using the first and second moments to quantify the flow and the covariance of random fluctuations over state space, respectively. This kind of generative model is appealing because of its parameterization in terms of the underlying non-equilibrium steady-state density. In other words, one could in principle try to explain empirical data in terms of a Gaussian steady-state density, which may include constraints on conditional dependencies. So let's think back to the beginning, with flow, information flow, mental flow, granite moving down a quarry, all these kinds of
flow; could there be something that integrates across them? Another option for scaling this approach to high-dimensional dynamical systems would be to learn the state dependency of the flow operator and the non-equilibrium steady-state density using unsupervised learning approaches, such as deep neural networks. A similar approach has become popular in the deep learning literature in the form of neural stochastic differential equations, so here are two recent stochastic differential equation papers. And then they bring up this very interesting suggestion: one could use separate feedforward neural networks to parameterize the different components of the flow operator and the self-information. This would require differentiable transforms to be applied to the output layers of the networks, to constrain the Hessian and the solenoidal flow operator to be positive definite and antisymmetric, respectively. So they're suggesting some neural network architectures that might facilitate active inference.

Then this is a pretty nice piece: it is interesting to relate flows across a Markov boundary to the constructal law, which, like the free energy principle, seeks a normative account of the structure and dynamics of complex systems. In the case of the constructal law, this is articulated in terms of maximizing external access to the currents internal to the system. So here is Adrian Bejan, an awesome professor and researcher who has done many years of work on the constructal law, like Friston in some ways, so I was quite happy to see that connection.

Here's the final implication, the practical one: it licenses the use of inverse sample covariance matrices of a sufficiently long time series as a potentially sufficient description of the steady-state density. So we could take a real time series and fit that Laplace approximation. This would furnish a description of the expected flow and Lyapunov exponents, to establish whether the
system was chaotic or not. In conclusion, the last sentence: having a simple functional form for the flow of random dynamical systems may be useful for both modeling and analyzing time series generated by real-world processes that are far from equilibrium and that may or may not be chaotic. So we can think of two chaotic systems, the humans and the ants, and then everything else.

So, pretty interesting paper. In the coming two weeks, hopefully we'll have some good group discussions. We're going to email the authors now that we've done our work on the dot zero, and it would be awesome to have them on for the two discussions or at any other time. Thanks, everybody who was watching. If you watched to the end, I hope you found this interesting. Come get involved with the ActInf Lab, because we could all build some shared resources around the questions that are really being advanced in this paper, like the connections to complexity and dynamical systems theory and statistical thermodynamics. These are important contributions. This is not fringe apologetics for active inference (it's barely mentioned in the paper); it's actually some really fundamental work that I hope is understood, and critiqued where valid. So what would a good understanding of the kinds of things discussed in this paper enable? It came out in 2021, but think into the future: what are the unique predictions and implications of this paper? What are the next steps for the FEP and active inference? What are the goals of this research, and what are you still curious about? So thanks for sticking around for a long live stream at a different time, but so it goes. Have a good one, everyone; hopefully see you around the lab or around. Bye.
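The recap step of reading conditional dependencies off the Hessian can be illustrated with a toy example. This is a minimal sketch, assuming a Gaussian (Laplace-form) steady-state density, so that the Hessian of the self-information is simply the precision matrix. The three-state ordering (external η, blanket b, internal μ) and all the numbers in `H` are hypothetical. A zero in the Hessian between external and internal states means those states are independent given the blanket, even though they stay marginally correlated.

```python
import numpy as np

# Hypothetical Hessian of the self-information (= precision of a Gaussian
# steady-state density), states ordered [eta (external), b (blanket), mu (internal)].
# The zero at H[0, 2] means no direct coupling between external and internal states.
H = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.5, -0.9],
              [0.0, -0.9, 1.5]])

def conditional_covariance(precision, keep):
    """Covariance of x[keep] given all remaining states, for a Gaussian density.

    The conditional precision of x[keep] is the sub-block precision[keep, keep],
    so its inverse is the conditional covariance (whatever value the
    conditioning states take).
    """
    sub = precision[np.ix_(keep, keep)]
    return np.linalg.inv(sub)

cond_cov = conditional_covariance(H, keep=[0, 2])  # cov of (eta, mu) given b
marg_cov = np.linalg.inv(H)                        # marginal covariance of all states

print("cov(eta, mu | b) =", cond_cov[0, 1])  # zero: independent given the blanket
print("cov(eta, mu)     =", marg_cov[0, 2])  # non-zero: dependent without conditioning
```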
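The practical implication discussed above can be checked on a toy linear system with a known answer: the inverse sample covariance of a long time series should describe the steady-state density. This sketch makes several assumptions:

- a constant solenoidal operator `Q` (antisymmetric);
- a constant diffusion `Gamma` (positive definite);
- a quadratic self-information ½xᵀHx.

Under these assumptions the Helmholtz form of the flow reduces to f(x) = (Q − Γ)Hx, with random fluctuations of covariance 2Γ. The matrices, step size and run length are hypothetical illustration choices, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical steady-state precision H (Hessian of self-information),
# states ordered [eta, b, mu]; the zero at [0, 2] encodes a Markov blanket.
H = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.5, -0.9],
              [0.0, -0.9, 1.5]])
Gamma = np.eye(3)                      # dissipative part: positive definite
Q = np.array([[0.0, 0.3, 0.0],         # solenoidal part: antisymmetric
              [-0.3, 0.0, 0.2],
              [0.0, -0.2, 0.0]])

J = (Q - Gamma) @ H                    # Jacobian of the flow f(x) = (Q - Gamma) H x

# With fluctuations of covariance 2*Gamma, S = inv(H) solves J S + S J^T + 2 Gamma = 0
S = np.linalg.inv(H)
lyapunov_residual = J @ S + S @ J.T + 2 * Gamma

# Euler-Maruyama simulation of dx = J x dt + sqrt(2 Gamma) dW
dt, n_steps, burn_in = 0.01, 200_000, 10_000
noise = rng.standard_normal((n_steps, 3)) @ np.linalg.cholesky(2 * Gamma * dt).T
x = np.zeros(3)
samples = np.empty((n_steps, 3))
for t in range(n_steps):
    x = x + (J @ x) * dt + noise[t]
    samples[t] = x

# Inverse sample covariance as an estimate of the steady-state precision
H_hat = np.linalg.inv(np.cov(samples[burn_in:], rowvar=False))
print(np.round(H_hat, 2))  # approximately H, with a near-zero [0, 2] entry
```

In this form the solenoidal part Q leaves the steady-state density unchanged, so the estimate recovers H whatever Q is. Q changes the circulation of the flow, not the density.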
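The suggestion about output layers can be sketched without any deep learning library. The idea is to apply differentiable transforms so that the Hessian comes out positive definite and the solenoidal flow operator antisymmetric. This NumPy sketch assumes the network emits one unconstrained vector per matrix. The helper names and the particular transforms are common choices picked for illustration, not ones specified in the paper:

- for the Hessian, a Cholesky factor with a softplus diagonal;
- for the solenoidal operator, A − Aᵀ.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); strictly positive output
    return np.logaddexp(0.0, z)

def to_positive_definite(raw, n, eps=1e-6):
    """Hypothetical helper: map n*(n+1)/2 unconstrained numbers to a positive-definite matrix.

    Fill a lower-triangular L, force its diagonal positive with softplus,
    and return L @ L.T + eps * I.
    """
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = raw
    L[np.diag_indices(n)] = softplus(np.diag(L))
    return L @ L.T + eps * np.eye(n)

def to_antisymmetric(raw, n):
    """Hypothetical helper: map n*(n-1)/2 unconstrained numbers to Q = A - A.T."""
    A = np.zeros((n, n))
    A[np.tril_indices(n, k=-1)] = raw
    return A - A.T

# Stand-in for raw network outputs: random unconstrained vectors
rng = np.random.default_rng(1)
n = 4
H = to_positive_definite(rng.normal(size=n * (n + 1) // 2), n)
Q = to_antisymmetric(rng.normal(size=n * (n - 1) // 2), n)
print(np.linalg.eigvalsh(H).min() > 0, np.allclose(Q, -Q.T))  # True True
```

Every operation here (softplus, matrix products, transposes) is differentiable. The same two functions could therefore sit on top of a network's output layer in an autodiff framework.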
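The last implication is that a fitted flow could furnish Lyapunov exponents to establish whether the system was chaotic or not. The sketch below estimates the largest Lyapunov exponent of the deterministic Lorenz system, the classic example the paper builds on. It integrates a tangent vector along the flow's Jacobian and renormalizes it at every step (the standard Benettin-style method). It works from the known Lorenz equations rather than from a flow fitted to data, which is a simplifying assumption. The parameters (σ = 10, ρ = 28, β = 8/3), step size and run length are conventional illustrative choices. A positive value indicates chaos; for these parameters the literature value is roughly 0.9.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz(x):
    return np.array([SIGMA * (x[1] - x[0]),
                     x[0] * (RHO - x[2]) - x[1],
                     x[0] * x[1] - BETA * x[2]])

def lorenz_jacobian(x):
    return np.array([[-SIGMA, SIGMA, 0.0],
                     [RHO - x[2], -1.0, -x[0]],
                     [x[1], x[0], -BETA]])

def augmented(y):
    # y = [x (3 states), v (3 components)]: trajectory plus tangent vector, dv/dt = J(x) v
    x, v = y[:3], y[3:]
    return np.concatenate([lorenz(x), lorenz_jacobian(x) @ v])

def rk4_step(f, y, dt):
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def largest_lyapunov_exponent(x0, dt=0.01, n_transient=1_000, n_steps=30_000):
    y = np.concatenate([np.asarray(x0, dtype=float), [1.0, 0.0, 0.0]])
    # Discard a transient so the trajectory settles on the attractor
    for _ in range(n_transient):
        y = rk4_step(augmented, y, dt)
        y[3:] /= np.linalg.norm(y[3:])
    # Average log growth rate of the renormalized tangent vector
    log_growth = 0.0
    for _ in range(n_steps):
        y = rk4_step(augmented, y, dt)
        norm = np.linalg.norm(y[3:])
        log_growth += np.log(norm)
        y[3:] /= norm
    return log_growth / (n_steps * dt)

lam = largest_lyapunov_exponent([1.0, 1.0, 1.0])
print(f"largest Lyapunov exponent ~ {lam:.2f}")  # positive => chaotic
```

For the linear Gaussian case, by contrast, the exponents are just the real parts of the Jacobian's eigenvalues, which are all negative, so that system is not chaotic.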