Hello and welcome, everyone. It is September 1st, 2023, and we're here in Active Inference MathStream number 6.1 with Sean Tull. We'll be hearing a presentation, "Active Inference in String Diagrams", followed by a discussion. This is super exciting, so if you're watching live, please feel free to write your questions in the live chat. Really looking forward to this. Thank you, Sean, again, for joining, and over to you for the presentation.

Thanks very much. Thanks to everyone who's watching, to the organizers for this chance to speak to you, and to Daniel for getting in touch and inviting me. I'm really excited to share this work with this community, basically, and to hear what those of you who work with active inference and do any formal work think of what I present today. I'm going to be presenting a formal approach in which active inference is described in an entirely graphical language, the language of string diagrams, based on the mathematics of category theory. I won't assume you're too familiar with this already; I'll try to introduce it in the talk. Ultimately, I'd like to convince you that this diagrammatic language will be really useful for those of you who work formally with active inference, and to encourage you to pick it up in your own work.

Just to introduce myself: I'm Sean Tull, a researcher at Quantinuum, formerly a postdoc in computer science at Oxford. At Quantinuum, in this Oxford team, we study what we call compositional intelligence, which includes applying category theory to topics in AI. As well as this, the project was supported by a grant from FQXi, listed at the bottom, and hosted at the Topos Institute, which is a centre for applied category theory. So let me get started. For active inference itself, I won't spend too much time introducing it.
I'll assume most people here are familiar with it — many of you probably know more about it than I do, in fact — so I'll just mention the parts of it that I'll be addressing in the talk. We think of it as a model of cognition that hopefully applies at many levels, say from a whole organism down to a single neuron. The key idea is that an agent comes with a generative model, which it uses to explain the observations it receives from the world in terms of some hidden states — which you might call perception — and in terms of its own actions. In active inference, it achieves both of these things through a form of Bayesian inference, or an approximate form of Bayesian inference, by minimising a quantity called free energy. These are the ingredients we'll be talking about in the talk.

The thing that's really exciting about active inference, I think, especially for those of a formal background, is that it aims to offer a very principled approach to cognition that you can hopefully apply at all these many levels. But I think at the moment it could also benefit from more formal work, and that's what this talk is about: formal approaches to the theory. In particular, I think nice, clear formalisations of active inference would help to clarify what the core, or key, ideas of the theory are. We'd like it to be a very distinct principle that, ideally, we just apply to a generative model, and everything else follows from it. Once we've got that, we can hopefully generalise it and understand it better, make it more accessible to those who come from formal backgrounds in mathematics and so on, get them working on this topic quickly, and connect it with approaches in artificial intelligence as well.
But the most important thing about a good formalisation, I think, should just be to make learning about active inference easier — to make the framework simpler to understand. That's what we're aiming for in this work. In other places there have already been some calls, or suggestions, that a nice formalisation of active inference should be a diagrammatic one. When you look at the generative models that come up in active inference, they're very compositional in nature and it's very natural to draw them as diagrams, so it would be nice if our whole approach to describing them could be graphical in this way. Just for example, this is the paper called "The Graphical Brain" by Friston, Parr and de Vries, and in general we've probably all seen loads of these diagrams describing generative models, where you draw the many compositional features of different spaces of hidden states, observations and so on, interacting. So these diagrams are used, but they're just used to represent the model: to reason about it you still have to go and do traditional probability-theory calculations.

In fact, though, there is a whole graphical formalism and mathematical language for describing these kinds of interacting processes entirely with diagrams. The mathematics is called category theory, and the language is that of the string diagrams I'll talk about today. In particular, there's a lot of work going on in how you can describe aspects of probability theory, causality and causal models in terms of string diagrams. These causal models are basically based on Bayesian networks — the same formal structure as the generative models in active inference. What we'll talk about today draws in particular on a paper co-authored with Robin Lorenz treating causal models in the sense of Pearl — basically causal Bayesian networks — in terms of these string diagrams.
That's the same formal structure as we'll be talking about today. In this talk I'm presenting this paper, which was joint work with Johannes Kleiner, called "Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy". Basically, what we do is try to give a formalization of active inference that's nice and clear conceptually, entirely in terms of string diagrams — essentially taking the formal content of something like the Active Inference textbook and turning it into these diagrams. As I mentioned, it was done as part of this FQXi project, which is actually a project about consciousness — about ways that category theory can be applied to theories of consciousness — and we've done some previous work looking at the integrated information theory of consciousness. Of course, there are all sorts of ways that active inference has been proposed to connect to consciousness, but for the purposes of this talk we won't go into any of that; here I just take it to be a theory of cognition. There's also lots of other closely related work going on in category theory, often under the name of categorical cybernetics, which might include some of the things that Toby has talked about on this stream in the past.

What I'll do now is introduce categories and string diagrams, and then later we'll apply them to all the basic ingredients of active inference I've alluded to so far: generative models, updating them, free energy, and active inference itself. So let's start with categories and string diagrams. You can think of a category, in general, as a sort of world of interacting processes, and the categories we're talking about here will always be symmetric monoidal categories.
But don't worry too much about the formal language, because the way we talk about them today will come down entirely to the diagrams. A category amounts to a collection of objects, sometimes called systems — the capital A, B, C here — and what are called morphisms, which you might think of as processes, between them. Writing things normally, you'd write a morphism from A to B as f : A → B. In string diagrams you draw it like this, where we read all diagrams from bottom to top in this talk: a wire for the input A at the bottom, a wire for the output B at the top, and the morphism drawn as a box f. You just read the diagram upwards: an input comes in on the wire A, the process applies, and you get your output of type B at the top.

What you can do with these processes is compose them. If you have two processes f : A → B and g : B → C, you can compose them in sequence, which just means plugging the boxes together in your diagram. And because we're in a monoidal category, you can also compose in parallel: there's an operation called the tensor, which, given two objects, puts them together to build a composite object — A and B go to A-tensor-B, you could say — and you can do the same to morphisms, building f-tensor-g. In the pictures this just means drawing them side by side, and you think of it as f : A → C and g : B → D running in parallel without interacting. Most of the time you just draw a picture like this and don't even need to write the tensor symbols. So those are the two basic modes of composition, and from these you can build much more elaborate string diagrams in your category. If you're writing things very mathematically, there are lots of equations a category, or a monoidal category, needs to satisfy, but
when you're working in the diagrams, they basically do some of that work for you, because these equations come out for free. For example, you have equations like this that you'd have to think about when working in the conventional mathematical way, but in the diagrams they just mean you can slide boxes along the wires — it doesn't really matter where they sit on the wires; it tends to be only the connectivity that matters. Similarly, wires can cross over each other, because we're in a symmetric setting.

There are a few further features of having a category. Every object comes with an identity morphism, which you should just think of as nothing happening; it's drawn as a blank wire. There's also a kind of identity object, as it were, called the unit object, which is just empty space — you don't even draw the wire; this dashed box just means nothing, an empty picture. The latter is useful because it means we can now talk about morphisms that don't even have an input or an output (more formally, the input or output is this object I), and we give these things special names. The most important one is probably that of a state: a morphism with no input, as it were — really, with input I. In the pictures it just looks like this, with no wire going in, and you'd call it a state of A. You can also have a process that takes in A and has no output, which you'd call an effect, and if a morphism has neither, you call it a scalar — this is just going to be a number, basically, floating around next to your diagram.

There are many categories out there — the point is that categories are extremely general, and these could describe computational processes, physical processes, quantum processes in particular, and all sorts of things — but this talk will only actually need one category, so we'll keep it simple. It's the category called Mat(R+), of positive real matrices. You just
take the objects — the wires — to be finite sets, and the morphisms to be positive matrices indexed by them. If we draw a box M going from X to Y, this is a matrix indexed by X and Y: for each input in the set X and each output in Y you get a positive real number, written M(y | x), so this box means a function of that form. Now, when we plug boxes together, things you'd normally do with equations turn into simpler pictures — I mean mainly this middle one: if we have two morphisms in sequence, we compose them by matrix multiplication, so instead of writing this formula where we sum over y, we just draw the picture where one box is plugged on top of the other. If we run them in parallel, we take the Cartesian product of the sets and the tensor (Kronecker) product of the matrices — just the obvious thing where two processes run independently. In particular, the monoidal unit here ends up being a singleton set, which you can basically ignore, so a state here amounts to a function sending each x to a positive real, an effect is the same kind of thing, and a scalar is just a positive real.

The intuition, though, is that we're going to restrict to the particular morphisms in here which are probabilistic in nature: they need to send each x to an actual distribution over Y, and I'll talk about how you pick those out next. To do that, you use some extra structure this category has: it forms what's called a copy-discard (CD) category. This is one more bit of mathematical gadgetry we have around: each object comes with distinguished processes, one that we call copy, which takes in A and gives two copies of A at the top, and one called discard, where you just throw A away, so you have no output at the top. These satisfy some equations that are quite intuitive.
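As a concrete companion to the Mat(R+) description above, here is a minimal sketch (hypothetical code, not from the talk) of sequential and parallel composition, representing a morphism X → Y as a dict mapping (y, x) to a positive real, read "M(y | x)":

```python
# A morphism X -> Y in Mat(R+): a dict m[(y, x)] = positive real, read "m(y | x)".

def compose(g, f):
    """Sequential composition (g after f): (g.f)(z | x) = sum_y g(z | y) * f(y | x)."""
    out = {}
    for (z, y1), gv in g.items():
        for (y2, x), fv in f.items():
            if y1 == y2:
                out[(z, x)] = out.get((z, x), 0.0) + gv * fv
    return out

def tensor(f, g):
    """Parallel composition: (f (x) g)((y, d) | (x, b)) = f(y | x) * g(d | b)."""
    return {((y, d), (x, b)): fv * gv
            for (y, x), fv in f.items()
            for (d, b), gv in g.items()}

# Example: two stochastic matrices (channels) on the two-element set {0, 1}.
f = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}
g = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.3, (1, 1): 0.7}
h = compose(g, f)
# Composing channels gives a channel: each column still sums to 1.
for x in (0, 1):
    assert abs(h[(0, x)] + h[(1, x)] - 1.0) < 1e-9
```

Here matrix multiplication really is the "plug one box on top of the other" picture, and the tensor pairs up inputs and outputs, as described above.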
Copying and then throwing away one of the outputs is the same as doing nothing, and copying is symmetric and associative. In this category Mat(R+), discard is just the function sending each element to one, and copy is like a delta: A comes in and two equal copies of A come out at the top — that's the intuition. The reason we introduce this structure is that it's been shown recently that you can do a lot of probability theory just in terms of these CD categories, and in particular ones called Markov categories; a lot of what's going on in applied category theory at the moment uses this language of CD categories.

In particular, they let you pick out notions from probability theory; I'll just talk about a couple of them here. The most important one is the notion of a channel — this is what lets us pick out the actual normalized matrices, as it were, from earlier. In general, you call a morphism a channel when it preserves discarding, and in the special case of a state, when it's a channel you call it normalized. For a state this means it really is a probability distribution: it's actually normalized, so if you sum over the values of this ω you get one. For a morphism, being a channel means it sends each input to a distribution, so it actually is a probability channel in the usual sense; equivalently, the matrix for f is stochastic. These are the ones we'll use in generative models, for example.

And as I said, there's lots of probability theory you can describe with these diagrams; here are two very simple examples. There's marginalization, via this discarding operation: if you have a box ω like this, a joint distribution over X and Y, and you just discard Y, you get the marginal on X. And if you plug a distribution ω on X into an effect — which would just be any positive function on X — you get a scalar, and that scalar is the expectation value of the effect in that distribution.
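The two diagrammatic moves just mentioned — discarding an output (marginalization) and plugging a state into an effect (expectation) — can be sketched in the same toy dict representation (hypothetical example data):

```python
# A state on X is a distribution {x: p}; an effect is any function X -> R+.
# Joint state on X x Y, with made-up numbers for illustration:
omega = {("rain", "wet"): 0.3, ("rain", "dry"): 0.1,
         ("sun", "wet"): 0.05, ("sun", "dry"): 0.55}

def marginal_x(joint):
    """Discard the Y output: sum over y, leaving the marginal on X."""
    out = {}
    for (x, y), p in joint.items():
        out[x] = out.get(x, 0.0) + p
    return out

def expectation(state, effect):
    """Plug a state into an effect: the resulting scalar is the expectation value."""
    return sum(p * effect(x) for x, p in state.items())

px = marginal_x(omega)
mean = expectation(px, lambda x: 1.0 if x == "rain" else 0.0)
```

Discarding really is just summing out the thrown-away wire, and composing a state with an effect closes the diagram into a single scalar.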
We'll meet a lot more of this as we go, but now let's actually start doing things related to active inference in particular — we'll talk about the models and how you view them in the diagrams. As we've said, we'll be talking about agents having generative models that relate things like actions, observations and world states. These are normally quite compositional in active inference, and might involve many different spaces of states and observations and processes relating them, and you usually treat them as something like a Bayesian network. Arguably you can really view it as a causal Bayesian network, since it's sort of describing how states cause these observations, but formally it's the same thing as a Bayesian network. That's normally described as a DAG — a directed acyclic graph — which records the different variables being related, together with sets of values for each, and probability channels describing each variable in terms of its parents in the DAG; then you often look at the whole joint distribution over all the variables.

But as I've said, the way these Bayesian networks are drawn in active inference texts is already kind of converging on something a bit closer to string diagrams, because you don't actually just draw the DAG and the variables — it's very useful to give names to the mechanisms themselves as well, like the A's and B's you have in this picture. So my claim is that it's sort of converging on the way string diagrams look, as we'll see on the next slide, where you really do label everything, not just the variables.

To describe these sorts of Bayesian networks with string diagrams, the key observation is really that DAGs correspond to a certain class of string diagrams that we'll call network diagrams. There's a definition here, but it's better just to see an example in a second: they're the diagrams built from copying and, sometimes, discarding,
and the key thing is just that their processes may have many inputs but only one output — each such process is going to be like a mechanism producing a variable. The result is that if you have a DAG G and you choose some of the vertices to be outputs — those are like the observed variables — you can draw a network diagram expressing the equivalent structure, with those vertices as the outputs. Here's an example: we have a DAG with these four variables, X1 to X4. In the diagram on the right, you give each variable a wire, and you draw a box that produces it — it doesn't really matter what you label the box; this box C, for example, produces X2 — in terms of its parents, X1 and X4 in this case. If a variable has no parents, its box would just be a state, a box with no input. Then you take each variable, copy it, and pass it to all of its children in the DAG, and also out of the diagram if it's an output. This now expresses the whole structure of the DAG, together with which variables are leaving the system — the output ones.

So this allows us to turn a DAG into a string diagram, and if you want to make a generative model expressing a Bayesian network structured by this DAG, you just have to interpret this diagram, in a certain sense. In general, working in any one of these copy-discard categories, we can say that a generative model is given by one of these network diagrams without any inputs, together with an interpretation of the diagram — meaning you actually say what the objects are for each of the wires and what the actual channels in your category are for each of the boxes. For a generative model like this, you'd pick objects X1, X2, X3, X4 and pick channels for A, B, C, D. We think of the outputs of the diagram as the observed variables and the rest as hidden variables.
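The "interpretation" step just described — assigning a set of values to each wire and a channel to each box, then reading off the joint distribution — can be sketched for a tiny DAG (hypothetical toy code; variable names and probabilities are made up):

```python
import itertools

def joint(variables, parents, mech):
    """Joint distribution of a Bayesian network.
    variables: {name: list of values}; parents: {name: [parent names]};
    mech: {name: function(value, parent_values_tuple) -> probability}."""
    names = list(variables)
    out = {}
    for combo in itertools.product(*(variables[n] for n in names)):
        assign = dict(zip(names, combo))
        p = 1.0
        for n in names:
            pa = tuple(assign[q] for q in parents[n])
            p *= mech[n](assign[n], pa)  # each mechanism conditions on its parents
        out[combo] = p
    return out

# Tiny two-variable DAG s -> o (a state causing an observation):
variables = {"s": ["hot", "cold"], "o": ["sweat", "shiver"]}
parents = {"s": [], "o": ["s"]}
mech = {"s": lambda v, pa: {"hot": 0.7, "cold": 0.3}[v],
        "o": lambda v, pa: {("sweat", "hot"): 0.9, ("shiver", "hot"): 0.1,
                            ("sweat", "cold"): 0.2, ("shiver", "cold"): 0.8}[(v, pa[0])]}
m = joint(variables, parents, mech)
assert abs(sum(m.values()) - 1.0) < 1e-9  # an interpretation gives a normalized state
```

Each box contributes one factor conditioned on its parents, which is exactly the copy-and-feed-to-children structure of the network diagram.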
For example, if you're working in the category Mat(R+) — the only category I've actually introduced, and our running example — this is the same thing as one of these causal Bayesian networks: it just means picking the sets of values for the variables and picking probability channels for the boxes. You might ask why you'd use this representation rather than the usual one, which I think is a good question. It's equivalent to the DAG-plus-probability-channels description, but the thing that's nice is that in the conventional approach you sort of have to switch between the DAG, which is for the variables, and doing calculations with probabilities, whereas in the graphical approach you can use just one formalism, because you can do the probability theory with the string diagrams as well. So it's quite natural, in a sense: you have this one language for both intuitively drawing what's going on in your model and then reasoning about it.

It also lets you start to generalize things in a useful way, I think. I kept having to say that your diagram has no inputs, but there's nothing really fundamental about that, and it's not very clear why we need it. So what you can do instead is start to allow inputs to your model as well; call this an open generative model. An open generative model is the same thing, but dropping the requirement that the diagram has no inputs. Here's an example of a general network diagram, now with these inputs X2, X3: there's no mechanism specified for these new variables; they're just input variables to the system, and they can be outputs as well — for example, X3 is both an input and an output here. Again, an interpretation of this general network diagram just means picking the objects and the channels, and it's the same definition we used to define what we call an open causal model in the paper with Robin Lorenz that I mentioned earlier. An open generative model is formally
just the same thing, but we're thinking of it as a generative model possessed by a cognitive agent. If you run this definition in Mat(R+), it's just like a causal Bayesian network in which some of your variables have no mechanism specified — they're just inputs to the whole thing. A nice thing about these open generative models is that, because they can have inputs, you can plug them together and compose them; in fact they form their own category, but I won't go into that today.

That was the general theory of these generative models; let's now describe some examples that you'll see in active inference coming up all the time. The simplest example has just one space of hidden states and one space of observations. A generative model of that form would just look like this network diagram: just two wires, S and O. S doesn't have any parents, so it just has this prior distribution σ over it, and there's this one channel, often called the likelihood, from S to O. If you draw that network diagram and interpret it in C, or in Mat(R+), it would be one of these simple generative models. When you're looking at these, you're often interested in the distribution over both variables together — this joint distribution, which you might normally write as P(s) · P(o | s), or a bit more specifically here, introducing the names for the distribution and the channel. In string diagrams it's just the same as this: you take the prior, make it an output as well, and compose those channels together, and this gives you a distribution — a normalized state m over S and O. So this is just the resulting joint distribution you get from the generative model, and we'll come back to it later.

A more elaborate example of a generative model, which you'll see for instance in the Active Inference textbook, is the discrete-time models that are
used a lot. So I'll walk through this diagram now; it's an example of a more complex network diagram and the generative model it describes. Here we've got N time steps — remember we read from bottom to top, but I'll describe things from the top down. We have the observations O1, O2, up to ON — observations at each time — being caused by the hidden states S1, S2, ..., SN at each time, via these channels A. The hidden state evolves over time by this transition channel B, which takes the previous state as an input and chooses the next one. It also takes one extra wire coming from the bottom: this space P of policies, which is how the agent's actions enter the picture. These policies describe the behavioural policies it can carry out, so based on the previous hidden state and the way it's acting, this channel B determines the probabilities for the next state. And then there's a prior over the policies, which you can think of as the habits, or typical behaviours, of the system. You can just draw this diagram and interpret it in Mat(R+), and any given interpretation — meaning any choice of the sets of values these variables can take, and any choice of what these channels are — would then give you this type of generative model.

Often you'll see models of this sort of form, or several of them being plugged together to form hierarchical models, and I think the compositional language is very nice for these, because you really want to talk about open models to define them: the hierarchical model can be viewed as given by taking these open generative models I mentioned and plugging them together in a certain sense. Here you can see a picture of a hierarchical model where we just have these layers — a different copy of the same kind of model at each layer — and the inputs of one layer match the outputs of the layer below, so we can compose them together.
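The discrete-time generative model just walked through — a prior E over policies, a transition channel B(s′ | s, π) and a likelihood A(o | s) — can be sketched as a forward simulation (hypothetical toy numbers; the names E, B, A follow the talk's conventions):

```python
import random

# Made-up two-state, two-observation, two-policy model for illustration.
E = {"stay": 0.5, "move": 0.5}                       # prior over policies (habits)
D = {"left": 1.0}                                    # initial-state prior
B = {("left", "stay"): {"left": 0.9, "right": 0.1},  # transition B(s' | s, pi)
     ("left", "move"): {"left": 0.1, "right": 0.9},
     ("right", "stay"): {"right": 0.9, "left": 0.1},
     ("right", "move"): {"right": 0.1, "left": 0.9}}
A = {"left": {"dark": 0.8, "bright": 0.2},           # likelihood A(o | s)
     "right": {"dark": 0.2, "bright": 0.8}}

def sample(dist, rng):
    """Draw one value from a finite distribution {value: probability}."""
    r, acc = rng.random(), 0.0
    for v, p in dist.items():
        acc += p
        if r < acc:
            return v
    return v  # guard against floating-point rounding

def generate(n, rng):
    """Run the generative model forward: pick a policy, then unroll n steps."""
    pi = sample(E, rng)
    s = sample(D, rng)
    obs = []
    for _ in range(n):
        obs.append(sample(A[s], rng))   # observation caused by current hidden state
        s = sample(B[(s, pi)], rng)     # hidden state evolves, conditioned on policy
    return pi, obs

pi, obs = generate(4, random.Random(0))
```

Note how the policy wire is sampled once and then threaded into every application of B, exactly as the single policy wire feeds every transition box in the diagram.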
That's generative models; so far we've just been using the diagrams to represent them. We'd like to actually reason about these models, so let's now talk about how you update a model — or update the beliefs within a model — which is very important in active inference, and how this looks in the string diagrams. Say we've got an agent with the model M, the simple one where there's just one space of hidden states S and one space of observations O. So they have the joint distribution I mentioned earlier, and they have prior beliefs about S, which you can get back from the joint distribution by just taking the marginal — that's σ. They've got their model, they've got their beliefs about which states are likely, and then they receive a new observation. The kind of observations we'll consider can in general be soft, meaning they're described by a distribution over O, not necessarily just one element — one of these normalized states. I'll try not to use the word "state" here, because it's a bit confusing with this S around; so this distribution over O is then your observation, this bold-font o. Now they want to update M in some way so that the marginal is different and describes the updated, posterior beliefs. This comes up, of course, in perception, where you're updating your beliefs about the state of the world given some observation you receive, but it could also be used to model planning behaviour: updating your plan of action, your policy, given something like which outcomes you'd like to see in the future — we'll come back to that later in the talk.

So we want to talk about how you can do this updating. You might think there's just a standard answer, at least in the ideal case, which is Bayesian updating — and that's true when your observations are sharp, so we'll start by talking about that case.
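For the sharp case just mentioned, exact Bayesian updating of the simple (prior, likelihood) model can be sketched concretely (hypothetical toy numbers; posterior(s) ∝ σ(s) · lik(o | s)):

```python
def bayes_update(sigma, lik, o):
    """Exact Bayesian update on a sharp observation o: posterior over S."""
    unnorm = {s: p * lik[s][o] for s, p in sigma.items()}  # prior times likelihood
    z = sum(unnorm.values())                               # the evidence P(o)
    return {s: v / z for s, v in unnorm.items()}

# Made-up example: prior over a hidden state and a likelihood channel.
sigma = {"hot": 0.7, "cold": 0.3}
lik = {"hot": {"sweat": 0.9, "shiver": 0.1},
       "cold": {"sweat": 0.2, "shiver": 0.8}}
post = bayes_update(sigma, lik, "shiver")
# An unlikely-under-the-prior state becomes likely after a telling observation:
# P(cold | shiver) = 0.24 / 0.31, versus prior P(cold) = 0.3.
```

This division by the evidence P(o) is exactly the normalization box that shows up in the string-diagram treatment of conditioning described next.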
I'll say what "sharp" means in a second, but first let me talk about how Bayesian conditioning is treated in these string diagrams. On the right here, viewed as a process from O to S, is the Bayesian conditional channel — in general a partial channel, in fact — that the agent's model induces. You can describe this in string diagrams using a couple of extra gadgets I hadn't mentioned yet. You have your distribution m over S and O; in Mat(R+) there's this effect we call the "cap", which takes two inputs and compares whether they're equal, and it allows you to turn an output into an input — that's what this part here is. Then you can introduce one extra thing: normalization. What you'd like to do is take a general morphism and, for each possible input, normalize the output so that it's a distribution (or set it to zero if it's just zero and there's nothing you can do). That's what this blue dashed box is, and in the paper, and in the related causal models paper, we talk about the axioms this normalization feature satisfies. The point is that if you compute this thing in Mat(R+), it gives you what you'd expect: for each point o of the space O, plugging it in gives you the conditional m(s | o) you'd expect, whenever that's defined. So there's a string-diagram way to describe this Bayesian conditional channel, or partial channel. I said this is what you would use when your observation is sharp, and I also deliberately drew the observation with this triangle to distinguish that case. So what does that mean?
So in general, you say a state in one of these CD categories — a distribution, basically — is sharp when it's copied by the copy map, which isn't true for general distributions. If you run this definition in Mat(R+), it really means that this thing o is just a point distribution at some specific element of O, so it really is sharp in that sense: just a point, with no probabilistic aspect to it. But in any CD category you can talk about the sharp states like this; they're often also called deterministic.

For these sharp observations you'd ideally like to do this Bayesian updating. But when you've got soft ones — without this property — there are in fact at least two good ways to do this kind of updating. I don't know how well known this is, so I'll mention it now anyway; it's been studied in some detail by Bart Jacobs in the paper at the bottom. Say you don't get one of these sharp observations, just a distribution over O. There are at least two reasonable ways to generalize the picture from the last slide to give a notion of updating, and Jacobs calls them the Jeffrey and Pearl update rules. In the Jeffrey update you basically do it like we did before: you have this Bayesian conditional channel, or partial channel, with the normalization box, and you just plug the distribution into it. In the Pearl update you instead turn the distribution into an effect — you compose it with this cap to bend it around in the picture — then plug that into m and normalize everything. So the difference is where the normalization happens. And it's basically just interesting that, because these are both reasonable generalizations, they have different properties; it's not obvious that one of them is more rational than the other — they just behave a bit differently. In the formulas you can see that the normalization is applied differently.
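The difference between the two rules just described can be made concrete in code (a hypothetical toy example: `m` is a joint distribution over S × O and `obs` a soft observation over O):

```python
def jeffrey(m, obs):
    """Jeffrey update: condition on each sharp o (normalizing per o),
    then average the posteriors under obs."""
    out = {}
    for o, w in obs.items():
        z = sum(p for (s, o2), p in m.items() if o2 == o)  # P(o)
        for (s, o2), p in m.items():
            if o2 == o:
                out[s] = out.get(s, 0.0) + w * p / z
    return out

def pearl(m, obs):
    """Pearl update: weight the joint by obs first, normalize once at the end."""
    unnorm = {}
    for (s, o), p in m.items():
        unnorm[s] = unnorm.get(s, 0.0) + obs[o] * p
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

# Made-up joint distribution m(s, o) and a genuinely soft observation:
m = {("hot", "sweat"): 0.63, ("hot", "shiver"): 0.07,
     ("cold", "sweat"): 0.06, ("cold", "shiver"): 0.24}
obs = {"sweat": 0.5, "shiver": 0.5}
qj, qp = jeffrey(m, obs), pearl(m, obs)
# On soft evidence the two rules generally disagree:
assert abs(qj["hot"] - qp["hot"]) > 0.1
```

On a sharp observation (all mass on one o) the two functions agree, which matches the point in the talk that the rules only come apart for soft observations.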
If you turn these pictures into the usual notation, they look like this: in the top case you normalize for each possible sharp o and then take an expectation over the observed distribution; in the other you do the whole thing together and then normalize once at the end. Either way, the point is that these things are actually hard to compute, so we don't expect a cognitive agent to be doing either of them exactly — even in the sharp case. And so, as we know, we instead want to approximate these kinds of updates using free energy, which I'll talk about next.

So we're going to try and accommodate free energy in the diagrammatic approach. The free energies that come up are often given in terms of what you'd call surprise — these negative-logarithm quantities — so we'll start by introducing a new graphical component for treating those, which we call log boxes. If you have any function E on a set X valued in the positive reals — remember, in Mat(R+) that would look like an effect on X — then what we want to do is talk about the function sending x to −log E(x), which we call the surprise. So we just introduce this graphical feature: we draw a green box around the effect and say that denotes this new function. Using the nice properties of the logarithm, this log box satisfies nice graphical rules — for example this one, which is the way logarithms turn multiplication into addition.

With this around, you can start talking about surprise. If you have two distributions σ and ω, the surprise of one distribution relative to the other is defined by this expectation value: it's just the expectation of the surprise of σ according to ω. If you remember, I said expectation values are given by plugging a state — the distribution — into the thing you're taking the expectation of, which here is the log box. So we can define the surprise
of one distribution relative to another in this way, in the pictures. Important special cases where this comes up are the entropy, which is the self-surprise, and the KL divergence, which can be calculated from the surprise and the entropy. So whenever we have formulae given in terms of these, we can instead denote them with this graphical notation, at least with the log box. So now let's talk about how we use this to describe free energy. What we want to do in the paper is help clarify the different notions of free energy found in active inference, in particular the variational and expected free energy, so we want one more general quantity that we can understand both of those in terms of. I'm just calling it the general free energy here; I'd be interested to know what other people think of this naming for what we're doing. The situation is that we've got some generative model that's fixed over the two variables S and O, like before, so remember we have this box M denoting a distribution over S and O. Let's say we also have another distribution Q, and we'll see examples in a second of what sort of Q this would be. Then we just define the free energy of Q relative to M in this way: it's the expected surprise of M under Q, minus the entropy of Q's marginal on S. In the string diagrams it's just this picture with the log box plugged in, and this is the formula if you want to use the conventional notation, which is useful for relating it to existing approaches. So we can define this general free energy quantity, and then we'll meet the two special cases of it that we're interested in, the variational and the expected free energy. It all comes down, basically, to having a definition of surprise; you just need that notion to define everything else. So, the variational free energy. We have the fixed model M and a soft observation ω like we did before, that's this box here, and then we consider different possible distributions over S, which we think of as different candidate updates for our beliefs. We define the variational free energy of any such distribution q as the special case of the definition from the previous slide where that capital Q takes the product form consisting of our new beliefs, lowercase q, and our observation ω. In formulae, you take the surprise from your model M, see what its expected value is for those beliefs and that observation, and subtract the entropy of q. What you can show is that this VFE satisfies a bound involving the KL divergence to the Jeffrey update of the model with respect to this observation; in particular, when the observation is sharp, the minimum of the VFE is given by the Bayesian update. In general, then, we can think of minimising this VFE quantity as finding a q that approximates the kinds of update we were looking at earlier, and in the sharp case it coincides with them, as all of the notions of updating do. But for soft observations it's something else: it's not equal to either of the two notions of updating from earlier. So this is actually a third notion of updating on soft observations, which I think is an interesting way to think about what VFE minimisation is doing; we just call it the VFE update. You've got many different q's, you calculate this VFE quantity for each, given some soft observation, some distribution over here, and the one with the minimal value of the VFE you call the VFE update. In general this won't be equal to either the Pearl- or Jeffrey-style updates we met earlier. So that's the VFE, which we'll come back to.
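To make the VFE update concrete, here is a small numerical sketch (an editor's illustration, not from the talk): for a sharp observation, brute-force minimisation of the VFE over candidate beliefs recovers the Bayesian posterior, as stated above.

```python
import numpy as np

def vfe(q, joint, omega):
    """Variational free energy of beliefs q(s) for a generative model
    m(s,o) (joint, shape (S, O)) and a soft observation omega(o):
        VFE(q) = E_{q(s) omega(o)}[-log m(s,o)] - H(q).
    For a sharp omega this is minimised by the Bayesian posterior."""
    expected_surprise = -(np.outer(q, omega) * np.log(joint)).sum()
    entropy = -(q * np.log(q)).sum()
    return expected_surprise - entropy

prior = np.array([0.5, 0.5])
channel = np.array([[0.9, 0.1],    # p(o | s = 0)
                    [0.2, 0.8]])   # p(o | s = 1)
joint = prior[:, None] * channel   # m(s, o)

omega = np.array([1.0, 0.0])       # sharp observation o = 0
# brute-force the "VFE update": scan candidate beliefs q = (t, 1 - t)
ts = np.linspace(1e-6, 1 - 1e-6, 2001)
vals = [vfe(np.array([t, 1 - t]), joint, omega) for t in ts]
q_star = ts[int(np.argmin(vals))]

posterior = joint[:, 0] / joint[:, 0].sum()   # exact Bayes update for o = 0
# q_star approximates posterior[0]; swap in a soft omega and the minimiser
# is the "VFE update", differing from both Jeffrey's and Pearl's rules.
```

The grid search stands in for whatever optimisation scheme an agent actually runs; only the objective is the point here.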
The other notion of free energy we want to talk about is the expected free energy, the EFE. There we still have our model M, but rather than an observation we think of ourselves as having preferences over the observations we'd like to see, again encoded in a distribution over O; that's this C. With that fixed, we can define this one quantity called the expected free energy. It's given by the general free energy of M compared with this other generative model, where this box here is really the inverse channel from O to S of M, the Bayesian inverse, but where you just assert that the preferences are the prior on the observations. So you're comparing these two in terms of the generic free energy quantity we defined earlier. Again you can turn this into formulae, and there's loads of work, see the references, on the different rewritings of the EFE and the ways to interpret them in terms of uncertainty and risk and so on. It has the property that it's bounded in terms of the surprise of those preferences for your model, and it kind of gives you a way to approximate them, as we'll see. I don't think I'll have time to go much more into the EFE, but the point, in terms of what this work does, is just to have one generic free energy quantity, the one we met earlier, with the VFE and the EFE both coming up as special cases depending on what we plug in for the two distributions. So what I'd like to do now is put some of these pieces together to show what active inference itself looks like in terms of string diagrams; in particular, we'll derive the formula that you'll find in active inference textbooks in a graphical, and I think quite transparent, way. That's the claim, at least. To do so we basically need a nice high-level conceptual view of what active inference is, and this is the way we do it in the paper.
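Before the derivation, a numerical sketch of the general free energy and the EFE as just defined (again an editor's illustration following the talk's definition, where the comparison model is the Bayesian inverse of M with the preferences asserted as prior; note that EFE conventions vary across the literature):

```python
import numpy as np

def general_free_energy(Q, joint):
    """Talk's generic free energy F(Q || M) = E_Q[-log m(s,o)] - H(Q_S),
    where Q and joint are both joint distributions of shape (S, O)."""
    q_s = Q.sum(axis=1)                      # marginal Q_S
    return -(Q * np.log(joint)).sum() + (q_s * np.log(q_s)).sum()

prior = np.array([0.5, 0.5])
channel = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
joint = prior[:, None] * channel             # generative model m(s, o)

prefs = np.array([0.1, 0.9])                 # preferred observations c(o)
marg_o = joint.sum(axis=0)                   # m(o)
inverse = joint / marg_o                     # Bayesian inverse m(s|o), columns sum to 1
Q = inverse * prefs                          # comparison model Q(s,o) = m(s|o) c(o)

efe = general_free_energy(Q, joint)
```

Plugging in Q = q ⊗ ω instead recovers the VFE from the previous slide, which is the sense in which both are special cases of one quantity.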
So in active inference, the key thing for stating the definitions is that the model takes the following form at a high level. It's like the discrete-time model we had earlier, just with two time steps, if you like, and each of those time steps could break down into further sub-steps, but that won't matter; we're abstracting at this higher level. At this level we just have a notion of the current time, or maybe all the time steps up to the current time, and that's the S and O here, the current states and current observations. Then there's a notion of future times, these future states and future observations, which could be all the time steps up to some big number, all grouped together as in one of those discrete-time models. Again we have the policies, and the same sort of shape of model: there are channels here, which I haven't bothered giving letters to, showing the way the policy influences the transition from the current state to the future state, with observations generated from each. So we just have a generative model with policies, states and observations, and future states and future observations. In active inference, what we're doing is receiving two things: an observation in the current time, and some preferences about what we'd like to see in the future. These are each given by distributions, a soft observation ω over O and preferences C over the future observations, and we then do updating with those. The thing we update is what's called the habits, the prior over the policies, to give a new distribution over policies, which we can think of as the agent's plan for how it wants to act. So we're going to try to do updating like before to obtain a new distribution over P, which now tells us how we want to behave in the future, in a way that, basically, you can think of as saying: we want to explain why we're seeing what we're currently seeing, and how we're going to obtain what we'd like in the future. Now, in the textbooks and various places you can find a formula like this, justified as coming from the free energy principle in some way, basically saying that you can do this approximately by making your plan distribution take the following form: there's a softmax; there's a part relating to the habits of your model, that's the prior over policies, with these π being the individual policies in P; and then there are two parts of the formula relating to the VFE and the EFE. What we wanted to do is see where this formula comes from, in a nice high-level way, from the structure of the diagram. There are explanations for this formula out there, but I found them quite hard to follow, to be honest, because they talk about the EFE as being a prior that you then do VFE minimisation on top of, whereas you kind of need to do the present-time part first before you can do the EFE. So what we wanted is a really clear way to see how this just drops out from the structure of the model, and that's what I'll try to show now. What we'd like to do, then, is this approximate updating; we're going to do the Pearl-style updating, which looks like this in the pictures. We want to get our new plan, our distribution over policies, by plugging in our observation and our preferences. That's what we'd like ideally, but we're going to have to approximate it in some way. So let's just take the distribution that's inside the dashed normalisation box; this is the thing we'd like to approximate. From the structure of our model we can write it like this, and I'll show some graphical steps for how we can apply approximations to obtain the formula that we saw. Obviously we won't be able to go through every detail of the proof, but it should give a sense of what it's like to actually work with the string diagrams, which is really why I'm showing it. So, our model takes roughly this form: there's a part relating to current states and current observations, and a part relating to future states and future observations; I've just called them both M here. What we do first is focus on the part of the model relating to the current state and the current observation, and we want to approximate what's in the blue dashed box. What you can show is that if you do this VFE updating, it will be approximately equal to this part of the diagram, where this Q is given, for each policy, by doing VFE updating, that is, minimising a variational free energy. You do that for each policy, and then you can view the collection of all of those belief updates as just one channel from P to S. If you think about it: for each policy you could plug in, you would obtain a distribution over S and O, and you could do updating with respect to that; that's what Q of that particular policy π is, and you put them all together into this one channel Q. You can then show that the overall process, where you multiply by this e-to-the-minus-VFE quantity here, is approximately equal to this part of the diagram. Okay, so that's the first step, and that's how the VFE enters the picture. Next we take the top part of the diagram, collapse it together, and view it as one process going into the future observations, together with our preferences, and we'd like to approximate what's in this box now. This is where the EFE comes in: you can show that the expected free energy gives you an approximation to this part here, which is basically an expectation value of your preferences for each policy, the density of the preferences plugged into your model for each policy. I don't have time to go into the full details of the approximation steps, but they're essentially the same approximations you'll find in active inference texts, just translated into the string-diagrammatic setting, and we discuss how they come about from Jensen's inequality and things like this. The step where you think about the future times is sometimes called the prediction step, and the previous one was the perception step. So now we've rewritten the diagram in terms of an e-to-the-minus-VFE and an e-to-the-minus-EFE, as well as our habits, and remember that what we wanted to do was approximate the normalisation of this whole thing; that's when you apply the dashed normalisation box around the whole thing. If we do that, this is exactly the same as the formula we were after. So we've obtained the formula, and that's because you're normalising something with these e-to-the-minuses in it, so you can also rewrite it in terms of a softmax, where the habits term now appears inside a log and the exponentials disappear from the other terms. If you wrote out what this is for each policy, it would be equal to the formula down here. So the claim is that this is a nice way to derive this formula, a bit more transparent than the ones that exist. The idea was really just to draw what's going on: we're updating with a model of this form, we're trying to do this approximate form of updating, and we can see, from the structure of the model itself, exactly where we're applying the approximations and how this formula comes about. Okay, so that, so far, has basically just covered things that are already there in active inference; the derivation is new, but it's existing material. Before wrapping up, I'd like to talk about something a bit newer that we do with the string-diagrammatic approach, which is the way in which free energy itself is compositional. The motivation is the idea that we want to think of this one free energy principle as applying at all levels of a system.
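Looping back for a moment, the plan formula just derived, q(π) = softmax(log E(π) − VFE(π) − EFE(π)), can be sketched numerically (a toy illustration with made-up per-policy values, not from the talk):

```python
import numpy as np

def plan(habits, vfe_per_policy, efe_per_policy):
    """q(pi) = softmax(log E(pi) - VFE(pi) - EFE(pi)),
    i.e. q(pi) proportional to E(pi) * exp(-VFE(pi) - EFE(pi))."""
    logits = np.log(habits) - vfe_per_policy - efe_per_policy
    w = np.exp(logits - logits.max())   # numerically stable softmax
    return w / w.sum()

habits = np.array([0.25, 0.25, 0.5])    # prior E over three policies
F = np.array([1.2, 0.4, 0.9])           # per-policy VFE (hypothetical values)
G = np.array([0.3, 1.5, 0.2])           # per-policy EFE (hypothetical values)
q = plan(habits, F, G)
# policies with low VFE + EFE and high prior weight receive more mass
```

The `logits.max()` shift is the standard trick for evaluating a softmax without overflow; it leaves the result unchanged.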
To make that precise, you'd want to know that an agent with one of these big composite generative models can do its free-energy minimisation on the whole thing by doing it on the parts, because ultimately we want to think of it as just coming down to each part doing its own bit of free-energy minimisation. That's what we want to make precise, and in particular we'll be talking about the VFE here the whole time. If you recall, in the diagrams the VFE looks like this: we used the log box and it took this particular shape. What we do in the paper, in order to address this compositionality question, is introduce a notion of VFE that applies not just to generative models but to ones that actually have inputs as well, what I called open generative models earlier, because we need to talk about pieces of generative models plugging together and give them a notion of free energy to even make sense of free energy being compositional. So we propose this definition of what we call the open VFE: instead of just a distribution M over S and O, we have a channel from some inputs to S and O, given by one of these open models, and our Q, the thing we're doing the VFE minimisation with respect to, now has an input as well, so it's a joint distribution over the states, inputs and observations, taking the same shape as before. You get this other formula, which is basically just the natural way to generalise the previous VFE formula to accommodate this extra input wire I. What we show is that this quantity is compositional, in the sense I alluded to, and I'll walk through that. The way you prove it is just by using the graphical properties of these log boxes that I mentioned earlier; you could turn all of it into a proof in standard probability notation if you like, but it's quite instructive to be able to work entirely in the diagrams, keeping track of the compositional structure of the models. So the result says that this open VFE quantity is compositional in two ways. The first is a fairly trivial one: if we have two models running in parallel, taking a tensor of them, each doing its own thing, then calculating the VFE for the whole thing is the same as calculating the VFE for each individually and adding them together. That's certainly what we'd like to happen, and it just follows from the properties of the log boxes. More interestingly, there's a second way in which it's compositional, which is the sequential mode of plugging models together. Suppose we have an open model M1 from some inputs into some outputs O1, and those are actually the inputs for a second model, so the first generative model is passing stuff up to the second one, and we want to calculate the resulting VFE with respect to an observation. We can again write it as a sum of two VFEs, but in a slightly different way. The observation lives on the top wire, the output O2 of the whole thing. First we calculate the VFE for the model at the top, M2, in the usual way; then we add on a VFE calculated for the first model, M1, which doesn't really have an observation on O1, but instead the observation it uses is one passed down from M2: the Q that M2 is using is passed down to M1 as if it were an observation. So it's as if M2 receives the observation, does its updating to get its Q, and passes that down to M1. In this way we can say that the VFE composes: if both parts are minimising VFE locally, where for the M1 model we mean it's minimising with respect to these Q's on O1 coming down from above, then the whole system is also minimising its VFE, because that's just given by summing the two together.
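The parallel (tensor) case can be checked numerically; here is a small sketch (an editor's illustration) verifying that the VFE of two independent models, with product beliefs and product observations, is the sum of the individual VFEs:

```python
import numpy as np

def vfe(q, joint, omega):
    """VFE(q) = E_{q(s) omega(o)}[-log m(s,o)] - H(q); joint has shape (S, O)."""
    return -(np.outer(q, omega) * np.log(joint)).sum() + (q * np.log(q)).sum()

# two independent generative models m1(s, o), m2(s, o), each summing to 1
m1 = np.array([[0.30, 0.20],
               [0.10, 0.40]])
m2 = np.array([[0.05, 0.45],
               [0.35, 0.15]])
q1, q2 = np.array([0.7, 0.3]), np.array([0.4, 0.6])   # beliefs for each part
w1, w2 = np.array([0.8, 0.2]), np.array([0.5, 0.5])   # soft observations

# tensor (parallel) composite: m(s1 s2, o1 o2) = m1(s1, o1) * m2(s2, o2)
m12 = np.kron(m1, m2)
q12, w12 = np.kron(q1, q2), np.kron(w1, w2)

lhs = vfe(q12, m12, w12)                # VFE of the composite
rhs = vfe(q1, m1, w1) + vfe(q2, m2, w2) # sum of the parts
# lhs == rhs: the VFE is additive across independent parallel parts
```

The additivity is exact here because both the expected surprise and the entropy split across the tensor, which is the diagrammatic log-box argument written out in numbers.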
So I've talked about a lot of material now, and then we can go to a discussion, I hope. The main takeaway was just meant to be that these string diagrams provide a natural language for talking about active inference, and I'd encourage anyone working on it formally to take a look and see if they'd be useful in some way. In particular, I focused on what I was calling the main ingredients of active inference: generative models, the ways you update them, and free energy, and we saw ways to describe all of those notions in the string diagrams. The thing I think is useful about them is that they give you a nice representational language for drawing pictures of your generative models and composing them, hierarchical models and so on, and they also let you do the reasoning, because you can do probability theory with them, so you can actually reason about what's going on in active inference with the diagrams themselves. There are loads of directions to take this in future. Obviously, you can keep absorbing more of what's out there in active inference into the diagrams. A bit more interestingly, I introduced this new notion at the end of how to make free energy compositional; in particular, we gave this definition of VFE for an open system, one whose generative model can have inputs, which we call the open VFE, and I'd be very interested in what people think of this definition and whether it seems meaningful. Secondly, throughout the talk I kept talking about just minimising free energy, really the VFE, but I didn't say how you do it; in practice this is normally done with various message-passing algorithms, so they're an important part of active inference as well, and I think it would be great to include these in the setup by having some diagrammatic story for them. There are lots of other questions around. One is that I talked about these two notions of updating with respect to soft observations, and I think normally people tend to focus on sharp observations, so perhaps not everyone has heard of these before; but it's very natural to treat the soft ones when you're working in this compositional setup, and there you start to wonder which of the Pearl-style or the Jeffrey-style updating is more natural to think about in the context of cognition. Maybe we'll say that really the VFE updating is the one you should be thinking about, that's probably the claim active inference would make, but then it would be nice to think about how this relates to the other two: is it just approximating the former or the latter, the sort of precise updating? And finally, we could try to connect this up to lots of further topics. As I mentioned, at Quantinuum I'm interested in this notion of compositional intelligence, so it would be nice to connect this to topics in AI and think about how it relates to other applications of category theory in AI. In particular there's this whole world of categorical cybernetics I mentioned at the beginning, and I'd like to connect more precisely with what people are doing there in terms of lenses and so on. Something else we're interested in, which I mentioned, is that we got into this topic by thinking about consciousness, and, active inference being a major theory of cognition, there have been loads of proposals for how it relates to consciousness; it would be nice to see how those can be described formally in this setup, and whether the string-diagrammatic approach helps you make any more sense of them. That's something we'd love to do in future. But for now I'd just like to say thanks again to all of you for listening, and I'd love to go to a discussion. Thanks. Thank you, Sean, for the wonderful presentation. Thank you.
I will first pass to Ali for an opening remark, please. Thank you, Daniel. Thanks so much, Sean, for your really fascinating presentation, I truly enjoyed it. I have a number of questions, so let me begin with the first one. When Bob Coecke and others took the standard Hilbert-space formulation of quantum mechanics and turned it into a string-diagram formulation, namely the ZX calculus, the claim, setting aside whether it's right, was that one of the advantages of looking at quantum mechanics in terms of string diagrams is that it's more than just a convenient way of presenting the quantum formalism: it actually unveils properties of quantum mechanics that would be extremely difficult to see in the standard formulation. In some of their papers they even claim that one of the reasons for the somewhat stagnant development in quantum technologies and quantum theory is exactly the difficulty of working with that formulation. Does the string-diagram formulation of active inference take a similar approach, providing more than just a handy tool for representing active inference models? Does it open up new possibilities for further developments of active inference theory, possibilities which would be impossible, or at least extremely difficult, to see in the current traditional formulation of active inference? Thanks, that's an amazing question. I would agree; my own background is actually in this categorical quantum mechanics area you talked about, string diagrams for quantum theory and everything, and I agree that that language helps you talk about a lot of things that you would maybe never get round to in other mathematical formulations of quantum theory, basically things that make heavy use of the tensor and of composition. If this led to stagnation in quantum theory, it was probably because people weren't focusing as much on the tensor, on entanglement and such, which became very central; in the end that's what people needed for quantum computing. Now what people do with quantum theory includes quantum computing, where they draw these circuits, sometimes literally string diagrams, sometimes a slightly different convention for quantum circuits, but it's similarly a compositional language where you've got tensor products, things running in parallel, states of these products, so you can talk about entanglement and so on. It's a language that makes it very immediate to represent that, because you just draw a box with two wires and it encodes an entangled state, and it makes you want to plug these things together and compose them, which is what you want to do in quantum computing. So, similarly, if you never use that kind of language, you might think of a system as a fixed thing and not think so much about the way it interacts with other ones, and that could lead to overlooking all sorts of things. That's true in any area, and I think in active inference it's certainly very true: it's natural to think compositionally in this way, because you want to talk about generative models being composed from pieces, and maybe about how the whole brain works in relation to interactions between its parts. So if you never used this kind of compositional view, there is stuff you would miss, I think. In some sense, people in active inference weren't as far behind, because they were already working kind of compositionally: they use these Bayesian network diagrams, the DAGs, and the way they're normally drawn is very close to, basically is, string diagrams; they just don't do the equations and the rewriting of the diagrams. So it's not as far back as quantum theory was, in the sense that people are already thinking compositionally, but it feels like you just want to go one step further, to a fully compositional language to work in, where you have the advantage that you can just talk about taking a whole model and plugging it into another one and it has a completely clear formal meaning. I think that's what you want to do in areas like active inference, so going from the diagrams currently used to string diagrams is the logical next step. In terms of new things it lets you do, I think an example is something like this open VFE. If you're always thinking about just a generative model, meaning one without inputs, you might not think of such a notion; I'm not saying this is necessarily the right notion, but you might not think about the problem of how to give a definition for something that's allowed to have inputs as well. Once you have that, you have a definition you can apply to parts of a composite system more easily, so it becomes more natural to use compositionally. That's the kind of thing which, without something like string diagrams, people can end up overlooking. It wouldn't be impossible to do without them; you just need this notion of an open generative model, which means throwing away some mechanisms so that those variables become inputs, but you could miss it, and you really won't once you start thinking categorically. Thank you. Ali, please continue if you'd like. Thanks so much. My other, perhaps related, question is about comparing this kind of formulation to the recent formulation of constructor theory in terms of string diagrams, the categorical formulation of constructor theory. Before going into this question: you mentioned that this project is part of a larger project on developing compositional intelligence, right? A similar kind of situation happens for constructor theory, which is a kind of meta-theory that tries to discriminate between the possibilities of physical
laws, as opposed to counterfactual laws, and asks how there can be a theory that accounts for the emergence of possible physical laws. In this sense, would you say this kind of categorical formulation, or possibly this specific string-diagram formulation of active inference, or of other theories of consciousness, can be seen as providing a path toward developing a kind of meta-theory of consciousness, possibly unifying many different strands of theories of consciousness into a holistic picture in which they can be compared and reconciled with one another, ultimately reaching the ultimate theory of consciousness? Do you see this line of work providing evidence for that line of development, or, maybe not specifically consciousness, but unifying the different aspects of cognition, intelligence and consciousness altogether? What would you say? Thanks, I think you're going to keep giving me ideal selling points. Yes, that's also something I would like to say, and I do tend to think of it that way: my background is in applying category theory to lots of topics, so naturally I do think of it as quite a unifying language. The grant on consciousness that I mentioned built on earlier work we did on the integrated information theory of consciousness, which in the end was basically done in terms of categorical probability, so it's the same setup of diagrams, and we wanted to do the same thing for active inference. So we've now taken both of these theories and put them in this common language. You could have put them in the common language of probability theory before, but I do have the intuition that there is something clearer about this: it makes it easier to get a conceptual grasp of both theories once you've done it this way, and the diagrammatic view does make it much easier to compare them. The hope was, and still is, to keep going and keep understanding various notions in this language. The things I've looked at in cognitive science have also been done this way; there's a series of them, for example I've worked on treating the conceptual spaces of Gärdenfors in terms of diagrams, and so on. So I would love to see many theories put into this language to make it easier to compare them. You could try to compare them directly already, but I think you want one clear formalisation to put them all in, and I would say that categories and diagrams are the right one to pick, because they tend to give a very clear conceptual view of things. The question is whether there's some important theory where the things categories are good at just don't quite capture the essence of what you want to talk about. For things like active inference and IIT, so far it seems very natural: in the case of IIT it's about talking about how integrated something is, so you basically want to talk about the opposite of that, something being decomposed, and the diagrams talk precisely about parts and how they're related, which is what you need to make sense of that notion of integration. So it's very natural there. But yes, I would love to see various aspects of cognitive science understood categorically; that's something I'd love to do myself as well. The hope would then be to try to gain insights from all of them and build a theory. It's not that category theory itself is a theory of cognition or consciousness, it's just a very useful language for relating them, and it would be very exciting to see something natively defined in category theory at the end. There's a feeling that some of what's going on in applied category theory, like in categorical cybernetics and so on, is kind of taking that approach for perhaps the first time; previously I always thought of category theory as taking existing things and getting a really nice abstract view of them, but now people are comfortable enough with it that they're defining things categorically from the outset in areas like that. Awesome, well, thank you so much. I have many questions; you've pointed to ideal ones, and we've explored a little of the utility and the simplicity, and how that could help with accessibility and rigour and applicability, all these awesome things, leading to re-accounting and reframing and consolidating, as well as discovering some new trails between, for example, expected free energy and variational free energy: looking at the equations you might be able to say that they rhyme, but you would be many, many lines deep into understanding what, if any, generalisation could encompass the both of them. So that was just a very salient example. A few different kinds of questions, then. How is time treated in category theory, or how does active inference treat time today, and how do you see the way that time is treated? We talk about discrete-time and continuous-time generative models; then there's the past, present, future, multi-agent systems, federated or asynchronous communication. So how is time treated, and how does that give us a different grasp on dynamical modelling? Thank you, I think that's a really good question and I'd love to have a better answer for it; I think it's a tough one. At the moment, in the talk here, I've covered discrete time, and that's very easy to treat with the Bayesian network setup and with these kinds of string diagrams, because you can just lay out the discrete time steps as processes in the picture, like we see here with the n time steps. I don't have anything satisfying worked out yet to say about how you would treat the continuous-time case, which I think is important in active
inference you'd like to basically take I guess basically what you want to do is take the way that you describe this thing with the end time steps and kind of have a form of affording it together but you're unpacking this thing end times and then you can take that thing and imagine you know this abstract and you're unpacking it just not discreetly any more in this continuous way so that you can capture something like the differential equation kind of definition of continuous time thing in active inference yeah so you can certainly work with continuous time thing in the sense of the stuff going on in categorical cybernetics or sort of categorical systems here I guess it would be called an ACT world is kind of it has continuous time systems and talks about plugging them together but that that diagram is sort of just relating their variables is my understanding it's not like the diagram isn't exactly showing the time and in some sense they kind of have to synchronize I think you know it's not an error I'm totally familiar with so it would be yeah it would have been really really cool to basically have this work and then have another part of it talking about you know like we've done for discrete time having a nice description of the continuous time case I think we'll end up being some work to take that into account there would be really nice to see so it just needs the right abstraction I think for yeah taking a picture like this not drawing the time steps as like bits in your diagram but just saying you know that it's like this B thing with like a feedback loop basically is what this is describing and then giving a semantics to that in terms of time evolution to give a continuous version of this for example with like state unfurling continuously in observations for each time step in general I wouldn't say there's like an answer to the question of how is coin treated in category theory there wouldn't really be one answer because category is going to be so generally they 
tend to be very effective for discrete things in general, like algebra and so on, because they are discrete in some sense: the composition is discrete. So continuous aspects, and things like continuous time, tend to be more difficult in a sense, or they're just inside the morphisms, as it were; they're not in the composition, so it doesn't end up looking like this when you're composing continuously. But I think there will be people in ACT who would come at you with their particular answer; they've got a way they like to treat continuous time that I'm just not familiar with yet.

Cool. A little bit of a more educational or applied question: how do we go about drawing, and learning to draw? Is there a software package? Is there a step-by-step process for building that familiarity of "when I see this shape, here's what I know"? And then how do we know what we can and can't do? Does the drawing software flag us, or do we need to send it to a friend? So, part one: how do we look at something and build up the motifs in our own aesthetic understanding, so that we can understand the compositionality of this, as you do today and as we all do for a language like English? And part two: having built that motif-based compositional understanding, what can we do? When are we just totally freewheeling and off the rails with the free energy principle, or does anything go if the motifs allow it?

Hmm, I wish I had the standard answer for the best way to learn string diagrams. Talking about it like this has prompted me to come up with one, but I don't have something at the top of my head that's the best way. There's so much stuff out there. If you want to get really comfortable with the diagrams, you're learning category theory in some sense, but it's not like you need to learn all of category theory; it's kind of a relatively modern offshoot, this applied category theory world, that's very diagrammatically focused, and there will be various nice introductions out there to using them. Another way... I'm pretty sure there was a nice paper that came out recently that was an introduction to string diagrams for computer scientists, for example. There tend to be different introductions for different audiences, because they want to pick categories those people are familiar with, so they can actually have some examples. You could learn the diagrams totally abstractly, but it helps to have some examples, and the old category theory textbooks look at things a lot of people haven't heard of, so they're not particularly helpful. So there's Bob's paper "Categories for the practising physicist", aimed at physicists, which basically introduces string diagrams to them; there's this recent computer science one; and there's some work going on in producing one for cognitive science, which I think would be really good, having an introduction to string diagrams for those people. So you basically look for one in an area you're comfortable with and find a good paper on it. It would be nice to have a good online resource, I guess, that gathers these together, so people can see a great guide to the introductions. There are also courses you can do. In the case of learning quantum, there's Bob's long book, Picturing Quantum Processes; that's the kind of thing I learned from. It was in the form of a lecture course, but it's basically the same as the book, and there are loads of exercises that make you reason with string diagrams, and then you pick the rules up, because at first you don't have the intuition, obviously, for the rules: what can I do with these, can I slide them around like this, or whatever. But it
doesn't take too long to get quite used to them, I think, which is the nice thing about them: they're kind of natural, just these elastic strings and boxes, and you have that sort of geometrical intuition. It depends on doing the exercises and getting used to using them. I didn't use any software in that sense; the diagrams I draw in this program called TikZiT, but it doesn't tell you how string diagrams work or anything, it's just for drawing them. I know there's work to develop this in libraries like the AlgebraicJulia project, which is sort of an applied category theory language, but I wouldn't know whether it's recommended as a way to first learn the language. So I would recommend finding a nice introductory paper in whatever field you're most used to, and playing with the exercises to get really used to them. For causal models there's this paper Robin and I put out; it's not necessarily the very first place to learn string diagrams, but the aim is to introduce them to people who've heard of causal models in the sense of Pearl, so Bayesian networks basically, maybe with the causal interpretation of them, to get them used to string diagrams. So that paper hopes to be a little bit introductory as well.

Yeah, Ali, please. Thank you. Getting back to the question about the representation of time in this formulation: I take it that this category-theoretical formulation of Bayesian inference is largely based on Tobias Fritz's definition of Markov categories as CD categories, right? As far as I understand it, one of the basic assumptions of Fritz's paper is this kind of unidirectional inference, I mean from earlier times to later times, or in other words prediction. But in the quantum formulation of active inference, or quantum active inference, there's an attempt to also develop the retrodiction aspect of inference, right? So would you say this recent formulation can also account for this kind of retrodiction? In other words, can this formulation be reconciled with quantum Bayesianism as well?

Yeah, I like the question. Sorry, go ahead. Just to add one more piece of context: I think it was in Coecke and Spekkens's paper that there was this clear distinction between classical Bayesian inference, where the classical one does not allow for retrodiction, and non-classical Bayesian inference, which can be applied for both prediction and retrodiction.

Oh, okay. I would love to be more familiar with the quantum active inference material. I'm not familiar with the retro... sorry, what was the other word besides prediction? Retrodiction? Yeah, prediction and retrodiction. I would have to compare with this paper, you mean Bob's paper on both forms of Bayesian inference, to see what they say there about the classical one. Can you give some intuition as to why the retrodictive one isn't something you can do classically? Because if it's general probability, then it would be true in some sense here, right? Here it's just being modeled in this probabilistic category, and at the moment they're separate, in that you have the model which basically goes forward, and then you do your updates trying to approximate something going back, but you don't really have the two as one thing.

Yeah, so the whole idea was that for predictive quantum mechanics we only need to account for inference from earlier times to later times, but if we want to account for retrodictive
quantum mechanics as well, we need to somehow account for the fact that, as we know, not every quantum formulation follows Bell's principle of local causality. So in order to account for all the entanglement phenomena and so on, we need to somehow put this bi-directional inference into our model. That was the basic idea behind developing this kind of non-classical Bayesian inference.

Does it have something to do with the unitary evolution in quantum theory, the way it has this reversible thing? Exactly, yeah. And so you don't expect to have something like that here, basically, where you have this reversible thing built in? Well, I wouldn't expect to see that exact feature here, because something that you can't have in classical probability isn't going to exist in this category, Mat(R+), since I think that would be basically the same category they would use in that paper: they work with dagger compact categories, and for the classical case with something like this Mat(R+) category. If it's not that sort of physical notion, but just the idea that the model comes with a forward part and a backward part, then what I have here is just about how you go from the forward part to approximating this backward thing. But the lens-type view of what's going on, which is more what categorical cybernetics would be imagining, I think, would have the model carrying this backward inference process with it as well, so that for each forward part of the model you would have this approximate inference channel stored with it. I don't know if that addresses what you're after, but it would have your backward and forward parts together.

Well, Ali, do you have any kind of closing slash opening remarks or questions? Where do you see this going from the active inference side? What does this bring to us, and what is opened through what has happened largely this year in active inference and category theory?

Well, actually, I'm really excited to see this line of development in active inference theory, and as you know I'm a big, big fan of meta-theories and all kinds of unification theories. I have this hunch that this line of development in active inference theory looks quite promising, especially for tying up all the loose ends, transcending many, many other areas and discourses, and ultimately reaching a kind of coherent picture of quote-unquote reality, whatever that means. In the last year we had tremendous advances in Bayesian mechanical theories, and in recent months we have this fabulous line of research on a category-theoretical account of active inference. My hope is that ultimately these different strands can be unified into a coherent and overarching framework. So, exciting times.

Yeah, it sounds like you're basically alluding to the thing going on in cognition and what's going on in physics coming together, right? Like one really, really meta theory. Exactly. So the idea behind Bayesian mechanics, one of its premises or assertions, was that there isn't any clear distinction between cognitive and non-cognitive things or agents; they rest on a continuum, and the same kind of mathematical technology can be applied both to inert and to conscious agents, or sentient agents, or whatever we choose to call them. This overarching theory unveiled many interesting phenomena regarding self-organizing systems; it changed the whole perspective on how we can look at, and even define, consciousness, cognition, intelligence, sentience, and all of these related terms. So my hope is that the category-theoretical account of active inference can also be used for clearly seeing many of these emerging elements in Bayesian mechanics and
active inference theory, and hopefully gaining some interesting and potentially groundbreaking insights.

Hmm, that would be wonderful. I'd love to apply this to those topics, and I'd be very curious to see how categories can come in there. Sorry, Daniel.

Yeah, I'll just give my closing thoughts, then to you, Sean: just a few loose notes that probably open more than they close. Ali was right in suggesting that Bayesian mechanics has recently helped us develop a continuum of active and passive systems, so-called living and non-living, animate and inanimate. And that brings us to another dialectic to resolve, which is life and mind, which is where the physical and the cognitive sciences come together. You said they're on a continuum; maybe we could say they're on a quantinuum. And what language could express such work? Well, right now we're speaking in English with the active inference ontology dialect; however, the phonemes are not intrinsically meaningful. The "mmm" in "Markov blanket" or "category" does not mean anything; it's a sound. And so the string diagram language and representation I see as a way to fuse and integrate semantics into the syntax of the actual inscription, which enables us to generalize in new ways, also recognizing that string diagrams are not everything, and so on. And then, with all of these intersecting vectors from the cognitive and the physical sciences, we are able to take the compositional, cartographic approach for cognitive ecosystems and talk about diverse intelligences: biological, quantum, classical architectures, all of these synthetic intelligences. So it's super exciting, and I appreciate again your visit, and look forward to people's curiosity taking them onward, and also to the development of tools and educational materials that make this easier, and then to being able to display and use something where the meaning is primal, rather than "well, this letter represents this", which already introduces such a space between the analytical representation and the string diagram which exists isomorphically with it.

Yeah, this is a very exciting way of thinking. It sounds like you're advocating a kind of structural ontology, in some sense taking the compositional structure of what's going on to really be the meaning, or really be the real thing that's there, not just, I don't know, a formalism. I would love to see string diagrams, and other approaches I'm sure, take that role, and you've got me very excited about this kind of unification that's going on.

Thank you again, Sean. You're always welcome, and we look forward to seeing where this all goes. Thanks again for having me, really great discussion. Thank you. Thanks, Ali. Thanks.