Hello and welcome, everyone. This is Active Inference MathStream 9.1 on March 5th, 2024. We're here with Jonathan Gorard, and we'll be discussing a variety of topics yet to be determined — or are they? So thank you for joining, and over to you for any introduction; we really look forward to everyone's comments and questions. Thanks again for joining. Okay, well, thanks very much, Daniel, for the introduction and for inviting me to be here on Active Inference. I'm looking forward to a very, very fun discussion. I don't have anything especially prepared to talk about, which is probably a good thing, because it means we'll be able to extend the unstructured part of this for as long as possible. But just to give a little bit of context, I want to talk about an area where some things that my collaborators and I have been working on might have some intersection of interest with things that Active Inference people care about. In particular, that concerns the relationship between computation, observation, and cognition — specifically, using methods that come from category theory and topos theory and some other branches of mathematics and theoretical computer science to understand the relationship between the computational and algorithmic complexity of systems versus the computational and algorithmic complexity of observers of those systems, and how those things trade off against each other. So just to give a little bit of context to that, I want to show some visuals from a paper that I put out about a year ago now, which really defines this research program that I've been working on for the last year and a half in some form or another — looking at exactly this trade-off using category-theoretic machinery. So, here's a specification of a Turing machine. This is just a simple deterministic computation. It's saying: you have a Turing machine that has this head state and this tape state, and on the next step you're going to replace the tape state with something that looks like this, the head state with something that looks like that, and you're going to scroll the Turing machine head left, or in this case scroll it right, et cetera. So this is just a specification of a very simple computation. I think this is a two-state, two-color Turing machine on a simple one-dimensional tape — about as simple a computation as you could define. If you run that thing from some initial condition, you'll get an evolution that looks like this. Right now this is just a purely deterministic, single-path evolution. But from this we can build a mathematical structure — namely, we can build a category. The rules for how we build that category are very simple. Each arrow here is some simple computation, some application of the Turing machine transition function. And then what we can do is say: any time we have two arrows that are laid end to end like this, we can compose them together to create a third arrow that goes like that. I may even have a picture — yes, like this. So we have a computation f that takes us from x to y, a computation g that takes us from y to z, and we obtain a composite computation g ∘ f that takes us directly from x to z.
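To make the construction concrete, here is a minimal sketch in Python. The rule table is a hypothetical two-state, two-color machine invented for illustration (not the one on the slide); elementary computations are functions on configurations, categorical composition is ordinary function composition, and every configuration carries an identity arrow:

```python
# A minimal sketch (hypothetical rule table, not the speaker's actual rule):
# a 2-state, 2-color Turing machine on a one-dimensional tape.

RULE = {  # (head_state, tape_color) -> (new_state, new_color, head_move)
    (0, 0): (1, 1, +1),
    (0, 1): (0, 1, -1),
    (1, 0): (0, 1, -1),
    (1, 1): (1, 0, +1),
}

def step(config):
    """One elementary computation: a single application of the transition function."""
    state, pos, tape = config
    tape = dict(tape)                       # configurations are treated as immutable
    new_state, new_color, move = RULE[(state, tape.get(pos, 0))]
    tape[pos] = new_color
    return (new_state, pos + move, frozenset(tape.items()))

def compose(g, f):
    """Categorical composition: arrows f: x -> y and g: y -> z give g∘f: x -> z."""
    return lambda x: g(f(x))

identity = lambda x: x                      # the identity arrow on every object

x = (0, 0, frozenset())                     # head state 0, position 0, blank tape
two_steps = compose(step, step)             # a composite morphism
assert two_steps(x) == step(step(x))
assert identity(x) == x
```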
And we also add some additional arrows on each state itself — an identity operation that maps the computational state directly to itself. This, combined with some axioms of associativity and identity, forms a category of elementary computations. So this is a very, very simple example. But what I want to build up towards, and pump your intuition for, is a category which I call Comp: a category whose objects are essentially the class of all data structures and whose arrows or morphisms are the class of all elementary computations. You start by applying all possible computations — in the case of Turing machines, all possible Turing machine transition functions — and then you do a closure operation where you do essentially what I'm doing here, but you allow those elementary computations to be composed together in arbitrary ways. That gives you effectively the class of all possible programs. So this category Comp contains not only all possible data structures as objects, but all possible programs as morphisms. And it's a very rich category with some very interesting algebraic structure, which I'm sure we'll allude to in our subsequent discussion. But in a sense, when we do this operation of taking what mathematically we call a transitive closure — where we allow two elementary computations to be composed together to produce a third — we are neglecting considerations of computational complexity, because this arrow here might correspond to one application of the Turing machine transition function, this arrow might correspond to another application of the transition function, but this composite arrow might correspond to two applications of the transition function. So somehow, when we allow arrows or morphisms to be composed in this way, we're neglecting the complexity of operations. The question then is: could we imagine a generalization of this categorical construction which takes computational complexity into account? Here's an example of how that would look. Here you can see that every edge, every morphism, has been tagged with certain computational complexity information. In particular, it's been tagged with a number specifying the minimum number of applications of my transition function — the minimum number of elementary computations — that I need in order to evolve from this data structure to that data structure. So to go from here to here, it's just one. To go from here to here, it's one, et cetera. But to go from here to here directly, it would be three; to go from here to here directly, it would be two. And by convention, we say that the identity computation, the trivial computation, always has complexity zero. So this is, again, a fairly simple mathematical structure, and you can construct it using purely category-theoretic technology, by building a particular functor from the category of data structures and computations to what's called a discrete cobordism category. We might discuss that later on if people are interested, but let me not get too bogged down in the technical details of how we do that. Once you've got this, it immediately gives us a very nice way of characterizing phenomena like computational irreducibility.
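As a toy version of that tagging — reusing the step function and configuration encoding from the sketch above — the following labels every ordered pair of reachable configurations with its geodesic complexity, the minimum number of elementary steps between them, with identity arrows getting complexity zero by convention:

```python
# Sketch: run the machine, record reachable configurations, and tag each
# ordered pair (x, y) with the minimum number of elementary steps from x to y.
from collections import deque

def complexity_table(initial, steps=20):
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1]))
    succ = {a: b for a, b in zip(trajectory, trajectory[1:])}
    states = list(dict.fromkeys(trajectory))        # dedupe, preserving order
    cost = {}
    for src in states:
        # breadth-first search from src over the (deterministic) successor graph
        frontier, dist = deque([(src, 0)]), {src: 0}
        while frontier:
            node, d = frontier.popleft()
            nxt = succ.get(node)
            if nxt is not None and nxt not in dist:
                dist[nxt] = d + 1
                frontier.append((nxt, d + 1))
        for dst, d in dist.items():
            cost[(src, dst)] = d                    # cost[(x, x)] == 0: identity
    return cost

cost = complexity_table((0, 0, frozenset()))
# Irreducibility check: a composition x -> y -> z is irreducible when costs add
# exactly, i.e. cost[(x, z)] == cost[(x, y)] + cost[(y, z)]; strictly less
# means the composite is reducible (it can be shortcut).
```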
There is an idea that has existed in some form or another since the very early days of theoretical computer science — since the days of Gödel and Turing and Post and Church and so on — but was given the name computational irreducibility by Stephen Wolfram. The idea is essentially that, intuitively, you describe a computation as being irreducible, or the result of the computation as being irreducibly complex, if it's not possible to shortcut it in any way: the computation takes a certain number of steps, and there does not exist a shorter computation that would give you the same answer in less time. One of the nice features of thinking about computations and their complexity algebraically like this is that it gives you a purely algebraic characterization of irreducibility. In particular, it says that irreducible computations are ones for which the computational complexity acts purely additively under composition. So if we compose, say, two computations of complexity one together, and the resulting composite has complexity two, then it's an irreducible computation. If it has complexity less than two — like one — then that means we could have jumped directly from the input to the output without having to pass through the two elementary computations that made it up. That would be an example of a reducible computation. So reducible computations are ones whose complexities compose sub-additively in this category-theoretic sense. Okay, so here's an illustration showing what intermediate computational states you had to go through in order to get from one data structure to another: to go from here to here, you had to go through steps one and two; to go from here to here, you had to go through steps one, two, three, and four, et cetera. So you can build up a complete algebra of complexity this way, which has some nice properties — which again I could talk about, but let me not get too bogged down in mathematical details right now. But here's the thing I really want to talk about: what happens when you go to multi-way systems — when you go to non-deterministic computations? Now imagine having, instead of a Turing machine with a single rule, a single transition function, that evolves deterministically with a single thread of time, a Turing machine that has, say, two transition functions, like this one and this one. At any given point, it can apply either of the two. And so evolution, instead of being a single path, becomes this kind of branching structure — which, if we didn't have any merging, would be a tree, but because we are merging equivalent states, is actually a more general directed graph. It looks like this, and this we call a multi-way system. We can build a category out of these multi-way systems as well, using exactly the same rules: again, we do the transitive closure operation, so we add an edge for every possible composition of these elementary computations, and an identity edge that maps every data structure to itself.
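A minimal sketch of such a multi-way system, again with two invented rule tables: both rules are applied to every configuration at each generation, and merging equivalent states is just set union, so the result is a directed graph rather than a tree:

```python
# Sketch of a multi-way system: two hypothetical transition rules, every rule
# applied "in parallel" to every configuration, equivalent states merged.
def make_step(rule):
    def step(config):
        state, pos, tape = config
        tape = dict(tape)
        new_state, new_color, move = rule[(state, tape.get(pos, 0))]
        tape[pos] = new_color
        return (new_state, pos + move, frozenset(tape.items()))
    return step

RULE_A = {(0, 0): (1, 1, +1), (0, 1): (0, 1, -1),
          (1, 0): (0, 1, -1), (1, 1): (1, 0, +1)}
RULE_B = {(0, 0): (0, 1, -1), (0, 1): (1, 0, +1),
          (1, 0): (1, 1, +1), (1, 1): (0, 0, -1)}
steps = [make_step(RULE_A), make_step(RULE_B)]

def multiway(initial, generations=4):
    frontier, edges = {initial}, set()
    for _ in range(generations):
        nxt = set()
        for config in frontier:
            for i, s in enumerate(steps):     # branch: apply both rules
                out = s(config)
                edges.add((config, i, out))   # tag each edge with the rule used
                nxt.add(out)                  # set union merges equivalent states
        frontier = nxt
    return edges

edges = multiway((0, 0, frozenset()))
```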
But it turns out this category has even more structure than the single-way system that we showed previously, because now it's possible to compose computations not just sequentially in time, using ordinary composition, but in parallel, across what is sometimes referred to as branchial space. Essentially, the ordinary morphism composition that I showed previously says: I apply this elementary computation, then this elementary computation, sequentially. Whereas this parallel composition says: I apply this computation and this computation in parallel to the same data structure. And that parallel operation is what causes these branches. Effectively, when you have two threads of time branching from the same state, like here, that's arising because we have chosen to apply this elementary computation and this elementary computation together in parallel rather than sequentialized in time. Here you can see this parallelization indicated using what's referred to as a branchial decomposition, which is just a visual way of decomposing what's going on between these different threads of time. And again, there's a purely algebraic characterization of what's going on here: we've taken the simple category that we started with and equipped it with a tensor product structure, so it's become what we fancily call a monoidal category — actually a symmetric monoidal category. So we now have these two operations: sequential composition in time, and this tensor product operation, which is parallel composition in branchial space. And just as before, where we equipped our edges, our morphisms, with certain computational complexity information and described how those complexities compose sequentially in time, we can do the same thing and describe how the complexities compose in parallel, as one composes morphisms in branchial space. By exactly the same token, this allows one to quantify multi-computational irreducibility rather than just computational irreducibility. So multi-computational irreducibility becomes a measure of how additive or sub-additive your time complexities are when you compose them in parallel through the tensor product, rather than in sequence through standard morphism composition. Okay — I promise I'm getting to the point — here's an analogous diagram to the one I showed before, showing all the intermediate steps that are being applied when one constructs computations, or indeed multi-computations, by composing elementary computations both sequentially in time and in parallel in branchial space. The point I'm trying to get to is that, in addition to being a useful way to think about computational complexity theory and to formulate complexity classes like polynomial time or non-deterministic polynomial time, et cetera, it turns out this is also an interesting way to think about the role of observation in computational models of reality. Here's where I'm going to get a little bit philosophical, and I don't immediately have a slide or a graphic to illustrate this point. When we think about modelling a system computationally, one has to bear in mind that there are really two computations going on, right?
There's the computation that the system is itself performing, and then there's the computation that the observer — the person who is measuring that system and concluding things from it — is performing. So when we construct models of reality, or models of systems, and we want to describe at a meta level what we're doing in computational terms, there's our own computation that's going on inside our own internal representation of the world, and then there's presumably some external computation that's going on outside. And when we make observations and measurements, when we construct theoretical models, what we're doing is constructing some kind of encoding function that allows us to take a concrete physical state of the system we're observing and encode it as some abstract state of the internal model that we have of what's going on. That's all very well, but now we don't just have one computation to care about; we have three, right? We've got the computation of the system, the computation of the observer, and the computation of this encoding function — the computation that's responsible for the interface between the internal model of the world and the external reality. And the computational complexities of these computations come into play in an extremely interesting way. So part of the reason for trying to develop this algebraic semantics for thinking about computational complexity and multi-computational complexity was to give one a systematic way to reason about exactly this three-way interplay between systems, observers, and encoding functions. In particular, when an observer makes a model of the world, one thing they're doing is coarse-graining: a model isn't a complete description of reality; there's a certain amount of taking a bunch of states that in the system itself are distinguished, but that in the internal model are treated as the same — they're cast into the same bucket. So in some sense, how coarse a model is, is a measure of how much the encoding function fails to be injective. And again there's a kind of algebraic or category-theoretic characterization of what's going on: the fewer of your morphisms are monomorphisms, the coarser your model is — the more abstract or idealized your model of reality is. And so then the interesting thing is that this characterization of multi-computational irreducibility — this measure of how additive or sub-additive your complexities are as you compose them together in parallel — gives you a measure of the relative complexity of the evolution function, that is, the function that evolves your computation forwards in time, versus the equivalence function, that is, the function that declares that two computational states, two data structures, are to be treated as equivalent. And that interplay, I claim, is a kind of abstract meta-way of thinking about the interplay between the computation of systems versus the computation of observers. Because it says the role of the system is to evolve forwards in time, whereas the role of the observer is to take states in the system that are distinguished in reality and say: subject to my idealized model, I'm going to treat these as the same. So the system is defining the evolution function, but the observer is defining this equivalence function.
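A toy illustration of that idea, with invented states: a coarse-graining encoding that fails to be injective, whose fibers (preimages) are exactly the equivalence classes the observer's model imposes:

```python
# Toy encoding function: distinct system microstates mapped onto the same
# model state (non-injective). States and encoding are invented for illustration.
micro_states = ["up-left", "up-right", "down-left", "down-right"]
encode = {"up-left": "up", "up-right": "up",          # distinguished in the system,
          "down-left": "down", "down-right": "down"}  # identified by the model

def fibers(encoding):
    """Group microstates by the macrostate they encode to: the equivalence classes."""
    out = {}
    for micro, macro in encoding.items():
        out.setdefault(macro, []).append(micro)
    return out

print(fibers(encode))   # {'up': [...], 'down': [...]} — fiber size measures coarseness
```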
And so then the trade-off in their complexities becomes exactly a trade-off between the algebraic rules that describe the complexities as they compose sequentially versus the algebraic rules that describe the complexities as they compose under this tensor product operation. I've shown this in particular for Turing machine systems, but this is a very general kind of algebraic semantics: you can apply it to hypergraphs, you can apply it to combinators, lambda calculus — it doesn't matter. In a sense, there is just one category, up to isomorphism, of data structures and computations, and there are simply many different ways of parameterizing what that category is doing, through things like Turing machines or hypergraphs. The algebraic formalism transcends the particular details of the computations one is dealing with. So, as I say, what one ends up with is, I think, a fairly general formalism for thinking about the interplay between observers and the systems that they observe. I promise I'll stop monologuing in a moment, and we'll try to pick apart what I'm really talking about here, but I'll just conclude with this: once one has that algebraic semantics, a whole bunch of things which previously seemed — at least to me — like fundamental confusions about how scientific observation works, and how it interplays with computational models, become much easier to clarify once you think about them in this more compositional way. To give a very simple, or rather a very degenerate, example: within this algebraic semantics you can effectively trade off the computational complexity of the system for the computational complexity of the observer. So you can have, in effect, two degenerate cases. You can have the case where the system itself has a completely trivial evolution function — the system is doing something completely elementary in how it evolves — but the observer has some incredibly complicated equivalence function that makes the system look like it's doing something really complicated, even though what it's actually doing is very simple. So then you have the phenomenon where all of the complexity is in the eye of the observer. You can also have the other degenerate case, where the observer is doing something absolutely trivial — where the encoding function, the observer's own internal representation, is just an identity function or something, so there's no complexity there — but the system is doing something incredibly complex, some really sophisticated universal computation, and so it will also appear very complex to that observer. And you can have any kind of intermediate; there's this vast interstitial space between these two extremes. And so one philosophical problem that I've been interested in ever since I was a kid is this tension between empiricism versus rationalism. If you look back at the history of early modern European philosophy, certainly Western post-Enlightenment philosophy, you had people like Descartes and Leibniz and so on — and, in a more sophisticated way, Bishop Berkeley with subjective idealism — who were trying to push the idea that all the sophistication is what's going on inside the observer's head, and what goes on in reality is somehow secondary. And then you had people like Locke and Hume
and the empiricists, who were saying: no, no, we should try to get the observer as much out of the picture as possible, and we should say all the sophistication is going on in the external world. One nice consequence of this formalism is that it gives one an algebraic way of parameterizing this spectrum from rationalism to empiricism. You can choose the rationalist extreme, where you just have some space of all possible computations and the observer is doing all of the work to narrow down to a particular one; or you can have the empiricist extreme, where the observer is a completely elementary system and everything is being built up from a kind of bottom-up construction; or you can have anything in between. And in a sense we now have, I think, the beginnings of a mathematical theory that's able to explain how those complexities trade off in a very direct way. So I think there are potentially places of mutual interest there, in thinking about cognition, observation, measurement, scientific modeling, and so on in fundamentally computational terms. So hopefully that will provide some context for a discussion. Thank you. Great opening. There are so many places to spin in and jump through. I guess I'll start with the two things I wrote down, which were "unity is plural and at minimum two" and "beauty is in the eye of the beholder" — the way that these kinds of pieces of timeless wisdom describe that fundamentally relational component of observers in systems (not all kinds of systems per se, but those kinds of systems for which those things are true). And then the way in which, along the formalism that you described, there's the system–observer–encoding three-way partition; and then the way that, in the free energy principle and the particular physics, that interface gets broken out from the agent's perspective into the incoming sensory and the outgoing action, which results in the four-fold particular partition. So maybe just to open there: how do we partition, from a category theory perspective or however, the action–perception loop or the engagement loop? How do we make a topology, or compare and contrast different topologies and flows, over this seemingly pervasive or universal interface-like concept? That's a fascinating question. I don't have the answer — this is maybe a place where both you, Daniel, and perhaps David may have useful perspectives, because although I read Karl's work a few years back, and so I have some familiarity with the terms, I'm by no means an expert on the free energy principle or active inference and those kinds of things. But I think it's a very good point that you raise. I should begin by just being honest and saying that everything I'm doing, all that I just described, is of course an idealization. In particular — and I think you were very right, Daniel, to pick up on it — it's an idealization in which we say the observer is completely non-interacting with the world: in a sense, there's just input coming in and nothing coming out. But of course we know that's not really how observation works; observation is necessarily a two-way process. And so what's needed is not just this very clean algebraic semantics that I've described here, which assumes that there's essentially a one-way function from the world to the observer, but actually something more like a second-order cybernetics description of
what's really going on — where you have first-order and second-order interactions, where it's exactly as you say: you can get these feedback loops from observation to action and back again. That's probably still an idealization, but probably a more realistic idealization of how real observers and real measurement apparatus work. So I just want to begin by saying that I don't know; the question of how this formalism interplays with things like second-order cybernetics, and other areas where I know these kinds of questions have been explored, is something I'm very interested to find out about going forward. At some point maybe you could help me understand where things might fit in with the broader active inference framework. I'm not sure I have that much more to comment on than that, other than one further comment: you asked specifically about how the compositional, category-theoretic perspective might be useful. I don't think category theory in itself is going to be the complete answer; it will be category theory augmented with some other things — computational complexity, probably second-order cybernetics, and some other things that I may not be aware of. But one place where I think that viewpoint is useful, at least on a philosophical level, is an idea that you really obtain by studying mathematical structures in a category-theoretic way: that you can define the identity of something either in terms of its intrinsic properties, or in terms of the stuff that you can do to it. This was really the transition that happened in the foundations of mathematics as a result of people like Samuel Eilenberg and Saunders Mac Lane. Category theory has its origins in a slightly abstruse branch of algebraic topology: it was introduced by Eilenberg and Mac Lane, and developed further by people like Alexander Grothendieck and Jean-Pierre Serre for doing homological algebra, for reasoning about the algebraic structure of topological spaces. But then mathematicians realized that it was useful not just for thinking about topology but for thinking about mathematical structure in general, and later on applied category theorists started saying, well, maybe it's useful for thinking about structure in general. The key conceptual or philosophical shift that it imposes is this. Historically, thanks to the work of people like Cantor and Frege and Russell and so on, people thought about mathematical structures in the foundations of mathematics in terms of set theory, and in set theory you have things like the axiom of extensionality, which effectively says that a set is defined by what's inside it. In other words, a mathematical structure obtains its identity by your breaking it apart and looking at what's inside. In category theory, it's a completely different view. The view instead is: no, you can't look inside. It's a fundamental rule of category theory that you can't look inside an object; its internal structure, if it has any, is out of bounds to you. Instead, you give that object identity in terms of how it relates to other objects of the same type. In other words, you can ask: what can I do to this? What functions can I apply to it? What functions can I apply to something else that map into this? So if I want to define the real numbers or the integers or something, from a set theory perspective you would say the essence
of the real numbers is all the numbers that are inside that set. Whereas the category theory perspective is: no, the essence of the real numbers is all the functions that you can define that take real numbers to some other number system, or real numbers to themselves, or that take some other number system into the real numbers, et cetera. And some of the deepest results in category theory, like the Yoneda Lemma and others, are telling one, in some very precise sense, that these two perspectives are really the same at some fundamental level: identifying an object based on its internal structure — breaking it apart and asking what's inside — and identifying an object by asking what can I do to it, what can this object be transformed into, and what things can be transformed into this object, give you exactly the same information. It's far from obvious that that's true. The Yoneda Lemma is one of those results where you can never quite work out if it's obvious or if it's incredibly mysterious, but I tend to fall on the side that it's incredibly mysterious. It's far from evident that those two perspectives would really be the same. And yet the point you're making, Daniel, I think, is that in a sense, historical ways of thinking about scientific observation have tended towards the set-theoretic viewpoint — the viewpoint that we understand systems by breaking them apart into constituent components — but perhaps a more realistic view is something like the category theory perspective, where we say: I understand a system by interacting with it, by asking what I can do to it and how it behaves when I perform certain operations on it. And that's a fundamentally two-way process that involves not just passive observation but also active participation, and somehow we need to develop a formalism that incorporates those two elements. And maybe it already exists, in this large literature tree of which I'm largely unaware — that's partly why I wanted to be here, to find out what things I've missed, so to speak. Alright, a few points: "self-evident" is far from evident, and I also tend towards the mysterious — which is to say, more with less, especially for these frameworks, because they're less opinionated, so their space of internal semantics can be larger. And then, on that description you provided of the relationship between set theory and category theory, I kind of summarized it as: set is to essential inclusion as category is to relational function. Now, if our concept of organismality, or of action in the niche, is constructive, compositional, material, then we are looking for what is in or out — is the microbiome in? Is the pheromone in the colony in or out of that thing? — because it's looking for a static, material answer. In contrast, the other side of that coin highlights the dynamic: whatever it is that's self-organizing of the tornado is the tornado; whatever it is that's self-organizing for the ant is the ant. And then there's also this hint, slash Möbius strip or something, that those two in the moment are indistinguishable — and yet, for systems that we choose to define one way or another, or keeping both open, those design decisions do make all the difference, even if for real systems as they're observed there's indistinguishability. Right, right. I think that's a very important point, and one which — I mean, it's always a kind of concern I have whenever I start thinking about embodied cognition or extended-phenotype-
type ideas: that if what one is trying to do here is construct a formalistic model of observation or of cognition, then as a first-order approximation one has to start by somehow decomposing the world into observers and systems. But of course we know that that decomposition is somewhat arbitrarily imposed, and that if you take these things to their extremes, and you allow essentially everything that the agent is interacting with — not just the microbiome, as you say, but also tools that they construct, environments in which they exist, and so on — to be considered a component of that agent's phenotype, which is a completely reasonable thing to do; and you start to say, okay, their cognitive processes are not just localized to their brain or their spinal column, but extend to the paper they write on, the books they read, et cetera — again, a perfectly reasonable thing to do, and somehow more descriptive of what's really going on — my fear is always that if you take that too far, you end up destroying the whole assumption that the idealization was based on, which is that you can neatly decompose the world into observers and systems. So I always get a bit nervous when thinking about that: yes, we know this is an approximation, and we know that the approximation is not really true, but how much can you afford to loosen your grip on that approximation before the whole thing falls apart? I don't really know the answer to that question, but I think it's an interesting one. Yeah — how about I ask some questions from chat, you give your first thoughts, and then we'll see where that lands with further questions or how it connects to active inference? That sounds good. Oh, by the way, should I keep my screen share on? Yeah, we might want to go to a figure, so it's fine. Okay. Alright: quantum bell wrote, "How does this help us reason about causality?" That's a fascinating question. Okay, so that's another major aspect of why I think this research program is exciting. And again, this is something where I'm interested to get the active inference perspective, because I know it's a topic on which much has been written and of which I'm largely ignorant. But yes — one question you could ask is this. If you have a description of a computation like this — let's go back up to the Turing machine case, the single-way Turing machine case, which is relatively easy to analyze, although still far from obvious — suppose you have a computation of this kind and you want to ask: what is its causal structure? In other words, for each edge, each time I'm applying this Turing machine transition function, can I construct some directed graph representation that tells me how these events are linked together? In the original research program, the so-called Wolfram model research program, that started a lot of these investigations, we were looking at this all the time: taking computations, looking at their causal structure, and trying to infer things about what was going on — about the semantics of the computation — based on causal relationships. And at a certain point I started to realize — and I think other people had realized this before I did, but I'm often slow to pick these
things up — I, and other people, started to realize that the notion of causality we were using was kind of nonsense. I mean, it was not completely hopeless, but it wasn't really causality, or it couldn't really be called causality in any definite sense. So what do I mean by that? First of all, there was a very technical problem. If you're looking at something like a Turing machine evolution, or a hypergraph rewriting system as we were, then there's a very tempting and apparently obvious natural definition of causality that you can use. When you split the world up into events that take some part of your data structure as input and produce some other part of a data structure as output, you can very easily ask: does the output of one event intersect with the input of another event? If I show the hypergraph example, it's perhaps easier to see. You have a hypergraph rewriting rule that looks like this: it says, if I have a piece of hypergraph that looks like that, I replace it with another piece of hypergraph that looks like this. So each time you apply an event, you can think of that event as ingesting certain hyperedges and replacing them with others. You can divide it up into the input hyperedges that are being ingested versus the output hyperedges that are being produced. And so then you can ask: did I subsequently ingest, in some future event, hyperedges that were output by some previous event? If the answer is yes, then pretty obviously that future event couldn't have occurred unless the previous event had already occurred, so you could say that one of those events causes the other. In general, you could say that event A causes event B if the collection of tokens produced in the output of event A has a non-empty intersection with the collection of tokens ingested as part of the input of event B. And that's a very tempting, very natural definition of causality in these systems. It turns out it doesn't really work. I mean, it works pretty well, but there are cases in which it fails, and it fails pretty spectacularly. The canonical case where it fails spectacularly is that you can have events that don't actually do anything: events that just touch an edge, touch a token, and output it again unchanged — maybe it modifies the name, the identifier, but it doesn't actually change anything about the structure of the hypergraph or the Turing machine state or whatever. Pretty obviously that event doesn't matter; it shouldn't be causally related to anything in the future. But because it ingested the edge and then didn't do anything — it did some identity operation, but then produced the edge in its output again — it will register as being causally related to any future event that uses that edge, even though it didn't make any difference. That's just one very obvious example; there are other cases where it became clear that whatever this algorithm was detecting, it wasn't really causality. So I tried to think about what a more sensible definition of causality would be, and I started working on some slightly blockchain-inspired ideas, where you say: okay, rather than just arbitrarily assigning identifiers to these tokens every time they're created, what if I recursively construct the identifier of each token based on its causal history?
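Here's a small sketch of both halves of that story — the token-intersection rule and its failure mode, plus the hash-based causal-history identifier. The event records and helper are hypothetical, invented for illustration:

```python
# Sketch: naive token-intersection causality and a causal-history identifier.
import hashlib

events = [
    {"id": "e1", "in": {"t1"}, "out": {"t2", "t3"}},
    {"id": "e2", "in": {"t2"}, "out": {"t2"}},   # a "do nothing" event
    {"id": "e3", "in": {"t2", "t3"}, "out": {"t4"}},
]

# Naive rule: A causes B if out(A) ∩ in(B) is non-empty. Note the failure mode:
# the inert event e2 registers as a cause of e3 even though it changed nothing.
naive_causal = [(a["id"], b["id"]) for a in events for b in events
                if a is not b and a["out"] & b["in"]]
print(naive_causal)   # includes ('e2', 'e3')

# Repair sketch: a token's identifier is a hash of its complete causal history,
# so an event that leaves that history unchanged cannot spuriously appear causal.
def token_id(producing_event, parent_ids):
    history = producing_event + "|" + ",".join(sorted(parent_ids))
    return hashlib.sha256(history.encode()).hexdigest()[:12]

t2 = token_id("e1", ["t1"])   # identity recursively encodes the causal past
```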
So in other words, for each token — each hyperedge, or each tape square on my Turing machine tape — the identifier is not just some random number that gets generated by my algorithm; instead, its identifier is a directed graph representation of its complete causal history. Then, recursively, its identifier can only change if its causal history is updated, and so you don't end up with those spurious causal relations that I described before. So that seemed like one tempting way of resolving the problem. But then I realized there's actually a much more fundamental problem, a philosophical problem, with the way we were thinking about causality. Okay, this is a long tangent, which I'll talk about a little bit, but I won't get into the complete details unless people are interested. I ended up talking to a bunch of philosophers who had studied causality, and people who worked on parallel programming and quantum information theory and other places where causality is studied, and asked them, basically: what do you mean by causality? What is this thing we're trying to define? And in some form or another, all of the definitions boil down to: event A causes event B if, had event A not occurred, then event B would not have occurred. In other words, you need a counterfactual — you need some possible history, some possible world, in which event A didn't happen. But if you're reasoning about a purely deterministic event system like a Turing machine, that doesn't make any sense, because if you have a single Turing machine transition function, there is no possible world in which that transition function didn't fire in that particular way — because if it didn't fire in that particular way, you would not be reasoning about that Turing machine anymore; you'd be reasoning about a different Turing machine. So to make sense of these notions of causality, you need a kind of Leibnizian, modal view of reality that just doesn't exist for these deterministic computational systems. So either you need to define causality only at the multi-way level — only at the level where you have many computations, or possibly all computations, happening in parallel, and then you can define causality relative to, and between, them — which seemed like the only way out — or you need some fundamentally new philosophical theory of causality, which I was not qualified to produce. And so that's part of what motivated this general research program: thinking about the category not of a single computation, with a single sequence of data structures — because it's clear that you can't, in a philosophically meaningful way, assign causality in that case — but rather looking at the algebraic structure of the category of all possible computations and all possible data structures. In that situation there is a notion of causality you can equip it with, and there's again a nice mathematical description in terms of weak 2-categories and so on, which I can talk about if people are interested. But yeah, it's clear that these things are very deeply related: this theory of the category of computations and data structures, and the theory of how you
assign causality in a meaningful way. And I'll just mention one other thing on that topic — an area which I find quite exciting because it's an unexpected spin-out of this program — which is that once you have a way of consistently applying causality at a per-token level in these systems, it gives you a way of vastly generalizing what computation is. In particular, you can derive something which I'm provisionally calling covariant computation, although it should probably have a better name than that. In our traditional, Church–Turing-type models of computation, computation is a purely forwards-in-time operation: at every point you have a complete data structure, and computation is about deriving the next state of that data structure. So in a sense it's only a forwards-in-time thing — you might be able to reconstruct the initial conditions from some subsequent data structures, you might be able to go backwards in time, but that's essentially all you're doing. But then you could imagine: suppose I don't know the complete state of my data structure. Suppose instead I know one part of my data structure, but I know its history throughout all of time. You can imagine, say, an elementary cellular automaton or a Turing machine tape where you know nothing about the tape except the state of one cell — and you know that throughout all of time. The question is: what can you infer about the rest of the computation? It turns out that for those kinds of structured, array-type systems, you can infer a lot. You can actually evolve the system not forwards in time but sideways in space, and obtain a kind of causal diamond: the top-left, top-right, bottom-left, and bottom-right corners are undetermined, but everything inside that diamond can be determined just from that one row or that one column, extended throughout time. And that's a fundamentally different notion of computation — a version of computation which is not forwards in time but sideways in space. But you can also have a computation that is sideways in branchial space, where you know one complete branch of the multi-way system extended throughout time, and then the question is what else you can infer about the rest of the multi-way system just from that one branch — and again, the answer turns out to be: a lot, but not everything. The reason I call this covariant computation is that it's very analogous to what happens in relativity. In relativity, once you buy into the notion of general covariance, and the notion that space and time are fundamentally the same thing, you have to relax your traditional view of what dynamical systems do. We typically think of systems as evolving: you have a snapshot of your initial data localized on a particular space-like hypersurface, and then your laws of physics tell you how that space-like hypersurface evolves forwards or backwards in time. But in a covariant picture of physics, you must also allow your initial data to be defined on a time-like hypersurface, and for you to be able to evolve that time-like hypersurface sideways in space — or mixtures of the two, and so on. So it's clear that there's a vast generalization of ordinary computation theory that you can construct that's physics-inspired in that sense, in which you can have mixing of space-time and multi-way directions in a completely consistent way.
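The sideways-in-space evolution itself doesn't fit in a few lines, but here is the familiar forwards-in-time analogue of the causal diamond for an elementary cellular automaton (rule 30): knowing only a segment of one row determines a shrinking triangle of future cells, since each cell depends on its three upper neighbours. A sketch for intuition only, not an implementation of the sideways case:

```python
# Rule 30 lookup table: neighborhood (a, b, c) read as the binary number abc.
RULE_30 = {(a, b, c): (30 >> (a * 4 + b * 2 + c)) & 1
           for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def determinable_region(known_row):
    """Evolve a partial row; each step, one cell on each side becomes unknowable."""
    rows = [known_row]
    row = known_row
    while len(row) >= 3:
        row = [RULE_30[(row[i], row[i + 1], row[i + 2])]
               for i in range(len(row) - 2)]
        rows.append(row)
    return rows   # a shrinking triangle: the determinable interior of the diamond

region = determinable_region([0, 0, 1, 1, 0, 1, 0])
```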
But to make those things consistent, you need a definite way of assigning causality, because any computation that you do — even if it permutes the directions of space and time and branchial space and so on — must always preserve the causal structure of what's going on, or else it's inconsistent. And so this question of how you construct a covariant theory of computation turns out to be intimately related to the question of how you take this category of computations and data structures and equip it with a consistent notion of causality. So, a very interesting question; we could talk about that at great length. Okay, to follow with a few pieces: it's very related to Professor Mike Levin's notion of polycomputing, and to the necessity for a causality concept to be created or deployed when the question arises, "Was that me? Was that action or change due to me?" Also, connectivities: even just in the neuroimaging setting, which is kind of the cradle from which active inference and the free energy principle arise, it's really important to distinguish the functional, effective, and anatomical connectivities. That was one of the points that Toby St Clere Smithe made in his dissertation: a lot of times the Bayesian graphs don't convey all of the necessary and sufficient information to make the computation reproducible. That question of what's missing from the graph is what motivated a lot of the category theory developments in active inference, as well as some of the formal ontology work with SUMO, and Dave here, and Adam Pease — because implementing modal and higher-order logics is really important if it's possible for a mind to have a perspective on a mind, and things like that. Then the ant Turing tape: the tape is the pheromone, and the decision space is the nest-mate's scrolling. When you had a deterministic Turing tape, that was like a movie, because the nest-mate couldn't make any choices — except for internal action, which is kind of a side topic — couldn't make any choices on the tape. Whereas when there's a multi-way system — which is basically what, in active inference, we talk about in terms of affordances, and the policy space, and the temporal depth of planning, counterfactuals on action, and action-conditioned world transition states like the B matrix — all those kinds of topics come into play. Because if you want to have a causal buffer, a grasp on what something that could do otherwise causes when it does or doesn't do otherwise, you need something like a deterministic handle around what could be a probabilistic or deterministic — but at least multi-way — map of some kind of cognitive territory.
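As a hedged sketch of the objects Daniel mentions here — not any particular library's API — one can write the action-conditioned transition model as one stochastic matrix per action, and the "multi-way map" as the tree of counterfactual rollouts over all short policies; all numbers below are invented:

```python
# Toy action-conditioned transition model and counterfactual policy rollouts.
import itertools
import numpy as np

actions = [0, 1]
B = {0: np.array([[0.9, 0.1, 0.0],     # B[a][s', s]: P(next = s' | s, action a)
                  [0.1, 0.8, 0.2],
                  [0.0, 0.1, 0.8]]),
     1: np.array([[0.2, 0.0, 0.1],
                  [0.3, 0.2, 0.1],
                  [0.5, 0.8, 0.8]])}

def rollout(belief, policy):
    """Predicted state beliefs along one policy (a fixed action sequence)."""
    beliefs = [belief]
    for a in policy:
        beliefs.append(B[a] @ beliefs[-1])
    return beliefs

# The "multi-way" structure: every action sequence of a given temporal depth.
depth = 2
policies = list(itertools.product(actions, repeat=depth))
trees = {pi: rollout(np.array([1.0, 0.0, 0.0]), pi) for pi in policies}
```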
That's a very interesting perspective. Again, I'm betraying my ignorance of active inference theory here, but it sounds almost like this interplay between the epistemic versus the pragmatic — those two aspects of how the speculative part of cognition works. This is something I've thought about in a completely different context, in relation to things like quantum information theory, but I wonder if there's a potential overlap there. There are certain situations, when thinking about these kinds of systems purely abstractly, where you need two different notions of causality: a speculative notion that's dynamic, that can be rewritten, and a definite notion that's immutable. A classic example is quantum information theory: you can have superpositions of causal orders, you can have quantum switches, you can have causal structure that exists in superpositions of different directed graph states — but then once you apply a Hermitian operator, once you apply a measurement, the causal structure is definite. And you have similar things, as I understand it, with distributed computing, with parallel computing, where you potentially allow for speculative execution for a certain number of steps — you're treeing out this multi-way system, and you have a superposition, or at least a collection, of possible causal histories — but then eventually you have to choose an actual operation to do, and then this big block gets laid down and the causal history is somehow definite. I wonder if there's a way of thinking about speculative execution of agents, and the interrelation between that speculative execution and agent actions, in terms of this same interplay between two different causal structures — a dynamic one versus an immutable one. Yeah — well, one funny way to think about that is that a single agent that has this counterfactual, contemplative ability could be in the central-place foraging arena, imagining, with discrete branching paths like a chess algorithm, or with a probability distribution, where it could go. But not all cognitive systems are the kind of things that make plans of their own actions. An ant colony has nest-mates on the ground, so they're actually realizing those trajectories: the real exoskeleton on the ground plays out, ending up with finite trajectories that, for a contemplative agent, could have been merely simulated or probabilistically blurred. That's the difference between the embodiment — the body moving there, for a mammal or an animal — and the mind simulating it. And then, on the epistemic and pragmatic trade-off in decision-making: let's say we're in that multi-way moment, and we have two options, two different slices of the B variable, and the policy selection question is about which way you are going to go — which affordance in the moment. A policy is basically the affordances over the time horizon of planning; if it's only one time step, just the next one, then the affordance space is just the actions that can be taken. One way to make it so that what happens is the likeliest thing — the path of least action, which is what opens up the whole physics-of-cognitive-systems angle, in contrast to a reinforcement learning perspective — is to start with habit, so that action could just be drawn from a fixed habitual distribution. However, for adaptive action, habit gets upweighted with expected free energy, which is a functional that takes in the policy space — because there's a probability over actions — upweighting policies according to their score on expected free energy, which consists of epistemic plus pragmatic value. How much is it going to align the observations with what I like to see? That's pragmatic value, with a preference. What is my expected information gain? That's the epistemic value. So how those are parameterized makes an agent that always seeks out new information, or one that always goes with habit. There's so much policy space, because the knobs are not just simple sliders, and there are multiple knobs — even though they're seemingly quite parsimonious and minimal, it's hard to imagine less — yet especially when there's richness in the environment, even simple systems can have enormously complex or adaptive behaviors. I'll just leave it there.
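A minimal sketch of the policy-selection story just described, reusing the toy B-matrix rollouts from the earlier snippet; the likelihood matrix, preferences, and the specific form of the epistemic term (state information gain) are my own illustrative assumptions, not a reference implementation:

```python
# Toy expected-free-energy scoring of policies, mixed with a habit prior.
A = np.array([[0.9, 0.1, 0.1],               # A[o, s]: P(observation o | state s)
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.8]])
log_C = np.log(np.array([0.1, 0.1, 0.8]))    # log preferences over observations

def entropy_posterior(o, qs):
    post = A[o] * qs                          # unnormalized posterior over states
    post = post / (post.sum() + 1e-12)
    return -(post * np.log(post + 1e-12)).sum()

def expected_free_energy(qs):
    qo = A @ qs                               # predicted observation distribution
    pragmatic = qo @ log_C                    # alignment with preferences
    # epistemic value: expected reduction in uncertainty about hidden states
    H_states = -(qs * np.log(qs + 1e-12)).sum()
    epistemic = H_states - sum(qo[o] * entropy_posterior(o, qs) for o in range(3))
    return -(pragmatic + epistemic)           # lower G = better policy

G = np.array([expected_free_energy(trees[pi][-1]) for pi in policies])
habit = np.ones(len(policies)) / len(policies)    # flat habits in this toy
q_pi = np.exp(np.log(habit) - G)                  # habit upweighted by -G
q_pi /= q_pi.sum()                                # posterior over policies
```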
No, I never really thought of that. Okay, two things. First of all, the perspective of thinking of an ant colony or a termite colony as being akin to a mind — I was familiar with that perspective from people like Dan Dennett and so on. But the idea that the individual ants in that colony are, in a sense, the hardware enacting the speculative execution — that's a very interesting idea. It's not speculative for them, but from the mind's perspective, I guess, it's almost like speculative execution — speculative execution that's being actuated in the physical world, which is very interesting. I'd not really thought about that before. But then, the point you're making about the connection to free energy and habit formation: if we're thinking about a model of cognition in which there are these two distinct causality notions, the immutable versus the dynamic one — you gave a very nice account of how habit formation works in these kinds of formalisms, based on prior experience of expected free energy. So I wonder if there's a way of describing that abstractly in terms of something like this: you perform the speculative execution step, where you're treeing out several multi-way possibilities — and initially you know nothing, you have no habits, so you're treeing everything out with equal weighting — but for each possible path you're calculating an actual or an expected free energy, and then in future speculative executions you weight those paths which you previously found to have higher expected value as higher, so you're more likely to explore those, and less likely to explore ones with lower expected value. It feels like something like that should fit very nicely into an algebraic semantics like this, which would be interesting. How about more questions from the chat? Okay, upcycle club writes: "Acknowledging the limitations of traditional entropy in multi-computations motivates us to develop context-specific entropy metrics. Can you share some insights towards such efforts?" Yeah, I can certainly try. So the first point is — there's that famous conversation between John von Neumann and Claude Shannon, where von Neumann famously said that Shannon should call his measure "entropy" because no one knows what it means.
And I submit that one of the reasons no one knows what entropy means is that it's dependent on exactly what we've been talking about: it's dependent on the equivalence function of the observer. So it's one of these things — I don't know, maybe this is a stupid analogy to use, and I'm going to go off on a tangent, but I promise it's relevant. One thing that always breaks my brain is when I try to think about actuaries and life insurance policies, because it's one of those areas where the models only make sense if they're not perfect. If you had an actuary who knew exactly how long everyone was going to live, and somehow that information was openly available, life insurance policies would be pointless. Whereas if you had a model that was completely hopeless at predicting how long people would live, life insurance policies would also be pointless. Somehow the very existence of actuarial science relies on your model neither being perfect nor being awful; it has to exist somewhere in between. And that's one of those topics where, if I think about it for too long, it all just stops making sense. Entropy has very much that same character. Because if you were Laplace's demon — if you had perfect information about the system you were observing — there's no notion of entropy: you know every micro-state, so the Boltzmann formula gives you an entropy value of zero. The notion of entropy only exists once you take that perfect knowledge of a system and coarse-grain it — once you introduce, as I was describing earlier, an encoding function that is not 100 percent injective, so that you are now mapping certain distinct micro-states onto the same coarse-grained macro-states. And then you can ask: okay, how coarse is my coarse-graining? What's the number of micro-states consistent with this macro-state? What's the number of different values in the domain of my encoding function that get mapped to a single point in its codomain? And that's what entropy is. So it's effectively a measure of how coarse my coarse-graining is. If you had perfect knowledge, there's no entropy; if you have no knowledge, there's no entropy. It relies upon you having a not completely trivial, but also not 100 percent injective, encoding function — just like with life insurance policies. The reason I'm saying all this is that, from this perspective, it becomes a little bit clearer why there are all these different notions of entropy, and why — as the questioner was alluding to — entropy as a concept seems to be so domain- and system-specific: every different system, every different observer, will in principle have a different set of encoding functions, a different set of equivalence functions, and each one will give rise to a different calculation of entropy. And so one way that you can think about this program — this program to understand the algebraic interplay between time complexity versus equivalence complexity, or computational irreducibility versus multi-computational irreducibility — is that in some sense it is a program to understand how different definitions of entropy relate to each other in these kinds of systems.
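A toy version of that claim: with an invented, non-injective encoding, the Boltzmann entropy of a macrostate is just the log of the number of microstates mapped onto it, so entropy literally measures the coarseness of the encoding:

```python
# Entropy as a measure of coarse-graining: S = k_B ln W, with k_B = 1.
import math

encode = {0: "A", 1: "A", 2: "A", 3: "B"}           # 4 microstates, 2 macrostates

def boltzmann_entropy(macro, encoding):
    W = sum(1 for m in encoding.values() if m == macro)   # microstates in the fiber
    return math.log(W)

print(boltzmann_entropy("A", encode))               # ln 3: a coarse bucket
print(boltzmann_entropy("B", encode))               # ln 1 = 0: perfect knowledge
# A perfectly injective encoding gives S = 0 everywhere (Laplace's demon); a
# coarser encoding gives larger S. Different observers' encodings therefore
# assign different entropies to the same underlying system.
```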
And so one way you can think about this program, the program to understand the algebraic interplay between time complexity and equivalency complexity, or between computational irreducibility and multicomputational irreducibility, is that in some sense it's a program to understand how different definitions of entropy relate to each other in these kinds of systems. If I take one idealized observer with a given equivalence function, and ask what happens when they communicate with a different observer with a different equivalencing function, they come to different conclusions about what the entropy of the system is. But what is the relationship between their measured entropy values? There clearly is one, and it depends algebraically on the details of the distinction between their respective equivalence functions, but there doesn't seem yet to be any general theory for how those things are related. That's part of the raison d'être of this research program. But, okay, one thing I will comment on, although this is more speculative, a quasi-philosophical comment: one place where the fact that you have all these different notions of entropy becomes interesting is in fundamental physics. When you try to model physics and the universe in these fundamentally computational terms, one fairly generic conclusion you can reach is that gravitation, general relativity, is essentially an entropic phenomenon. People have talked about this in non-computational contexts too, but it's very natural here: if you start to think about spacelike hypersurfaces as being hypergraphs, then in order to obtain a continuum geometry that's compatible with the Einstein equations, you need to be able to make certain ergodicity assumptions on the rewriting, which in turn implies certain lower bounds on the entropy of the system. So gravitation, general relativity, is a coarse-grained theory that you obtain in the limit as the entropy goes to infinity. What's interesting is that quantum mechanics, on the other hand, is an idealization that you obtain in the limit as the entropy goes to zero. In these purely computational models of physics, the quantum mechanical state of a system is described in terms of its multiway structure: when I have a branching program like this, I can divide it up into simultaneity surfaces, and if I associate each state of the program with the analog of a quantum eigenstate, and the path weightings with the amplitudes associated with those eigenstates, I can quickly build up a description of this multiway system in terms of the evolution of some discrete analog of the Schrödinger equation, and it turns out you get a theory that is mathematically isomorphic to standard quantum mechanics out of it. So quantum mechanics is inextricably bound up with the phenomenon of the multiway system. But if you take the entropy to infinity there, the sophistication of your equivalence function becomes arbitrarily large, and you can describe essentially any pair of states as being equivalent. So it turns out that the quantum mechanical case corresponds to the zero-entropy limit, whereas the general relativistic case corresponds to the infinite-entropy limit; but these involve two fundamentally different notions of entropy, one of which lives at the single-way level and one of which lives at the multiway level. Again, the question of how these things interplay is partly why we're investigating this, and it's clear that that question has links to quite foundational questions in fundamental physics.
The ideal point mass, and the ideal distribution with its center of gravity, and all of this. Dave, a question for you, or from you? Yes, okay, can you hear me? I'd like to take it down to the low road for this discussion, maybe the show-business side. When people try to explain what computational reducibility or irreducibility is, often you'll see a graph that says: here's the computation, our target; the fox is running along, and behind it there's a team of algorithms that would like to catch it, and either it outruns all of them or it doesn't, but some can seem pretty close to catching it. Now, there's something else people have been looking at for a few years: the Mandelbrot set. The Mandelbrot set is not only a deterministic computation, it's a crisp computation: every point either is inside the safe zone, where it's going to sit there, happy and quiet, and get colored black, or it flies off to infinity. So it could just be a bunch of yes-or-no answers. But that's not the way people trying to calm down after a day of work want to look at it. They want to say: hey, I want to see the 32-million-deep Mandelbrot set, and I want to see it in colors. And when you get the colors, you're asking: how hard did I have to work, how long did I have to grind along, before that point made its decision to settle down and be quiet and uninteresting, or to fly off to infinity? So you can put colors in. Now I just wonder: would it be interesting to anyone to fuzzify these gorgeous causal and multiway causal graphs in the same way, and just show: this is the portion where we had to work really hard; for rule 35, or whatever, we had to go 15,000 generations before we gave up and said we're not going to follow this any more; and this other one, oh, after 30 steps, I see. That's a really interesting question, and a couple of comments on it. One is that you're right in the sense that the Mandelbrot set is a very clean thing to describe, but I would argue it's actually not a crisp computation in that sense, precisely for reasons of computational irreducibility: as you go arbitrarily close to the boundary of the set, you can have complex numbers whose orbits have indefinitely long transients. There's no upper bound on how long the z → z² + c orbit can run before it either diverges or settles down to remain bounded, and the statement that there are points that can stay tied up in these orbits indefinitely is really a computational irreducibility statement. But yes, your question: that way of coloring points near the boundary of the Mandelbrot set, the so-called escape-time algorithm, where you color them by how many steps were needed before the orbit either settled down or escaped, before the complex number's modulus exceeded some value, suggests an interesting idea: that you could try to construct geometrical representations of the space of possible computations based on a kind of escape-time algorithm for computational irreducibility.
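For reference, the escape-time algorithm Dave and Jonathan are referring to is only a few lines of code; here is a minimal Python sketch. The iteration budget max_iter is the crucial, and arbitrary, choice: points that exhaust it are exactly the ones where we "gave up", the computational-irreducibility analog of an undecided halting question.

```python
def escape_time(c, max_iter=1000, radius=2.0):
    """Iterate z -> z^2 + c; return the number of steps until |z| exceeds the
    escape radius, or None if the budget runs out (we 'give up' on deciding)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > radius:
            return n
    return None

# A crude ASCII rendering: digits show (bucketed) escape times, '#' marks the
# points we could not decide within the iteration budget.
for im in range(-10, 11):
    row = ""
    for re in range(-20, 11):
        t = escape_time(complex(re / 10, im / 10), max_iter=200)
        row += "#" if t is None else str(min(t // 3, 9))
    print(row)
```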
And there are possible ways you could do that. We've known since the days of Turing that the halting problem for Turing machines is undecidable: there's no finite computation you can do that will determine whether an arbitrary computer program terminates in finite time. But once you have a way of geometrizing the space of possible computations, which we have now, as you say, you could easily construct a kind of escape-time algorithm, where you have your analog of the Mandelbrot set: here are all the halting computations, here are all the definitely non-halting computations, and then there's some boundary of very fuzzy stuff where we don't really know, where we have to do a lot of work, and even then it's only heuristic. That's very directly analogous. And then the question becomes the following. In the case of the Mandelbrot set, there is some underlying theory, due mostly to Douady and Hubbard and others, that allows you to make predictions: if you have a particular filament of the Mandelbrot set, you can predict where that filament is going to be, based on a complex-analysis argument. So once you have a geometrization of the space of possible computations, can you construct some general theory, like the theory developed by Douady and Hubbard, that allows you to prove things about the topology of the space of computations: whether certain regions of the space are connected, whether they're compact, whether you can make predictions based on the geometry of one part of the set and extrapolate to the geometry of another part? That's a really interesting question, and as with so many of these things, I don't know the answer, but it's a good reason for investigating, from our side as well as from the active inference side. There's both the information topology and the information geometry side, and not just in the sense of topological deep learning, but rather looking at the topology of information flows, which has been heavily developed in the category theory related to the quantum information sciences, and then the information geometries that allow us to do machine-learning-type accelerated optimization and gradients, all those concepts that come into play with geometry and that we just don't get from topology. Topology sketches the skeleton, and then, with the computers, datasets, and kinds of computation we have today, information geometry gives the quantitative, numerical side versus the formal one. One question: are these all discrete-state-space, discrete-time formalisms? Because in active inference we often deal with hybrid models that have discrete and continuous state spaces, and the same generative model, the same system of interest, could be modeled with a discrete-time (chapter 7) or a continuous-time (chapter 8) model. How does this formalism deal with that? That's a really good question. Most of what I've looked at so far has involved discrete-time, discrete-space models, for no particularly principled reason other than that they're easier to analyze: you can do explicit computations, and they're more amenable to constructive analysis. But one of the beauties of using a general mathematical formalism is that, once you develop it, it's often quite easy to extend it to cases you didn't explicitly analyze. So in principle this formalism works for continuum-space, continuum-time systems as well, with some slight modifications: rather than having branching and merging, if you think of the system as a dynamical system defined on some symplectic manifold, then the branching and merging operations of the multiway system effectively become divergence and convergence differential operators defined on that manifold. One place where we can start to analyze this explicitly, and where I've done a little work that I want to return to soon, is Petri nets. Petri nets are interesting because they are a discrete-time, discrete-space system, but they admit a continuum-space, continuum-time description in terms of ordinary differential equations and so on. So they're a nice example of a hybrid discrete-event versus continuum-event system where it's clear that this formalism can be used, and where the formalism is somehow agnostic as to whether the underlying system is discrete or continuous.
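As a concrete toy, here is a Python sketch of the two descriptions of one of the simplest possible Petri nets, a single transition A + B → C: a discrete-event run that fires the transition token by token, and an Euler integration of its mass-action ODE limit. The rate constant and step counts are arbitrary choices for the example.

```python
# A one-transition Petri net, A + B -> C, viewed two ways: as a discrete-event
# system (fire while tokens allow) and as its mean-field ODE limit.

def discrete_run(a, b, c, steps):
    """Fire the transition once per step while both input places hold tokens."""
    history = [(a, b, c)]
    for _ in range(steps):
        if a > 0 and b > 0:
            a, b, c = a - 1, b - 1, c + 1
        history.append((a, b, c))
    return history

def ode_run(a, b, c, steps, k=0.01, dt=1.0):
    """Euler integration of the mass-action limit: da/dt = -k*a*b, etc."""
    history = [(a, b, c)]
    for _ in range(steps):
        rate = k * a * b
        a, b, c = a - rate * dt, b - rate * dt, c + rate * dt
        history.append((a, b, c))
    return history

print(discrete_run(10, 8, 0, 12)[-1])     # tokens: (2, 0, 8)
print(ode_run(10.0, 8.0, 0.0, 2000)[-1])  # concentrations -> approximately (2, 0, 8)
```

Both descriptions flow toward the same fixed point; the discrete run gets there in integer jumps, the ODE smoothly, which is the sense in which the underlying formalism can stay agnostic about discreteness.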
Again, there's a broader philosophical point to make here, which is one of the reasons I don't feel embarrassed to be working primarily with discrete systems. Once you start to think about things in terms of caring not just about nature, but also about the computations the observer can perform and what it's able to infer, you quickly realize that, just as with what you were saying earlier, Daniel, about beauty being in the eye of the beholder, discreteness and continuity are also in the eye of the beholder. If you have a universe that is fundamentally continuous, described by a continuum Lorentzian manifold or something, but you are constrained so that the only experiments you can perform have computable outcomes, discrete outcomes, where the number of possible observables is always countable, then in a sense it's irrelevant to you as an observer whether the system is discrete or continuous, because the only parts of it that you can interface and interact with are discrete. You could have replaced the underlying substrate with a purely discrete mathematical structure and you wouldn't be able to tell. So I don't feel too embarrassed dealing with discrete-event systems: even if I don't necessarily believe that nature is discrete (I'm not even sure how we would be able to answer that), I'm reasonably convinced that the experiments we can perform and the observations we're able to make are ultimately computable, and therefore the underlying substrate might as well be discrete, even if it's not so "in reality", so to speak. Yeah, that's a great comment, and it definitely calls back to your earlier points about discretization being in the eye of the beholder. In active inference models, the observations, the raw data, may already be discretized, depending on the situation; but even if they weren't, even if we had continuous sensory perception, or modeled it as such analytically, models still commonly discretize and categorize as they move up cognitive hierarchies. That was initially explored to get more of the discrete, either-or properties: decision-making, planning, all those kinds of things. Well, there are many interesting angles, and I'm sure this could also be a multiplexed language-model prompt, but: what are you working on, or excited about, for 2024?
That's a good question. I've already given some hints about this general research program of trying to understand computational and algorithmic complexity, and the interplay between observers and systems, through this category-theoretic lens. That's a major thing which, in this more recent form, I started maybe a couple of years ago; in some form I've been working on it for much longer. But for various reasons, over the last year or so, I've put that on hold and focused on much more physics-oriented questions about discrete spacetime, understanding things like how black holes work and how accretion works in discrete spacetime, which is also very important and very exciting. But I've been missing these more abstract directions, so I have maybe one or two major physics-related things to finish off, and then I really want to go back to this, to the greatest extent possible. One thing that's quite clear is that there's great interplay between this formalism and existing theories of computational and algorithmic complexity. One very basic example: I mentioned before that you have this coherence between two different algebraic structures, the operation of timelike composition versus the operation of parallel composition. These two algebraic structures are in general related, although the precise conditions that relate them are not clear; that's partly what we're trying to understand. But it turns out that degenerate cases of that question correspond to unsolved problems in computational complexity theory. For instance, the P versus NP problem can essentially be recast in these terms: is the coherence between the timelike composition of computational complexity and the parallel composition of computational complexity, which are what P and NP respectively are really about, the strictest it can be, which would be the case if P equals NP, or is it somehow more lax, which would be the case if P does not equal NP? So that's one thing we've already investigated. But it's clear that a whole bunch of other questions, about how time complexity and space complexity trade off, or how Kolmogorov complexity and time complexity trade off, can be recast in these more algebraic terms, and will hopefully give insight into this general program of trying to understand observers and their relationship to the world. Those are, with any luck, major theorems that I hope we'll be able to prove at some point in 2024. That's awesome, and it makes me think of the parallel direction as more nestmates, more CPU threads, and the deeper-in-time direction as more sequential, more planning, more of a single, monolithic cognitive agent. And then the question becomes: can everything that a single mega-agent could do in deep sequential time be decomposed, at a space advantage, or even at a space disadvantage, into parallel single-time-step operations? Yeah, that's a super-important question, and one that, with the possible exception of this community, not many people have asked. Something like it comes up in quantum computing: a lot of the hype around quantum computation comes from theoretical speed-ups that derive from the fact that you're able to support superpositions of different states, where each state of your data structure corresponds to a different eigenstate, and you're able to evolve some superposition
of the eigenstates. But then, at the end, you have to actually come to a definite conclusion about what the answer is; you have to perform some measurement operation, and that measurement operation is lossy, it's often non-deterministic, and you often have to repeat it multiple times. It's becoming increasingly clear that, for a large class of operations previously thought to have quantum advantage, the additional complexity of the measurement step really kills any quantum advantage you may have had: you get some advantage by doing unitary evolution, but then you lose all of it by having to do the measurement projection at the end. And that's really a story of this same interplay: the time-complexity saving of doing a multicomputation in parallel, versus the loss that comes from the complexity of the equivalence function you need to apply in order to reach some definite conclusion about what happened, because ultimately you need to collapse that directed graph into a single thread of time in order to have a coherent representation of what happened. So that's a place where understanding these trade-offs becomes very important, and in some limiting case that gives some perspective on your question, Daniel, which I agree is very interesting. In principle, we know that anything a deterministic Turing machine can do, a non-deterministic Turing machine can do, and vice versa, with some speed-up or slow-down. But that statement, which is a classic result in computability theory, neglects all consideration of the equivalence function. There may be cases where the equivalence function is so complex that deciding state equivalence becomes undecidable, and in that case you have a scenario where you've got a multicomputational system, but collapsing it to one that's equivalent to a single-way system requires unbounded amounts of computational effort, so the two actually become inequivalent, even though computability theory says they should be the same. So it's clear that there's a richer, more subtle theory underlying this that we're just beginning to glimpse, and that I hope we'll be able to prove some new limitative results about soon, to understand it a bit better.
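As a cartoon of that last point, here is a Python sketch of a multiway string-rewriting system explored breadth-first, where a canonicalization function stands in for the observer's equivalence function. The rules and the canonical form are invented for the example; the point is just that the same evolution looks like many branches or like one state depending entirely on the equivalence function, and nothing guarantees that function is cheap, or even decidable, in general.

```python
from collections import deque

def multiway_frontier(initial, rules, depth, canonicalize=lambda s: s):
    """Breadth-first multiway evolution of a string-rewriting system.

    `canonicalize` plays the role of the observer's equivalence function:
    states with the same canonical form are merged into one. With the
    identity function, every distinct branch is kept separately.
    """
    seen = {canonicalize(initial)}
    frontier = deque([(initial, 0)])
    while frontier:
        state, d = frontier.popleft()
        if d == depth:
            continue
        for lhs, rhs in rules:
            for i in range(len(state) - len(lhs) + 1):  # every match site branches
                if state[i:i + len(lhs)] == lhs:
                    nxt = state[:i] + rhs + state[i + len(lhs):]
                    key = canonicalize(nxt)
                    if key not in seen:
                        seen.add(key)
                        frontier.append((nxt, d + 1))
    return seen

rules = [("AB", "BA"), ("BA", "AB")]  # commuting rewrites: pure branching and merging

raw = multiway_frontier("AABB", rules, depth=6)
merged = multiway_frontier("AABB", rules, depth=6,
                           canonicalize=lambda s: "".join(sorted(s)))
print(len(raw), len(merged))  # 6 distinct branches vs. a single equivalence class
```

Here the canonical form is trivial to compute; Jonathan's observation is precisely that in general it need not be.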
Awesome. And a special shout-out to all of our colleagues on the Wolfram and the active inference sides, because we've seen few, if any, active inference models phrased analytically or computationally with the Wolfram technology. From studying complexity in other areas, it's really clear how productive and powerful the software and the tools can be, and they're changing and growing every day. So maybe someone listening this far in can go from one side to the other and back, or make a Wolfram active inference model, or do some other kind of combination, because it's very fruitful territory, and we know that our elders have already spoken: they've okayed it. But really, it's so rich with connections between the areas we're all studying, and it feels like we're converging on many common places to scaffold and jump off from together. Yeah, I agree. And at least the kinds of things we were discussing here, about speculative execution and behavior through the free energy principle and so on, should be relatively easy to implement in the framework that's already been developed here. It's a question of implementing some kind of computation of expected free energy and using that to weight multiway paths in the speculative execution model. For the beginnings of that implementation, at least, I think the path is pretty clear, and we'll probably end up doing it at some point in the future anyway, as part of other research. So I agree, it's a very exciting point of interface.
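In that spirit, here is one last minimal Python sketch of what the first step of that implementation might look like: re-weighting a set of multiway paths by a user-supplied expected-free-energy score, with the previous episode's weighting fed back in as the habit prior. Everything here, the path names, the stub EFE values, and the single gamma parameter, is a hypothetical placeholder rather than anything from an existing framework.

```python
import math

def weight_paths(paths, efe, gamma=1.0, prior=None):
    """Re-weight multiway paths by expected free energy.

    `paths` is a list of path identifiers, `efe` maps a path to its expected
    free energy (a stub the modeller supplies), and `prior` is the weighting
    carried over from previous speculative executions, i.e. the 'habit'.
    Low-EFE paths are up-weighted: w(p) ~ prior(p) * exp(-gamma * efe(p)).
    """
    prior = prior or {p: 1.0 for p in paths}
    w = {p: prior[p] * math.exp(-gamma * efe(p)) for p in paths}
    total = sum(w.values())
    return {p: v / total for p, v in w.items()}

# Three hypothetical branches of a speculative execution, scored by a stub EFE.
paths = ["stay", "explore-left", "explore-right"]
efe = {"stay": 2.0, "explore-left": 0.5, "explore-right": 1.0}.get

w1 = weight_paths(paths, efe)            # first episode: flat habit
w2 = weight_paths(paths, efe, prior=w1)  # habit formation: reuse last weighting
print(w1)
print(w2)
```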
Yeah, well, I hope that we can stay in touch, if you ever want to come back for a 9.2, or even to facilitate some kind of working group, or connections that strengthen and include the participation of more people in this super-exciting area. That would be amazing. Yeah, that sounds fun; let's try and set something up. Well, Dave, first the penultimate comments, and then, Jonathan, you can have the last word. Yes. I hope you do get to continue, on the Wolfram side, to think about a more general notion of what these ultimate things are. Are they observers, or does that already prejudice the case about what you might find if you instead call them workers, or actors? Or go back and ask Stuart Kauffman: what must a mind do to earn its way in the world? Let's keep this happening. Thank you, and thank you, Dave, for suggesting it; it was a great suggestion. Jonathan? No, I think that's a fantastic note to end on. Big picture for a moment: the formalism I've described in this discussion is being developed assuming a kind of purely passive observer idealization, and that's already been incredibly difficult; there's a lot we don't understand even at that level. But of course David is right that ultimately we want to start transitioning to a participatory observer model, where you allow for two-way interactions, or higher-order interactions, between observers and systems, things that don't just go in one direction. In a sense, I view a lot of what we're trying to do, trying to nail down these notions of causality, trying to understand these interplays between different complexity and entropy measures, as laying the foundations for that subsequent theory. It's clear that if we want a version of this compositional multiway formalism that is also compatible with things like second-order cybernetics, then at the very least we need a very coherent notion of what causality is, and a robust algebraic description of it that's not going to break or change. So I think the non-participatory observer model is a useful starting point, because it's one that's just within our grasp of being mathematically tractable, and the hope is that the technology, the ideas, and the conceptual structure we develop for understanding it will lay the groundwork for something more like what real, active, participatory observers do. On the scientific side, I'm not sure I have any final comments, apart from the obvious: this is still a story that's being developed, and, as Daniel alluded to, I hope we can continue to interact and collaborate where that makes sense. At the very least, the incorporation of these active inference models within, in the first instance, discrete-time computational frameworks, allowing things like speculative execution and multiway path weighting based on free energy estimates, is a project of obvious mutual interest, and something I hope will happen in the coming months. Thank you. We just speculatively executed "active Wolf inference", basically. All right, thank you, Jonathan; thank you, Dave; thank you, everyone. See y'all next time.