Hello and welcome. It's July 17th, 2023, and we're here in Active Inference MathStream #5.1 with Cristian Bodnar on topological deep learning: graphs, complexes, and sheaves. So thank you for joining, Chris; looking forward to your presentation and discussion. Yeah, thanks for having me. So, as I was just saying, this is my PhD thesis, which I finished a couple of months ago. It's also publicly available online, so if you want to go into the details, just look it up on the internet and you should find it easily. Obviously there's a lot of material in there, so I'll try to give an overview today, and maybe go into a bit more detail on certain aspects, since there isn't time to go through everything. I've since moved on to a different research position, so this is work I did in the past when I was at the University of Cambridge. All right, let's get started. Let's start very easy. I'm actually not sure what the background of the people watching is, but in machine learning there's a subfield that emerged a few years ago called geometric deep learning, which essentially looks at how to apply deep learning neural network architectures to data living on all sorts of structures or geometries or spaces, if you want. This has a lot of applications, especially in the life sciences, and there have been many instances of it in very famous publications, like the ones you see here. Just to give some examples: if you have proteins or molecules, you usually represent them as graphs, and you have some data living on these graphs, like the properties of certain atoms and so on.
And so far, these kinds of spaces, these learning problems if you want, have mostly been approached with a geometrical mindset; that's where the subfield gets its name. But something I would argue is that geometry is not everything you need: there are other, non-geometrical aspects in such a setting. This becomes quite obvious once you realize that the spaces that show up in the field, and in many applications, are very heterogeneous. As I mentioned, you could have graphs, which could represent anything; in this case it's the caffeine molecule that you see here on the left, and you want models that predict certain properties of this molecule. But you can also have grids, and we see data living on grids all the time: I'm referring to images and videos, which are all pixels living on a grid. Then you can have more sophisticated things. You could have meshes, which are all over computer graphics. And you could have manifolds: if you're doing weather modeling, say, we live on a sphere, topologically speaking, so you might want to model your data as living on a sphere. But even if these spaces are geometrically heterogeneous, and some of them don't even have a geometrical structure in a strict mathematical sense, they all have what's called a topological structure, which is a weaker but more general kind of structure, and I'm going to talk in a few seconds about what that means. In general, in mathematical physics you have a ladder of structures, where you keep building on top, and the more structure you have, the more sophisticated things you can do.
At the base of this diagram you just have sets: a collection of elements with no extra structure. Then you keep going up the ladder and add stuff on top of sets. As I was saying, most of the work has focused on the top levels of this hierarchy, but the topological level, which is the weakest level you can add on top of sets, has been neglected to a large extent. Part of what I've been doing in my PhD thesis was essentially looking at these learning problems on these spaces from a more topological perspective and trying to fill in those blanks. So that's the overview. Now, if we are to adopt this topological perspective, what would that actually mean, and what would it look like? It could look different ways, but in my thesis it looks like this. Horizontally we have a space, which could be anything, just an abstract space: a grid, or any of the things we've seen in the previous examples. And vertically, attached to the regions of the space, we have data. Data is the vertical component, and you see these flags anchored in these regions; that signifies there is some data associated with that region. That's the high-level perspective, and I'm going to make it more concrete a bit later. There are two essential things about this picture. The first is locality: the data is attached to some region of this topological space, and in that sense it's local. To give a concrete example, think of a temperature sensor somewhere in space.
You could think of whatever that sensor is measuring as a property of the immediate surroundings of that sensor; it's describing some property of a region in space. So that's a good example. Another axiom we're starting with is that the space has structure: it's made up of various regions, and these regions intersect in various ways. That implicitly makes our data structured as well, because the data is attached to these regions. All right, so that's the picture, and actually many of these things relate to category theory. I'm not going to go in depth into this, because it's sophisticated and I'm not an expert in category theory myself. But the high-level idea is that category theory is a nice way to translate between different structures in mathematics: you can discuss properties of one kind of object, translate them to a different kind of object, and find all these relations and connections. A concrete example: if you want to study manifolds, surfaces, you can associate groups to these surfaces, and then relations between the surfaces translate into relations between the groups, or other algebraic structures. So you can study these manifolds by doing algebra instead of geometry or topology. In our case, the same idea manifests in the translation from spaces to data: because we associate certain kinds of data to the regions of a space, you can think of this as a mapping from spaces, and regions in those spaces, to data attached to them.
But yeah, I'm not going to go into a lot of detail on this; just keep in the back of your mind that there's category theory lurking in the background. All right. I promise this is the only mathematical definition I'm giving in this talk, but since I keep mentioning topological spaces, I wanted to give the axiomatic definition, which might sound sophisticated, but there's a picture at the end and hopefully it'll be clear. So what is it? It's just a set; as I was saying, we start with sets and put stuff on top. You start with a set, and then you also have a collection of subsets of that set, called the open sets, which need to satisfy certain axioms. You can think of these open sets as the regions of the space, speaking very informally. One thing that has to be satisfied is that the empty set and the set itself are open sets; in some sense this says, very informally, that the whole set is a region of the space, which is, let's say, obvious. Then there are constraints about intersections and unions of these regions: if we take the intersection of two regions we should get another region, and if we take a union of regions we should get another region. There are some further constraints, like how big these intersections can be: only finite intersections are required to be open, but you can have infinite unions. That's a technicality we can skip. Anyway, to see a picture: on the left you see the set X itself, and here I've put a potential neighborhood structure, an open-set structure, on this space. We have an open set U here and another open set V. By the intersection axiom, their intersection is also an open set, so you see this intersection in the middle being another open set.
And the set itself is another open set. So it's just splitting the space into regions; you can think of it like that. All right, so this is a topological space. Now let's add data, right? We have a space, and we add data on top. So far we've seen what a topological space looks like; now let's add the vertical stuff, those flags you saw before. We just put some data on top of these regions. And if we put data on all the regions of the space, on all the open sets, we get structures that in category theory, or in algebraic topology, are called presheaves, which sounds very fancy, but it's just a definition of what I was already describing. Essentially you have some data for each region: for instance, for region U you have F(U), which is the data attached to region U. You can think of F(U) as a set describing the data that lives in that region. But there's also an extra thing: you have maps going between these pieces of data, called restriction maps. Why is that? They provide a way to zoom in, if you want. You have the data attached to the whole set X, and then you ask: how do I go from this data to data on a smaller region of X? There's a way to zoom in on that data, essentially, and I'm going to show an example in a second. So these are called presheaves. To see an example: our space here is one of the simplest spaces you could think of, a one-dimensional horizontal line, just the real line. The regions are given by open intervals on the real line, and the pieces of data could be functions, say continuous functions, on those regions.
So here's some sample data on this first interval, here's another function on the second interval, and here's some data on this third interval. And in this case it happens that all these functions agree on the overlaps: where these regions overlap, they take the same values, and we can actually glue them together into a single function over the entire region. So this is an example of a presheaf; it's called the presheaf of continuous functions. Our data in this case is continuous functions, the space is the real line, and we put these functions on top of the real line. It turns out this presheaf has a special property: we can glue compatible pieces of data and uniquely get another piece of data. We can take these three pieces, exactly like a puzzle, put them together, and get a fourth thing, a single function that overlaps these functions. Presheaves that satisfy this property, where you can glue compatible data to get a unique piece of data, are called sheaves, and the presheaf of continuous functions is actually a sheaf. So this is a way to formalize data attached to these spaces; it's going to get less technical in a second. To give more examples: in this way we could describe data on a sphere, say a vector field on a sphere. If this is Earth, this could be a wind vector field: if we do weather modeling, we have a vector field describing the wind on the surface of the Earth, right?
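The gluing condition just described can be sketched in code. This is a minimal illustration, not anything from the thesis: I represent a "section" over a discretised interval as a dict from sample points to values, check pairwise agreement on overlaps, and glue. All names here are made up for the example.

```python
# Sketch of the sheaf gluing condition for functions on overlapping
# intervals of a discretised real line. Purely illustrative.

def agree_on_overlap(f, g):
    """Two sections (dicts: point -> value) agree wherever both are defined."""
    overlap = f.keys() & g.keys()
    return all(f[x] == g[x] for x in overlap)

def glue(sections):
    """Glue pairwise-compatible sections into one section on the union."""
    for f in sections:
        for g in sections:
            if not agree_on_overlap(f, g):
                raise ValueError("sections disagree on an overlap: cannot glue")
    glued = {}
    for f in sections:
        glued.update(f)          # compatibility makes the result unique
    return glued

# Three functions on overlapping pieces of [0, 4], agreeing on overlaps.
f1 = {0: 0.0, 1: 1.0, 2: 4.0}    # section on [0, 2]
f2 = {2: 4.0, 3: 9.0}            # section on [2, 3], agrees at x = 2
f3 = {3: 9.0, 4: 16.0}           # section on [3, 4], agrees at x = 3

glued = glue([f1, f2, f3])       # a single section over all of [0, 4]
```

If the sections disagreed at a shared point, `glue` would raise instead of returning; that failure is exactly what distinguishes an arbitrary presheaf from a sheaf, where compatible local data always glues uniquely.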
And you might want to do some machine learning on top of this, where the vector field has a sheaf structure. You can think of it as a sheaf because if you have a vector field on the red region and a vector field on the yellow region, and they agree on the overlap, you can glue them together uniquely and get a vector field on the bigger region. But something quite nice is that even if we have a very different kind of space, namely a graph, which is very different from a sphere from all points of view, we can still apply the exact same axioms, terminology, and definitions, and we can have a sheaf over a graph. In this way we could have, for instance, some features associated with the nodes of the graph and some features associated with the edges, and this is exactly the setting we have in graph machine learning. So this is what this topological perspective allows us to do: we have a unified way of thinking about very heterogeneous spaces, and on all of them we can model the attached data using this sheaf terminology (and in other ways as well, but I'm not going into that in this talk). All right, so this is an overview of what I've been doing in my thesis. To dive a bit deeper, I want to go into one paper that we published at NeurIPS last year, on what's called sheaf diffusion: essentially, how can we use what I've just described to do something useful when doing machine learning on graphs. This was a collaboration with Francesco Di Giovanni, Ben Chamberlain, Pietro Liò, my advisor, and Michael Bronstein. Okay, before I dive into this, I just want to give some background in case people are not familiar with it.
The favorite architecture of people doing machine learning on graphs these days are graph neural networks, which are actually very simple models. In this setting, each node in your graph has some features; this is the vector h here, associated with node a at layer (or time) t. And for each node, you want to compute a new representation, new features, at the next layer: essentially you're learning representations. What graph neural networks do is that a node receives a message from all the other nodes that are its neighbors. Computing these messages can involve neural networks internally, but essentially it's some processing of the features of the neighbors, and these are aggregated into one message, shown here in green. This is then passed through some update function that combines the message from the neighbors with the old representation of the node, and that gives you the new representation at the next layer. This happens for all the nodes; that's one layer. Then you keep repeating this for as many layers as you like. That's how you do deep learning on graphs; it's a very simple recipe. Most models actually vary only in the way they compute these messages and in the way the update function is designed; otherwise they all respect this framework and work in this particular way. To give you an example of why you would want to do this: you might want to do node classification, which is a classic problem in graph machine learning.
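The message-passing recipe just described can be sketched in a few lines. This is a generic illustration of the framework, not the specific model from the paper: the choice of sum aggregation and the particular `msg`/`update` functions below are mine, and in a real GNN those would be learned neural networks.

```python
# One generic message-passing layer: each node aggregates messages from
# its neighbours, then updates its own feature vector.
# Sum aggregation and the toy msg/update rules are illustrative choices.

def mp_layer(adj, h, msg, update):
    """adj: {node: [neighbours]}; h: {node: feature vector (list of floats)}."""
    new_h = {}
    for v in adj:
        # Aggregate messages from all neighbours of v (elementwise sum).
        agg = [0.0] * len(h[v])
        for u in adj[v]:
            m = msg(h[u], h[v])
            agg = [a + b for a, b in zip(agg, m)]
        # Combine the aggregated message with v's old representation.
        new_h[v] = update(h[v], agg)
    return new_h

# Toy instantiation on a 3-node star graph: the message is just the
# neighbour's feature, and the update averages old feature and message.
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0]}
msg = lambda hu, hv: hu
update = lambda hv, m: [(x + y) / 2 for x, y in zip(hv, m)]

h1 = mp_layer(adj, h, msg, update)   # features after one layer
```

Stacking the layer (feeding `h1` back in) repeats the process for as many layers as you like, exactly as in the recipe above.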
There are other problems, but I'm just going to talk about this one because it's easier. You have a graph whose nodes have different labels; here there are just two, orange and blue. You have edges between these nodes, and you want to do the message passing I was describing to compute representations from which you can easily classify the blue and orange nodes. Now, something quite interesting is that for many graph neural networks, depending on the properties of the graph and how the differently labeled nodes are connected, performance can vary quite a lot. In particular, they're affected by a property called heterophily, which is a measure of how much opposites attract, if you want. It has a very simple formula: you take the number of edges between orange and blue nodes and divide by the total number of edges. Basically, you check how many connections in the graph are between nodes that are opposite to each other, versus connections between similar nodes. If you have a lot of these heterogeneous connections, you have very high heterophily. And it turns out that many graph neural networks struggle in that setting; it's very hard to classify things there. Intuitively you can see why: it's easy to apply the reasoning "this node looks a lot like the nodes it's connected to, so they must be in the same community, have the same label." But it's much harder to do that when connected things differ from each other and the community structure dissolves. Even visually, if you see a graph with nicely clustered communities, it's easy to draw a line between them and say this is one community and that is another.
But if things are very mixed, it's quite challenging, and it turns out it's also challenging for these models, not just for our intuition. So this is a problem where the topological perspective I was mentioning can be used to do something useful. Okay, so coming back to sheaves on graphs. At this point you can largely forget what I said in the introduction; if something was unclear there, we start from zero here, so no problem. On the left you have a graph, or rather the incidence structure of a graph. I've drawn here the simplest possible graph: two nodes, v and u, with an edge between them. So this is just a graph with one edge; that's all that's going on. I've represented it by its incidence structure: node v is incident to edge e, and node u is incident to edge e. That's what this triangle-like symbol is showing; it's just a way to denote the incidence relation. Okay, so this is just a graph. The way we can think of sheaves on graphs is as mapping this graph structure, via the category-theoretic translation I mentioned, into something else that looks very similar: the structure is the same, only the meaning of the pieces has changed. For node v we have F(v), which is a vector space; for node u we have F(u), another vector space; and for the edge e we have F(e), another vector space. So all nodes have their own vector spaces, and the features associated with those nodes live in those vector spaces. Basically, for each node we have a vector space of features.
That's all that's going on so far. These arrows, these incidence relations, also translate into something, and they translate into the obvious thing: linear maps. If these are vector spaces, then the arrows should become linear maps, essentially just matrices. So for each arrow you see here, we have a matrix. And something I'll argue and show in a few slides is that message passing on graphs is very similar to group actions in group theory. Let me explain exactly what I mean. We can think of these arrows from the incidence relation as buttons we can press. What do I mean by that? If we have node v on the left and edge e, and some feature living in F(v), we can press this arrow button: we multiply the corresponding matrix by this feature, and we get an edge feature. So if you go along this arrow, the matrix multiplies the vertex feature and gives you an edge feature. You can think of these arrows as giving you actions you can play with to move features from vertex to edge and from edge to vertex. In this case it's a left action: I take this arrow and act on a feature of node v, this h_v that lives here. How I do that is I take the matrix associated with this arrow and multiply the vector h_v. Just matrix times vector, that's all, and we get an edge feature. So this is a way to move from here to here. This already looks a bit like message passing: we're passing a message from this vertex to this edge. But now we also need to pass a message from this edge to the other vertex, u.
So we need to get from v to u, and we do that by passing through e. We can go in reverse: we can have a right action where, instead of applying this matrix, we apply its adjoint, that is, its transpose. If we want to go from here to here, instead of applying the matrix we apply its transpose, because we want to go the other way around. If we compose these two things, we can move features from v to u. So this is how we can apply these actions to do message passing, and they're called sheaf actions, or presheaf actions. Now, what's the relation between this and what we have in group theory? One way to represent a group is with a sort of graph, like here on the left. You have a star object, which is just a dummy thing; all the group structure is in the arrows. Each arrow corresponds to a group element. Let's say we have a group of rotations, to have a concrete example: this arrow g could correspond to a 90-degree rotation, and another arrow does the opposite, a minus-90-degree rotation, the inverse of that transformation. So this is the structure of the group. And we can do the same kind of translation we've just seen: we define a presheaf on this group, mapping the star to a vector space. The star replaces the vertices we had before; now we have a single vertex and just these arrows. So the star is now mapped to this vector space that you see here in blue.
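The vertex-to-edge-to-vertex transport just described (apply the restriction map to go up, its transpose to come down) can be sketched concretely. The specific 2x2 matrices below are invented for illustration; they stand in for the restriction maps F_{v⊴e} and F_{u⊴e} of the one-edge graph above.

```python
# Moving a feature from vertex v to vertex u through edge e, using a
# restriction map (left action) and a transpose (right action).
# The concrete matrices are made up for this example.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

F_v_e = [[1.0, 0.0], [0.0, 2.0]]   # restriction map F_{v <= e}
F_u_e = [[0.0, 1.0], [1.0, 0.0]]   # restriction map F_{u <= e}

h_v = [3.0, 4.0]                   # feature of vertex v, living in F(v)

# Left action: "press the arrow" from v to e, landing in the edge space.
h_e = matvec(F_v_e, h_v)

# Right action: come back down to u with the transpose of F_{u <= e}.
h_u = matvec(transpose(F_u_e), h_e)
```

Composing the two steps gives the vertex-to-vertex map transpose(F_{u⊴e}) · F_{v⊴e}, which is exactly the "message" u receives from v in sheaf message passing; an ordinary GNN is the special case where all restriction maps are the identity.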
And now we do group actions, which are a very well-established concept in group theory. For instance, say you want to act on this vector v here, a vector in this vector space, by the group transformation g. Essentially you press this arrow and apply the action. Because of this translation, g has been mapped to a matrix, the corresponding rotation matrix, and you apply this rotation matrix to v and get a 90-degree rotation; this vertical vector is showing the rotated vector. So this is completely analogous to what we saw on the previous slide. This is how sheaves connect these kinds of actions: what you can think of as message passing is the same as group actions in group theory, except you replace the group with a graph. It's just a different translation, where we change the object on the left; here it's a graph, there it's a group, but the rest stays exactly the same. This gives us a way to formalize, through this topological perspective, a connection between message passing on graphs and all the symmetries that have been explored so much in machine learning, and to see one way in which they are related. Okay, so now you might say: this was all very sophisticated and nice, but is it actually going anywhere? I'm just going to show you a fairly straightforward example; there's more, but time is limited. As I was saying in the beginning, many graph neural networks struggle on these heterophilic graphs.
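The group-action side of the analogy is just as concrete: the group element g becomes a rotation matrix, and "pressing the g arrow" means multiplying by it. A minimal sketch of the 90-degree rotation example:

```python
# The group-action analogue of pressing an arrow: the group element g is
# represented by a rotation matrix acting on vectors in the stalk.
import math

def rot(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

g = rot(math.pi / 2)            # "pressing the g arrow": rotate by 90 degrees
g_inv = rot(-math.pi / 2)       # the inverse arrow: rotate back

v = [1.0, 0.0]
v_rot = matvec(g, v)            # close to [0.0, 1.0]: the rotated vector
v_back = matvec(g_inv, v_rot)   # acting with g then g^{-1} recovers v
```

Structurally this is the same matrix-times-vector move as the sheaf actions on the previous slide; only the object supplying the arrows has changed from a graph to a group.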
What we showed is that no matter how heterophilic or weird your graph is, you can always find some sheaf structure, essentially a message-passing neural network, that, with sufficiently many layers, will be able to disentangle the classes of the nodes. Just to walk through this picture: on the very far left, the colors of the nodes show the class; there are three colors here, so three classes. This is the graph at the beginning, and the position of each node in the box denotes its features. That's a way to visualize them: the 2D position is actually the 2D feature vector of each node. You can see that in the beginning everything is super messy and entangled; if you want to classify these nodes, it's very hard, because their initial representations are intertwined. But as we stack more layers of a particular kind of sheaf message-passing model, you see how these classes progressively get more disentangled at each new layer. The representations collapse and form these clusters, and by the end you can see the three communities very clearly; they're extremely easy to separate. The essence behind these results is that we showed, for different kinds of problems, what sorts of sheaf message-passing models you need, using the theory, in order to solve them. This is quite important because it shows you which bits and pieces of the architecture you might want to change or use in order to solve certain kinds of problems. We also had some impossibility results: if you use a graph neural network of a certain kind, you can't solve this problem, or you'll struggle to.
And we also said: if you use some more general ones, then you might have a chance. So that's a highlight of the theoretical side. What we actually do in practice is learn these message-passing functions, learn the sheaf, these matrices. In practice, when someone gives you a node classification task, it's very hard to know beforehand what exactly the right sheaf, the right message-passing model, is for that task. So what we do is learn it from data: we learn the matrices that do the message passing, using some neural networks, which are shown here in red. Then you learn how to transfer features between these vector spaces and move them around. This is just showing how these vectors, which are features of nodes and edges, get moved around via these matrices, just matrix multiplications. Okay, so that's a high-level view of the model. We evaluated it on some real-world heterophilic datasets, where you have to classify nodes into various communities or kinds of labels. These datasets, going from right to left, get more heterophilic, so in some sense more challenging for classic architectures, and our models, which are inspired by all of this, score quite highly on these benchmarks. At the same time, we also justified various choices that other models in this space had made, perhaps without strong justification or with different motivations; we managed to show why various things they were already doing made sense from the point of view of this theory. All right, that's all I had. Thanks for listening, and I'm happy to chat more about this.
I also have lots of backup slides, depending on how far we venture with the questions. Cool. Well, awesome work; thank you for the presentation. People in the live chat can write questions, but there are many things I think we could talk about. I want to start by reading a quote from the abstract of the paper by van de Laar, Koudahl, and de Vries, just to ground this in the active inference context and justify why the message-passing approaches you're describing help in active inference modeling. It's actually two papers, called "Realising Synthetic Active Inference Agents." They wrote: "With a full message passing account of synthetic active inference agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of the synthetic active inference framework." So how does knowing the message-passing structure help reuse a model across different settings, or facilitate the legibility of the model? Right. So first of all, I'm not super familiar with the active inference literature, so you'll have to help me anchor the discussion there. But if I understand correctly, the question you're getting at is basically: how can message passing help us generalize across various settings, maybe from one graph to another, and things like that? It's an active area of research how exactly this generalization happens. But something you can notice, something that has been shown, is that these models are quite good at spotting patterns or structures, depending on how exactly you implement them. For instance, say you have a triangle in your graph; that would be the simplest structure.
You have a triangle or some other kind of gadgets in your graph, like particular subgraphs that might show up in various different graphs. The graphs themselves might look completely different from each other, but these local patterns might reemerge in multiple graphs, and that could help you to generalize. For instance, cliques are super important when you do social network modeling and things like that, because they show this kind of close group of friends, right? They all talk to each other, so they form a clique, everyone's connected to everyone, right? And then you might be able to use that to generalize to another, completely different social context where these agents are again communicating in a similar manner, or connected in a similar manner, even if the overall pattern is quite different. And it goes way beyond just structural similarities, because there's also features in there. So there's combinations of structural patterns and features that give you even more complicated patterns, right? Like you might have a triangle, but then also two of the features in this triangle look a certain way and one looks another way. So that gives you even richer pattern detection abilities. You have essentially this ability to spot patterns at multiple scales as well. You could have patterns of patterns, right? You could have entire communities connected in various patterns and so on. And again, it's also a research question how you capture these hierarchical patterns and so on. In general, you have to do more message passing if you wanna capture things that are further away from each other, because otherwise they can't talk to each other, right?
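The "same local motif in different graphs" idea can be made concrete with a tiny triangle finder. This is a toy illustration (the graphs and the brute-force method are made up for the example; real motif detection uses much more efficient algorithms):

```python
from itertools import combinations

# Illustrative sketch: two globally different graphs that share the same
# local motif (a triangle / 3-clique), the kind of reusable structure
# message passing models can pick up on.

def triangles(edges):
    """Return the set of 3-cliques (triangles) in an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {frozenset(t) for t in combinations(adj, 3)
            if all(b in adj[a] for a, b in combinations(t, 2))}

g1 = [(0, 1), (1, 2), (2, 0), (2, 3)]           # triangle with a tail
g2 = [(9, 8), (8, 7), (7, 9), (9, 5), (5, 4)]   # different graph, same motif

print(triangles(g1))  # one triangle: {0, 1, 2}
print(triangles(g2))  # one triangle: {7, 8, 9}
```

Both graphs differ globally but contain the same local pattern, which is what a model could exploit when transferring between them.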
So yeah, I don't know if that actually answered your question or if I was going in the right direction. Oh, that's great. It brings up a lot of different cool ideas, like it's patterns all the way down, but totally agree. I think we can now perhaps explore some more specific connections to active inference, because hopefully for the listenership or viewership of this it's kind of like a two-way street. Some people may be coming from more of your background and learning about active inference and generative models as a specific system of interest for the first time, but also, certainly for a lot of people in the active inference space, these methods coming from category theory have only recently come up to, I guess, more prominence in Bayesian modeling, at least where we are. So it's a cool connection to make. I think one of the biggest touch points off the bat was when you mentioned multiplying a matrix by a vector and interpreting that as an edge. So just in the inference part of the generative model, about sensory observations, we always talk about the thermometer observation and an underlying hidden state, temperature. So that exactly describes that case. And that's why we can represent the active inference generative models, the perceptual parts and the action parts, in terms of matrix multiplication. It's why the MATLAB code for generative models does look mostly like matrix multiplication, and it can all be done explicitly that way. So, are there models that don't have this feature, or what do we gain by having all of our edges defined as applying a matrix to a vector, in this setting of agent generative models with perception and action? Any thoughts on that? Yeah, I think essentially the graph structure is telling you these things interact in some way, right?
So there's some communication between these vertices, if we're in a graph setting, right? And then what the sheaf is giving you, or any message passing model, is essentially a way of expressing how that connection should manifest in the model, or in what way that connection should be used to process information. So in this case, as I was mentioning, we have linear maps, because if your type of data is vector spaces, then your transformations will be some sort of linear maps, but it doesn't necessarily have to be. It could be any nonlinear transformation, right? And this is what's happening in general in practice: if you have a neighbor, the message coming from that neighbor could be modulated by any sort of transformation you want. So it could be linear, it could be nonlinear, you can specify it basically. But essentially you could think of this as: you have a structural level telling you who should communicate with whom, and then you have some semantics that the sheaf is adding on top, saying how these things should communicate, right? The first thing is who should communicate, or what should communicate, and then the semantics we add on top essentially describe how that communication should manifest. Very cool. I think that maps exactly to how we talk about the sparsity of variables in the generative model. So here the topology of the nodes in the graph that we wanna do message passing on is going to be describing the agent and the environment, or the generative model that includes perception, cognition and action. So a lot of people have proposed different sparsity architectures for integrated modeling of perception, cognition, action.
So one example would just be kind of around the clock: action influences environment, environment influences perception, back to cognition. You could add a self loop using a Markov blanket, and different kinds of connectivities. And that defines the sparsity topologically, which is where you showed the stack, and you were on the second and the third levels of the stack, I think. And then what flows across an edge, what that edge actually does, has to be described. So what is it that is also being provided? Yeah, yeah, exactly. And it could even go to the extreme of: does that edge actually do anything? For instance, if you have a matrix that's just the zero matrix associated to that edge, you would just multiply by zero, and that gives you zero. And that's essentially pruning that edge, right? Like, I get rid of it, I don't want that communication to happen. So there's this possibility where the semantics override the structural level, where you say, okay, I don't need to communicate with this other agent, person or whatever; it depends on what these vertices actually mean and what context you are in. And then there's also the case where you could do some sort of selective pruning. In linear algebra, a matrix has a kernel: it's all the stuff that the matrix sends to zero, what vectors are sent to zero, right? But not everything will be sent to zero unless it's the zero matrix. So depending on the features of the neighbors, you could also just send some of the neighbors to zero, right? And that removes those neighbors from the equation. They're just not factored in anymore.
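The hard-pruning and kernel-based selective-pruning cases just described can be sketched in a couple of lines (the vectors and matrices are illustrative, not from any particular model):

```python
import numpy as np

# Sketch: an edge's matrix can silence a message entirely (zero matrix)
# or selectively, by sending part of the feature space to zero
# (a nontrivial kernel). Values are illustrative.

x = np.array([3.0, 5.0])            # a neighbour's feature vector

hard_prune = np.zeros((2, 2))       # zero matrix: edge exists, message doesn't
print(hard_prune @ x)               # -> [0. 0.]

# A matrix whose kernel is spanned by [0, 1]: the second feature
# dimension is dropped, the first passes through untouched.
selective = np.array([[1.0, 0.0],
                      [0.0, 0.0]])
print(selective @ x)                # -> [3. 0.]
```

So the edge stays in the graph either way; the matrix decides whether, and which parts of, the message survive.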
So you kind of have, you know, it's a way to get the sparsity, I guess, that you were also talking about, where maybe only a small subset of the inputs, or only a subset of the features, is actually doing some meaningful stuff among the neighbors, and everything else will be zeroed out. Yeah, that makes me think of lasso regression, which tries to set most variables' coefficients to zero so that a few hopefully important variables really pop out in the analysis. But also there are newer techniques, I guess, of attention modeling and reweighting, which aren't just, okay, set five of them to one and the rest to zero; something more nuanced. So I think that sparsity with the expressivity is basically the best of both worlds, because you do want to have a situation where there is an edge but the attention being paid to it is zero. So functionally it doesn't have an update on the belief state, even though in principle the edge exists. And that's why we can model situations where the agent believes they have impact in the world, but actually, just because the edge in principle exists doesn't mean that it has any given impact. And so that allows the articulation of these models where they factorize and keep interpretable motifs, in terms of just little clusters of motifs, here in our case describing the action, perception and cognition types of systems of interest. But people, I believe, already implicitly do this: they will often add an adjective and refer to X kind of active inference.
So like deep active inference with a temporal horizon, sophisticated active inference with this kind of nesting, and those are pointing to a given feature. But of course those features, as we're hoping, should be composable, and so this seems to be bringing tools that are even more general than just action-perception modeling, because they're at a lower level of abstraction than any specific system of interest. But where this work and kind of timeless thinking around cybernetic systems come together, through the active inference generative model as a Bayes graph, it gets very exciting. Yeah, yeah. And maybe something also worth emphasizing here is that even if this semantic level can get rid of some edges, right, by doing this kind of pruning, something it cannot get rid of is the computation. So that structural graph level forces you to do something; it tells you what you should spend compute time on, right? Because even if you're going to decide to prune an edge, you still need to decide that, which takes compute time. So you still need to look at all your neighbors if you're a node and decide what to prune, or maybe you don't prune anything, but you have to look at every edge. And one way to look at this is that the graph structure defines a computational graph, or a series of computational steps you have to execute, and then the sheaf structure, or the message passing model, actually specifies what those steps are and what particular form they take. So yeah, that's one point. And yeah, you also mentioned attention, and actually I'm glad you did, because this is quite related and in certain ways more general than attention, and maybe going back to this slide might be a nice way to see this.
So here, basically what happens in attention is that instead of learning these matrices that we learn here, in attention you learn attention coefficients: you just learn a scalar, the attention coefficient, how much attention should I pay to this overall edge, let's say, which will be just a scalar. What we do is a bit more complicated, because you learn how to transform these neighbors, so it's a whole matrix rather than a single scalar. There are also some subtle differences, but in a follow-up work we also combined this with attention and went a bit more general, and that also worked quite well. But the underlying idea is very similar: you want to modulate the way you transform information based on the information itself, right? So you have this one level of recursivity, if you want, that you were also alluding to, and it happens in active inference too. Okay, if I'm node v and my neighbor u has some features, then based on these features, which are x_u, I'm going to find the matrix that will be used to process x_u, right? So it's very recursive, and it's what happens with attention, right? Based on the features of node u, I'm going to compute an attention coefficient that I'm going to apply to the features of u. I'm going to decide, based on these features, how much attention I should pay to them, and here we decide how to process them more generally, in a linear way. So you have this kind of loopy structure embedded in there. Awesome. I'll bring up a few more points, because I think there's so many great pieces. So, Toby St Clere Smithe, whose dissertation we recently discussed in Livestream 54, introduced a term, or at least a phrasing, compositional cognitive cartography, thinking about the compositionality of cognitive systems.
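The scalar-versus-matrix contrast from a moment ago can be sketched like this. Both modulations are computed from the neighbour's own features, which is the recursive flavour being described; the two small functions below are illustrative stand-ins, not the actual learned networks.

```python
import numpy as np

# Sketch: attention learns a scalar per edge; the sheaf-style model
# learns a whole matrix per edge. Both are functions of the neighbour's
# features x_u. All functions here are hypothetical stand-ins.

x_u = np.array([1.0, 2.0])  # neighbour's feature vector

def attention_coefficient(x):
    # stand-in for a learned scalar score, squashed to (0, 1)
    return 1.0 / (1.0 + np.exp(-x.sum()))

def sheaf_matrix(x):
    # stand-in for a learned d x d map computed from the same features
    return np.outer(x, x) / (1.0 + x @ x)

scalar_message = attention_coefficient(x_u) * x_u   # rescaled copy of x_u
matrix_message = sheaf_matrix(x_u) @ x_u            # fully transformed x_u

print(scalar_message.shape, matrix_message.shape)   # both (2,)
```

The attention message can only rescale the neighbour's vector, while the matrix message can rotate, project, or otherwise reshape it, which is the sense in which the matrix version is strictly more expressive per edge.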
And I think what you're describing here, with this notion that the mappings are more general than the attention mechanisms known famously today, that those represent a lower-dimensional special case of one kind of architecture, makes me think about how the Bayesian graph is kind of semantic in principle and can have all of these nice categorical formalisms around it. But then you can even build the connector to empirical data with the presheaf and the sheaf, which may be news to even many empirical researchers doing data analysis; it certainly was for me. But the message passing provides a rigorous translation from whatever semantic model is proposed topologically to an implementation procedure that can be planned for and executed in linear time, or at least with definable characteristics. So message passing plays a really important part in going from the abstract, what is possible, to the implementations of any of these actual models, and it does it in a really general way. Is it accurate to say that we hope that implementation with message-passing-compatible generative models will kind of roll out better, because we won't have some of the engineering challenges that less reusable abstractions might carry? It's hard to say. I think there are also certainly some limitations to this paradigm as well. So just doing this kind of message passing, I think, as you were mentioning, one thing is that it scales up quite easily, like linearly with the size of the graph, but that also comes at a cost, so there are certain results showing this has limits in expressivity.
So if you actually want to go beyond this, for instance, instead of just looking at pairs of nodes you have to look at tuples, kind of higher-order groupings of nodes, in order to get higher expressivity, and there are all sorts of techniques to do that. And there's always this tension between being more expressive and being efficient, which will always be there in any sort of algorithm or method. So it's hard to say; we can definitely say this is not the ultimate solution, let's say, if you want to do things like message passing in itself. But maybe something that's also missing a bit in the graph ML setting: the usual context is that you assume your graph is known, and you need to have some graph structure, or at least a sensible way to construct it. But for many less clearly defined things, like, okay, if I'm an agent doing perception in the real world, if I'm trying to create a graph of the world, you know, what's an object, what do I create a node for? Say I want one node per object, and some connections between objects, and things like that. I know it's a wild example that comes to mind, I don't know if you'd actually want to do that, but let's say you do. Then there are all these blurry things: what's an object and what's not an object, what's somewhere in between? Is that a node or not? What I'm trying to say is that the graph structure is very discrete: a node is either there or not there, an edge is either there or not there. But the world is very fuzzy, right? So if you use graphs as a model for your world, then there probably has to be some decision made somewhere about these kinds of fuzzy
concepts: do they actually translate into a concrete graph entity, like an object, an edge or whatever, or not, based on some kind of inference procedure? And I don't know if we do that or not as humans, as intelligent agents, but that's an interesting thing to think about. Maybe one way to solve that: there's also stuff like soft edges and things like that, and in some way, if you have attention coefficients, it's a bit like that. If an edge has a weight of 0.01 or something, it's almost like not being there, but it's still kind of there, so it's a bit of a soft graph architecture. So I guess at the edge level you can implement the softness, but I think it's a bit harder at the node level, right? Like how do you model a node that's kind of there and not there? Yeah, those are just some random thoughts. That's very interesting, about the fuzzy object identification, and the kind of similarities and differences between nodes and edges, even though in some ways they have some similarities or interoperabilities too. One other point of contact was an underlying hidden space that we understand topologically, which projects a vector space from different places. So that could be a vector of thermometer readings, and we want to have a smooth path within the homeostatic range, up to a boundary point. Not saying that that's the structure of the world, but the structure of a very heuristic and simple model might be to aim for continuity and have a defined hidden state space that has continuity underneath and is able to emit vectors. That brings in some of these classifier-type discussions that you brought up, like the kind of fundamental impossibility of geometric classification, because you are going to end up with gray zones, whereas even if it takes a bitwise description, you can separate the network, so that gives an actual completeness measure, and that allows
measures like the amount of computational resources, or, in a more statistically principled way, something like the Bayesian information criterion. So how many nodes should we have? We should be on some tradeoff front in some modeling space; it's a map, not the territory, and that's more justifiable. And so even lifelike organisms might wanna self-evidence, keep emitting from a living state, and so that provides a really simple graphical architecture for cybernetic systems. And then active inference explores a lot of different, more specific motifs within that broader blanket-persistence picture, and the path of least action, so that's what enables the physics in that space, and why these methods, which as far as I understand are often used in quantum mechanics, are able to come together with active inference this way. Yeah, something that comes to mind when you mention this: I think there's also a recent avenue of research in this area, again motivated by the fact that many times you don't know the graph beforehand. I think the old-school approach was, well, you construct it based on some rules: you're gonna say, these things are similar, I'm gonna put an edge between them, and you define similar however you like, and so on. And then there was this recent trend where what you try to do is latent graph inference, or some people call it manifold learning if you think of the graph as some sort of manifold, but this is very informal. Essentially what you do is map the raw observations into some latent space, and that's where you actually construct the graph: you construct the graph in the latent space rather than in the raw space. So that might be a way to deal with fuzziness as well, because then I guess you might lose some of these very concrete one-to-one mappings, because you might
learn some node in the latent space that maybe corresponds to three or four concepts mixed together. And there are all these nice experiments with neurons in a network, where it's visualized that they learn maybe a mixture of concepts: if you look at what actually activates that neuron, it's maybe a few classes or different kinds of things, not necessarily a single thing. So it could be something very similar here, where you have some very entangled representations that get distilled in this latent graph. And then, even if in the latent space there's still a very clear combinatorial structure with respect to your raw observations, that structure can still encode the fuzziness of the world to some degree, because you have this mixture of concepts that got distilled into the same node, or things like that. Or maybe some concepts could be represented by multiple nodes, depending on where you see these concepts; there might also be some variations on concepts, or points of view, and so on. So I think latent graph inference could be quite an interesting way to address some of these issues we were discussing, although I think it kind of died off a bit in the recent year; at least as far as I've seen, there were slightly fewer papers on the topic. Hmm, well, certainly the agent's proposed latent structure of the world, the causal structure of the world, is just a map of the territory, and so it enables maybe some of those coarse grainings. Could you go to the slide where there was a mapping between a smooth sphere and then a regular geometric shape? Let's see, this one?
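The latent graph inference idea discussed above, embed first, then build the graph in the latent space, can be sketched minimally like this. The fixed slicing "encoder" and the k-nearest-neighbour rule are illustrative stand-ins; in practice both the encoder and often the graph construction itself are learned.

```python
import numpy as np

# Sketch of latent graph inference: map raw observations into a latent
# space with some (here fixed, normally learned) embedding, then build
# a k-nearest-neighbour graph there rather than on the raw data.

rng = np.random.default_rng(0)
raw = rng.normal(size=(6, 10))          # six raw observations

def embed(x):
    return x[:, :2]                     # stand-in for a learned encoder

def knn_edges(z, k=2):
    """Connect each latent point to its k nearest latent neighbours."""
    d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)         # never connect a point to itself
    return {(i, j) for i in range(len(z)) for j in np.argsort(d[i])[:k]}

edges = knn_edges(embed(raw))
print(len(edges))  # 6 nodes x 2 neighbours = 12 directed edges
```

Because the edges depend on latent distances rather than raw identities, the resulting graph can absorb some of the fuzziness of the raw observations, which is the appeal described in the discussion.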
Yeah, just wanted to make one point, see if you had any comments. At the heart of some of the relationships that you're describing, and what you pulled back to in terms of generalization, is this relationship between the sphere and the geometry and the implications for data processing in all of the computational science areas: if you are preserving or learning or analyzing geometry but not topology, or the other way around, you might get these different dataset aberrations, like you might have the topology of the coffee cup but it looks like something totally different. And so what we would really want to do would be to understand the relationship between geometry and topology, because if we could understand it in principle, like you have it on the left side, and then in practice with the data scheme on the right side, or insert your own left and right side there, then we'd be able to do data analysis in a way that respected slash preserved both the topology and the geometry. So it's like two compatible perspectives that have their different strengths and weaknesses and heuristics, and so understanding that relationship between geometry and topology, and the implicit spaces that geometry requires and so on, that has tremendous use. And just in closing, it reminds me of Buckminster Fuller's Synergetics, which uses a close-packing architecture and a tetrahedron-centric model of coordinates to find more continuity between surface area and volume, and between the smooth surfaces and the great circles on them, like the points of connectivity on shapes. So I think it's an incredibly deep area that really has fundamental impact, and active inference helps us think about our models in this way, kind of like the inflated balloon, and with the fuzziness and the architecture and the finiteness, it really brings a lot to active inference. So I appreciate you sharing the work with us today and continuing to work in this way. Thanks, Art. Any last thoughts?
Yeah, I think what you mentioned has been all over my thesis, this tension between topology and geometry. And maybe what I want to emphasize is that I'm not saying the previous perspective of looking at things more geometrically was wrong in any way; on the contrary, there are actually lots of interesting places where these things intersect, even in this sheaf paper I briefly went through. If you actually read the paper, there are a lot of beautiful intersections. Actually, my main collaborator, Francesco, is a differential geometer, so he had lots of input from that side. And indeed, I think we should try to use all these layers of structure in the best way possible for all our methods. Awesome, all right, thank you, till next time. Thanks, Art, thanks for having me.