All right, hello and welcome. This is Active Inference MathStream number 10.1, March 28th, 2024, or the 29th, not exactly sure. We're here with Thomas Varley discussing generalized decomposition of multivariate information. There will be a presentation followed by a discussion. So Thomas, thank you very much for joining; looking forward to this and everyone's questions. Yeah, thank you for having me. I'm really excited to get to talk about this with people who are also interested. And I think it's the 29th, because I compiled this LaTeX last night, so I'm one day out of sync. All right, so I'll be talking today about generalized decomposition of multivariate information. This is walking through a paper that I recently published in PLOS ONE, and I'll have a link to that at the end of the talk if you want to read more about what I've covered here. We'll start with a little background on the intuitions around information theory, and then introduce two ideas: the partial information decomposition and the partial entropy decomposition. I can't see who's in the audience, but if you are an expert in multivariate information theory, you can go watch YouTube videos for five minutes or so while we get everybody else up to speed. Once we've got all the necessary mathematical machinery built, I'll talk about how I've taken the PID and the PED and generalized them into this thing I'm calling the GID, the generalized information decomposition, which is based on the Kullback-Leibler divergence. And once we've built that, we'll talk a little bit about how the GID can be used to get insight into other information-theoretic measures, well-known things like the total correlation or the Tononi-Sporns-Edelman complexity.
And then finally, to show that it is truly a generalization of the PID, I'll actually walk through the derivation of the classic Williams and Beer bivariate partial information decomposition from the generalized information decomposition. There are actually some interesting mathematical questions that you discover as you do that derivation, which I'll talk a little bit about, along with a brief section at the end on future work, what we could do with this. And since this is the active inference stream, with a little bit of discussion of predictive coding and neuroscience, I would be really interested in talking to people in the audience who might be more expert in active inference or free energy than I am, because there may be things that could be done with this that are not obvious to me, since I'm coming at this from a slightly different angle. So with that in mind, let's get started. Background. This is the active inference stream, so I assume that we're all familiar with information theory and complex systems. I think it's really become the case that information theory is emerging as a lingua franca for complex systems. It's nonparametric, it's model-free, which makes it very useful for complex or nonlinear systems that are not well modeled by linear regressions. It has some really deep and interesting connections to thermodynamics that I won't talk about here, but that's something I think is really interesting, and there's some very cool work pushing that forward. And then what makes it most relevant for complex systems is that it really elegantly handles multivariate interactions.
It's hard to imagine what a five-way Pearson correlation might look like, besides just a covariance matrix of pairwise correlations, but information theory can handle multivariate interactions on their own terms. The basic object of study is the entropy. This is Shannon's classic entropy function, negative sum of p log p. I've introduced a little bit of notation here: random variables I will denote with uppercase italics; the support set, that is to say all of the states that X can take, will be indicated with the calligraphic font; and the specific states that X can be in, the elements of the support set, I denote with lowercase italics. As we continue to leverage this notation, this distinction between average or general random variables and their specific local instances is going to be quite important. So for the classic Shannon entropy, we're just iterating over all of the possible states that X can take and computing the probability of that state times the log of the probability of that state. This quantifies our average uncertainty about the state of X: if X were a coin that you were flipping, it quantifies how uncertain you are about the state of X. And from this, we build the classic Shannon mutual information. It's a pairwise measure of correlation, and it is arguably one of the most fundamental ideas in modern information theory. We would read this as the information that X discloses about Y, or equivalently the information that Y discloses about X: it is the difference between our initial uncertainty about X, so this is the entropy of X.
And then we subtract off the uncertainty about X that remains after we have learned Y, so this is the conditional entropy here. When I teach this in class, I like to think of it like this: we start with some big pile of uncertainty, we scoop some uncertainty out when we learn Y, and so we have a slightly smaller pile of uncertainty left. The difference between the big pile and the small pile is the amount of uncertainty that we scooped out; that's the uncertainty that is resolved by learning Y. And since we're all big fans of Bayesian thinking here, I like to think of our initial uncertainty about X as our prior: this is the uncertainty about X that we come in with, without having learned anything. Then we observe Y and we compute a new uncertainty about X, and that is our posterior uncertainty about X after learning Y. Thinking about this in terms of priors and posteriors is really helpful, because this is just a particular case of a prior-posterior relationship. You could heuristically propose a more general definition of information: it's just the difference between a prior uncertainty and a posterior uncertainty, and how you formally specify the prior and the posterior is largely up to you. So mutual information is just the special case of this broader definition where the prior and posterior are related by these joint-marginal relationships. But again, it's just a special case of this much larger definition, and we're going to come back to it in a moment. Continuing on with our information theory background, you can do the multivariate mutual information if you want. So here now we have two sources, X1 and X2, that are together disclosing information about Y.
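The pile-of-uncertainty picture is easy to check numerically. Here is a minimal sketch (my own illustration, not from the talk's slides) that computes entropy and then the mutual information as prior uncertainty minus posterior uncertainty, I(X;Y) = H(X) - H(X|Y), from a joint probability table:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; 0 log 0 is taken as 0 by convention."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(pxy):
    """I(X;Y) = H(X) - H(X|Y): prior uncertainty minus posterior uncertainty."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)   # prior over X
    py = pxy.sum(axis=0)
    # H(X|Y) = sum_y p(y) H(X | Y=y): the uncertainty left after learning Y
    h_post = sum(py[y] * H(pxy[:, y] / py[y])
                 for y in range(pxy.shape[1]) if py[y] > 0)
    return H(px) - h_post

# A fair coin carries one bit of uncertainty...
print(H([0.5, 0.5]))                                  # 1.0
# ...and if Y is a noiseless copy of X, learning Y scoops it all out.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))   # 1.0
```

If X and Y are independent, the posterior pile is the same size as the prior pile and the mutual information is zero.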
And again, we have our prior, which is just our uncertainty about Y, and then we subtract off the uncertainty about Y that remains after learning X1 and X2. So far, so good. Any questions, Dan, anything in the chat, people hanging on for dear life? Good, okay. So when you start looking at multivariate mutual information, things can get really interesting, really fast. The first thing is that this mutual information, the information that X1 and X2 disclose about Y, is not trivially reducible to the sum of the marginal mutual informations: the mutual information that X1 and X2 disclose about Y is not, generally, equal to the information that X1 discloses about Y plus the information that X2 discloses about Y. The whole, in this case X1 and X2, is not trivially reducible to the sum of the parts. And you can have both cases. If the joint mutual information is less than the sum of the marginal mutual informations, then what that tells us is that there is information about Y that is duplicated over X1 and X2, and so when you sum the two together independently, you are double-counting that information. This is what we call redundancy: it's that information about Y that is copied over both of the inputs and gets double-counted when you add them together. Arguably the more interesting case is the opposite: the case where the whole, the information that X1 and X2 disclose about Y jointly, is greater than the sum of the two marginal mutual informations. And, ooh, that should be Y, not Z, I apologize, that's a typo. If you're familiar with integrated information theory, some of the stuff that Giulio Tononi has been doing, you might recognize this as something like integrated information.
It's information that is in the whole, the joint state of X1 and X2, that is not present in the sum of the parts. People call that, like I said, integrated information; we're going to call it here synergy. So we have these two interesting things happening: we've got a three-way interaction between X1, X2 and Y, and it can have two different flavors, this redundancy, which is duplicated information, and a synergy, which is information that is only accessible in the whole and not in any of the parts. If we take these two ideas, we can make them a little more formal and rigorous by noticing that we can talk about them in terms of logical connectives. Redundancy I like to think of as that information that could be learned by observing X1 or X2 or X3, or however many X's you have in your data set. For instance, if X1 and X2 are just linked by a copy gate, so X2 is just a copy of X1, then you could look at either one on its own and get all of the information, because they are just copies of each other. In contrast, synergy is information that can only be learned by observing X1 and X2 and X3, and so on. This link between logical connectives and higher-order information lets us do really interesting things: we can construct very exotic combinations of information, and this will come up more when we talk about the information decomposition. You could ask, for instance: what information could only be learned by observing X1 and X2, or X3 and X4?
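Both flavors can be demonstrated with tiny discrete examples. The following sketch (again my own illustration, assuming uniform inputs) compares the joint mutual information against the sum of the marginals for an XOR gate (pure synergy) and a copy gate (pure redundancy):

```python
import numpy as np
from collections import Counter

def H(samples):
    """Entropy in bits of the empirical distribution of `samples`."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def I(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return H(xs) + H(ys) - H(list(zip(xs, ys)))

# XOR: each source alone says nothing about y; together they say everything.
xor = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]
x1s = [s[0] for s in xor]; x2s = [s[1] for s in xor]; ys = [s[2] for s in xor]
print(I(x1s, ys), I(x2s, ys))          # 0.0 0.0
print(I(list(zip(x1s, x2s)), ys))      # 1.0 -> synergy: whole > sum of parts

# Copy: x2 = x1 = y. Each part alone gives the full bit -> redundancy.
cpy = [(x, x, x) for x in (0, 1)]
c1 = [s[0] for s in cpy]; c2 = [s[1] for s in cpy]; cy = [s[2] for s in cpy]
print(I(c1, cy) + I(c2, cy))           # 2.0: double-counts the one shared bit
print(I(list(zip(c1, c2)), cy))        # 1.0 joint -> sum of parts > whole
```

So for XOR the whole exceeds the sum of the parts, and for the copy gate the sum of the parts exceeds the whole, exactly the two cases just described.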
So we're starting to see what the structure of multivariate information might look like: it's going to be these chains of atomic subsets of the collection of inputs, joined by logical connectives, ands and ors. Any questions about this? Because this link between redundancy and synergy and ors and ands is very helpful for thinking about what's to come. So far so good, okay, cool. This brings us to the partial information decomposition. The idea is that we would like to take this joint mutual information, the information that X1 and X2 disclose about Y, and break it into atomic components that resolve this ambiguity about the relationship between the parts and the whole. Remember, we said that the sum of the two marginal mutual informations can be greater than or less than the joint mutual information. So how can we extract a relationship that brings all of these into harmony? The way we're going to do this is with this branch of modern information theory called the PID, the partial information decomposition. What it does is take the mutual information that X1 and X2 disclose about Y and break it down into the redundancy, which is the information about Y that could be learned by observing X1 alone or X2 alone; plus the synergy, which is the information about Y that can only be learned when X1 and X2 are known together; and then these two unique terms, which are the information about Y that can only be learned by observing X1, or only by observing X2. And then, to force harmony between the joint and the marginal mutual informations, we will also say that the marginal mutual informations should be decomposed in the same way.
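Written out as equations, in the standard Williams-and-Beer bivariate notation, that bookkeeping looks like this:

```latex
\begin{aligned}
I(X_1, X_2 ; Y) &= \mathrm{Red}(X_1, X_2 ; Y) + \mathrm{Unq}(X_1 ; Y) + \mathrm{Unq}(X_2 ; Y) + \mathrm{Syn}(X_1, X_2 ; Y)\\
I(X_1 ; Y) &= \mathrm{Red}(X_1, X_2 ; Y) + \mathrm{Unq}(X_1 ; Y)\\
I(X_2 ; Y) &= \mathrm{Red}(X_1, X_2 ; Y) + \mathrm{Unq}(X_2 ; Y)
\end{aligned}
```

Three equations, four unknowns, which is why a redundancy function has to be supplied from outside to pin the system down.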
So even though the information that X1 discloses about Y makes no obvious reference to X2, it still decomposes into that redundant information about Y that could be learned by observing X1 or X2, plus the unique information from X1; and likewise for X2. I'm sure many of you have seen this kind of classic Venn diagram that goes back to the original Williams and Beer paper, but I find that it's really helpful for building the intuition here. The big oval is the joint mutual information. The two circles are the marginal mutual informations. The redundancy is the overlap between the two marginal mutual informations. The unique information is the difference between a marginal mutual information and the redundancy. And then this purple synergy is the extra special sauce that you get in the whole mutual information that is not present in either of the two marginal mutual informations. Now, this is a very restrictive case, just two inputs and one target Y, but I find it's really helpful for getting a handle on the underlying logic of this thing. Of course, we typically are not looking at systems that have just two inputs and one target; we often have many, many variables. And there is a general case for the PID: for an arbitrary number of inputs, X1 to Xk, all disclosing information about Y, the information can always be decomposed into a finite set of atoms that are structured into what we call a partial information lattice. This is also an antichain lattice, if you're into order theory. Over here, we have the two-element case, so these are X1 and X2 disclosing information about Y, and we can see this has the same structure as the Venn diagram.
We have the redundancy down here at the bottom, which is subsumed by the unique informations, and then we have the synergy up at the top. As the number of sources of information grows, the lattice grows super-exponentially, and you start to get more complicated information-theoretic relationships. So this little guy that I'm circling right here is the information about Y that could be learned by observing X2, or the joint state of X1 and X3. I'm not going to spend too much time dwelling on these more complicated partial information atoms, but the idea is that for an arbitrary number of sources, you can decompose the information using this very elegant lattice structure. Formally, if we were going to write this all out with notation, what we would say is that we have a decomposition that takes the mutual information that X1 through Xk disclose about Y and decomposes it into a sum of all of these partial information atoms. The calligraphic L here is the lattice, each of these alpha-sub-i's is a vertex on the lattice, and we're computing the partial information that that vertex, that atom, discloses about Y. So again, you might have a partial information atom like this: the information about Y that could be learned by observing X1 and X2 and X3, or X4 and X5, or X6. There's a certain amount of detail that I'm eliding here, like how you actually compute the value of these partial information atoms; I can talk about that at the end, but for our purposes, it's sufficient to know that this decomposition exists and has this form connected by logical ands and ors.
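Compactly, the lattice sum just described (my transcription of the slide's formula) is:

```latex
I(X_1, \dots, X_k ; Y) \;=\; \sum_{\alpha \in \mathcal{L}} I_{\partial}(\alpha ; Y)
```

where \(\mathcal{L}\) is the partial information lattice and \(I_{\partial}(\alpha ; Y)\) is the partial information attributed to the atom \(\alpha\).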
So the PID is very powerful, it's very cool, I'm a big fan of it, but it has a pretty severe limitation: it requires distinguishing between source elements, which would be all of these guys, and a single target element. And complex systems aren't typically that nicely organized. There are many cases where it doesn't make sense to ask which are our sources and which are our targets; we would like to be able to decompose the structure of a system qua itself. And we can do that by playing a really fun little mathematical trick. For a given multivariate system, which I'll denote as bold X, which is just X1 through Xn, what we're going to do is decompose the mutual information that all of the parts, the individual Xi's, disclose about the whole. So here, our sources of information are the parts, all of the individual variables, and the target is not some other variable that's exogenous to them; it's actually just their own joint state. And we know from a basic identity that the mutual information that all of the parts disclose about the whole is just the entropy of the whole. So when we do the PID on this particular mutual information, we know that all of the atoms have to sum to the entropy, and this becomes the partial entropy decomposition. This is the information that all of the parts disclose about the whole of which they are a part, essentially. I spent a lot of time working on this partial entropy decomposition; shout-out to Robin Ince and Conor Finn and Joseph Lizier, who were working on this a little before me as well, and laid down a lot of the mathematical fundamentals. For a long time, I was satisfied with the partial entropy decomposition because it seems to provide this decomposition of the system qua itself, but it still has a new limitation.
It gets rid of the source-target distinction, but the problem is that entropy is no longer really a measure of information, in the way that we introduced it before. It's more of a measure of uncertainty about the state of the whole; it's not a reduction in uncertainty. So, for instance, entropy is maximal if all of the elements are independent of each other, but that doesn't really fit our intuitions about information: if entropy is maximal, we have minimal information. It makes perfect sense that the partial entropy decomposition is maximized in the case of total independence, but that's not really information as we typically would like it to be. To try to resolve this problem and really get a general decomposition of multivariate information, I turned here to the Kullback-Leibler divergence. The Kullback-Leibler divergence is a nice target-free definition of information, and it's actually a generalization of the classic Shannon mutual information in its own right. The divergence of P(X) from Q(X) is the amount of information that you gain when you update your beliefs about the system X from some prior distribution Q(X) to a posterior distribution P(X). So now we're thinking about priors and posteriors again, coming back to the more general definition of information that I introduced earlier. In the particular case where Q(X) is the product of the marginals and P(X) is the joint distribution, the Kullback-Leibler divergence reduces back to the classic Shannon mutual information. But again, we can plug in any prior and any posterior and still have a meaningful definition of information.
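That special case is easy to verify numerically. A minimal sketch (my own illustration): the KL divergence of a joint distribution from the product of its marginals equals the Shannon mutual information, and the divergence of a distribution from itself is zero:

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x)/q(x)), in bits."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0                       # terms with p(x) = 0 contribute 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Posterior: a correlated joint distribution over two binary variables.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
# Prior: the product of its marginals (i.e., "X and Y are independent").
prod = np.outer(pxy.sum(axis=1), pxy.sum(axis=0))

print(kl(pxy, prod))   # 0.278... bits: exactly I(X;Y) for this table
print(kl(pxy, pxy))    # 0.0: no belief update, no information gained
```

Swap in any other prior and posterior over the same support and `kl` still returns a meaningful, non-negative information gain; the joint-versus-product-of-marginals choice is just the one that recovers mutual information.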
So my goal then was to say: can I take the partial entropy decomposition, which almost gets us to where I want to be, and somehow remix it so that instead of decomposing the entropy, it decomposes the Kullback-Leibler divergence instead? And, spoilers: yes, we actually can do that; it's not that hard. The basic idea comes down to a very simple algebraic manipulation of the Kullback-Leibler divergence. Up here we have it written in terms of something that looks a lot like Shannon's entropy: we have the sum over all of the states in the support set, times the probability of that state, and then this log ratio, P(x) over Q(x). This is an expected value computed with respect to P(X), so I'm just going to take the summation and the P(x) and shift them out into the expected-value operator and not think about them again. That leaves us with the expected value of this log ratio. By the rules of logs, you can split it apart into a difference of two logs, and then these two things look a lot like the Shannon information content, the local entropy. So we can take these two terms and rewrite them as two different local entropies: we ultimately have the difference between the Shannon information, or surprise, of seeing this particular state little-x computed with respect to the probability distribution Q, minus the surprise at seeing this state of X computed with respect to the probability distribution P. And if we go back to thinking about our priors and our posteriors, this looks very, very similar to the heuristic definition of information that I gave earlier.
We have the difference between a prior uncertainty, although in this case it's a local uncertainty or a local surprise, and a posterior surprise, attributable to each local configuration of X. So again, this matches all of our intuitions about information, and we can see that it falls out of the Kullback-Leibler divergence very nicely. And this is really the crux of the whole derivation I'm doing here: you can recognize that the Kullback-Leibler divergence is the expected value of the difference between two local entropies. I've never seen it written out this way, although I don't think it should be surprising to anybody who's familiar with this; it's not a ground-breaking discovery or anything. But now we can bring in the partial entropy decomposition that I introduced earlier, because this local entropy, well, we can write this out as a local mutual information. Now we're just looking at specific states as opposed to average states, but the logic is exactly the same: we decompose the surprise that all of the parts disclose about the whole, and we end up with a localized partial entropy decomposition that we can apply to every realization that our system X can adopt. And so this gets us to the Kullback-Leibler divergence decomposition that I wanted. We start with the definition as the expected value of this difference between two local entropies, and then we can use our local partial entropy decomposition to decompose each one of these into its component atomic bits, essentially.
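The algebraic step being leaned on here, that the KL divergence is the expected difference of two local surprisals, can be checked directly. A minimal sketch (my own illustration, with arbitrary random distributions):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(6); p /= p.sum()   # posterior distribution
q = rng.random(6); q /= q.sum()   # prior distribution

# Direct definition: E_p[ log2(p(x)/q(x)) ]
kl_direct = np.sum(p * np.log2(p / q))

# Rewritten as an expected difference of two local surprisals
# h_q(x) - h_p(x), averaged under the posterior p:
h_q = -np.log2(q)   # prior surprise at each state x
h_p = -np.log2(p)   # posterior surprise at each state x
kl_local = np.sum(p * (h_q - h_p))

print(np.isclose(kl_direct, kl_local))  # True
```

The two expressions agree term by term, which is what licenses decomposing each local surprisal with the localized partial entropy decomposition and then averaging.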
So we end up with, for a given information atom, the partial divergence as we go from Q to P: it's just equal to the expected value of the difference between the local partial entropy atoms from the prior and the posterior. And because this inherits a lot of the nice properties of the partial entropy decomposition, we can get the original Kullback-Leibler divergence back by summing all of these partial Kullback-Leibler divergences over all of the atoms in the lattice. I realize that there's been a lot of notation here, a lot of partials and superscripts and subscripts, but really what we're doing is computing two different partial entropy decomposition lattices, the prior lattice and the posterior lattice, and then subtracting them element-wise. That's really what this works out to be. You do this for every possible state that X can adopt and then average over all of the states, essentially. I like to think of it literally in terms of subtracting one of these lattices from another; I find that much more visually intuitive than all of the symbology in the mathematical notation. And so that is the generalized information decomposition: it's really just the expected difference between two localized partial entropy decompositions. Before we jump into the fun applications, are there any questions about what I've worked through? Everybody again hanging on for dear life? Yeah, that's awesome. I definitely will have questions, and I'm sure people in the chat will too, but let's hear about what the generalized information decomposition can do, and then we can get into it. Okay, cool.
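In symbols (my paraphrase of the paper's notation, not a verbatim copy of the slide), each partial divergence atom is the expected difference of the corresponding local partial entropy atoms under the prior and posterior, and the atoms re-sum to the full divergence:

```latex
D_{\partial}^{\alpha}(P \,\|\, Q)
  \;=\; \mathbb{E}_{P}\!\left[\, h_{\partial}^{\alpha}(q(x)) \;-\; h_{\partial}^{\alpha}(p(x)) \,\right],
\qquad
D_{KL}(P \,\|\, Q) \;=\; \sum_{\alpha \in \mathcal{L}} D_{\partial}^{\alpha}(P \,\|\, Q)
```

That is exactly the "subtract the prior lattice from the posterior lattice, element-wise, then average over states" picture just described.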
The really nice thing about this generalized information decomposition is that any information-theoretic measure that can be written in terms of a Kullback-Leibler divergence can now be decomposed. Beforehand, we were stuck with various special cases: the PID works for multiple sources disclosing information about one target, the partial entropy decomposition works for the entropy. But now, with the GID, any measure that can be written as a Kullback-Leibler divergence is fair game. For instance, I picked the total correlation as an example. It's one of the multivariate generalizations of the mutual information, and it generalizes this idea that we're looking at the divergence of the true joint statistics from the product of all the marginals. This is very nice: it's zero if all of the elements are independent, and in contrast it's maximal if all the elements are copies of each other, essentially. That makes it a very nice generalization of the mutual information, but it doesn't distinguish between lower-order and higher-order deviations from independence; it doesn't see any distinction between the redundancy and synergy that I introduced earlier. But by taking the total correlation and plugging it into the generalized information decomposition, we can start to tease apart the lower-order and the higher-order interactions. As an example, I used the logical exclusive-OR gate, the classic example of synergy in discrete systems: the information that any Xi discloses about Y is just zero bits, but the information about Y that's disclosed by X1 and X2 together is one bit. So this is pure synergy; all of the information about Y is in the whole and not in any of the parts.
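The total correlation itself is simple to compute from a joint table. A minimal sketch (my own illustration), using the identity TC(X) = sum_i H(X_i) - H(X), equivalent to the KL divergence of the joint from the product of marginals, with the copy, independent, and XOR cases just mentioned:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def total_correlation(pjoint):
    """TC(X) = sum_i H(X_i) - H(X): how far the joint is from independence."""
    pjoint = np.asarray(pjoint, dtype=float)
    marginal_sum = sum(
        H(pjoint.sum(axis=tuple(j for j in range(pjoint.ndim) if j != i)))
        for i in range(pjoint.ndim))
    return marginal_sum - H(pjoint)

# Three perfectly copied fair bits: maximally redundant, TC = 2 bits.
copy3 = np.zeros((2, 2, 2)); copy3[0, 0, 0] = copy3[1, 1, 1] = 0.5
print(total_correlation(copy3))   # 2.0

# Three independent fair bits: no structure, TC = 0.
indep = np.full((2, 2, 2), 1 / 8)
print(total_correlation(indep))   # 0.0

# The XOR triad (X1, X2, X1 xor X2): TC = 1 bit of pure synergy.
xor = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        xor[x1, x2, x1 ^ x2] = 0.25
print(total_correlation(xor))     # 1.0
```

Note that TC gives the copy and XOR systems nonzero values for completely different reasons, redundancy versus synergy, which is exactly the distinction it cannot see on its own and the GID can.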
If we take this and plug it into the generalized information decomposition, you'll get a lot of zeros, but I just want to bring your attention to this last column over here. This is the total correlation, and all of the information here is in the very tippy-top of the lattice, the highest-order synergy term, and not anywhere lower. A big table full of zeros is probably not that exciting, but it does a really good job, I think, of illustrating that this GID is doing what we want it to. This is a very good sanity check: yes, the logical XOR gate is purely synergistic, and we get that back from our generalized information decomposition. I should note that I'm using the h_min redundancy function from Conor Finn and Joseph Lizier; if that doesn't mean anything to you, that's fine, but that's what ascribes values to all of these atoms. Similarly, if we have some measure that is built on multiple Kullback-Leibler divergences, you can decompose that as well. Here I looked at the pretty well-known Tononi-Sporns-Edelman complexity, and I won't go into all of the details, but the idea is that it's a measure of complexity that was designed to be low both in the case where everything is just random, so if every Xi is independent of every other Xj, that's kind of like an ideal gas, there's no structure there, so TSE should be low; but also when every Xi is just a copy of every Xj, well, then the system is also not complex, it's crystallized, highly synchronized, highly redundant, very boring. The TSE complexity is high in this interstitial zone where integration and segregation coexist, and that matches a lot of our intuitions about complexity.
And so if we take the GID and apply it to each of these total correlations, and then add and subtract the sums and differences between the two, what we end up with is this really nice decomposition where we can see that it really penalizes redundancy. These are all total correlation atoms, but I've just removed the total correlation labels to make this visually more accessible. And the more redundant you are, the stronger the penalty for being redundant is. Then as you climb up the lattice, so here are the redundancies at the bottom of the lattice, as you start to climb up, eventually you hit a point of increasing synergy, and suddenly it starts to reward that synergy. So we have this very nice transition as you go from low on the lattice to high on the lattice, going from penalty to reward. I think this shows us something really interesting about how the Tononi-Sporns-Edelman complexity relates this idea of complexity and integration-segregation balance to this idea of synergy. The TSE complexity was developed about two decades before we had a really rigorous understanding of synergy; it was not designed to be a measure of synergy, it was designed to be a measure of complexity. And it's only now, much later, that we learn that there seems to be something very deep linking this idea of balanced integration and segregation to synergy. Again, if you're not super up on the TSE complexity, that's not necessarily the main takeaway here. What I'm really trying to demonstrate is that the GID can be used to give us insights into other measures, that it can help us understand what these measures are telling us in information-theoretic terms.
And there are a ton of other measures that we could be using here, and that we could also decompose in the same way and get similar insights, and that's something I would be really interested in doing in the future. But finally, the last thing I wanted to show was that we can recover the single-target partial information decomposition from the GID, right? If the GID is a true generalization of the PID, we should be able to recover the PID as a special case. We started with the PID, we turned it into the partial entropy decomposition by doing this weird thing where we made the target the joint state of all the parts, and then from that we built the GID. Can we close the loop and go all the way back to the PID, okay? So just to remind you, the PID takes the information that X1 and X2 jointly disclose about Y and breaks it down into redundant, unique, and synergistic components, and then does the same thing for the two marginal mutual informations, okay? Can we recover this from the GID? And it turns out that, yes, you can. We can write this joint mutual information in terms of an expected value of Kullback-Leibler divergences: our prior is just the statistics of X1 and X2 together, and our posterior is the statistics of X1 and X2 after conditioning on Y being in some particular state. And then we compute the expected value of this thing over all possible states that Y can adopt, right? So this just works out to computing a bunch of lattices, essentially lattices over two variables, X1 and X2, and then we just average. So if we were to take the expected value of all of the top-of-lattice synergy terms, we would get the PID synergy back; if we were to take the expected value over all of the redundancies down at the bottom, we would get the PID redundancy back, okay?
And this does recapture the partial information decomposition as we would expect. If you use the h_min redundancy function for your generalized information decomposition, you get back the PID that uses the h_min redundancy function for the partial information decomposition. So we can see very clearly how the entropy terms turn into the information terms, okay? And that all looks well and good, and I was pretty happy to be putting a bow on this project, until I realized that there is another way you can write the joint mutual information out as a Kullback-Leibler divergence. And it's this way: we can also write the same thing as the Kullback-Leibler divergence from a prior, which is the product of the probability of X1 and X2 times the probability of Y, to a posterior, which is our joint distribution, okay? But this requires decomposing a three-dimensional probability distribution, P of X1, X2, and Y, whereas before we were only ever decomposing two-variable probability distributions. So this is why in the first case we had a nice four-element lattice; here we're actually decomposing a three-dimensional probability distribution, and the resulting lattice will have 18 partial information atoms rather than four, right? So we end up with this decomposition of the joint mutual information as opposed to the expected four-element decomposition, okay? And I spent a long time trying to find some way I could squish this 18-element lattice back down into the four-element lattices that I expected to get, and I was not successful. The people who reviewed the paper didn't have any suggestions either. So I won't say that it can't be done, but if there is a way to do it, it is currently beyond me. And so this puts us in a little bit of a pickle, right?
We would hope (maybe I shouldn't say expect) that there would be a unique decomposition for a single mutual information, right? It's kind of odd that you can take one mutual information and, depending on how you choose to write out the Kullback-Leibler divergence, end up with two non-identical, non-interconvertible decompositions. So what's happening here? This was very odd to me. My working theory, and this is just a theory, so if any other mathematicians in the audience want to take a stab at it, you're more than welcome to, is that these two ways of writing out the joint mutual information are actually fundamentally different. In the first case, we're looking at a three-way interaction: the interaction between X1 and X2 and Y, okay? Whereas in the second case, we are abstracting Y out, removing it from the divergence and wrapping it all in this expectation over Y, okay? And so these are two different kinds of dependency, right? They're mathematically interconvertible and they'll always resolve to the same value when you plug in numbers, but they are in some fundamental way different kinds of structure in the system. Classic Shannon information theory can't see this difference; it just says, oh, these two things can be algebraically turned into each other and therefore they must be the same. But under the hood, they are two different ways of looking at this interaction between X1, X2, and Y. So again, classic Shannon information theory can't see this difference, but the information decomposition can.
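The claim that the two formulations always resolve to the same number is easy to check numerically. Here is a small sketch (my own toy AND-gate distribution, not from the paper) computing the joint mutual information both ways: once as an expected Kullback-Leibler divergence over the states of Y, and once as a single divergence from the product prior:

```python
from math import log2

# Toy joint distribution over (x1, x2, y): Y = X1 AND X2, uniform inputs.
p = {(a, b, a & b): 0.25 for a in (0, 1) for b in (0, 1)}

def marg(dist, keep):
    out = {}
    for s, pr in dist.items():
        k = tuple(s[i] for i in keep)
        out[k] = out.get(k, 0.0) + pr
    return out

p_x = marg(p, (0, 1))  # p(x1, x2)
p_y = marg(p, (2,))    # p(y)

# Form 1: I(X1,X2;Y) = E_y[ D_KL( p(x1,x2 | y) || p(x1,x2) ) ]
form1 = 0.0
for (y,), py in p_y.items():
    cond = {s[:2]: pr / py for s, pr in p.items() if s[2] == y}
    form1 += py * sum(q * log2(q / p_x[x]) for x, q in cond.items())

# Form 2: I(X1,X2;Y) = D_KL( p(x1,x2,y) || p(x1,x2) p(y) )
form2 = sum(pr * log2(pr / (p_x[s[:2]] * p_y[(s[2],)]))
            for s, pr in p.items())

print(form1, form2)  # identical values (about 0.811 bits for the AND gate)
```

Same number either way, yet form 1 decomposes into an average of two-variable lattices and form 2 into one 18-atom three-variable lattice, which is exactly the puzzle being described.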
And I should note that this is not the first time this has turned up. There are a couple of other cases where classic Shannon information theory has been unable to see a distinction between two categories of object, but when you start looking at the information decomposition, suddenly you can, okay? I'm sure there's a lot more to say about this, and I'd be interested in talking to people about it after the fact, but I'll just leave you with this question of non-identical decompositions of the same value. And let's see, I think I'm coming to the end of my time here, so I want to talk just a little bit about possible applications and some future work that I would like to do, and that I'd be happy to collaborate with other people on as well. I think this is a big step towards what I'd call a grand unified theory of multivariate information, right? We have this interesting situation where we started with the PID, and from the PID we can construct the partial entropy decomposition; the PED is a special case of the PID. Then from the PED, the partial entropy decomposition, we can construct the generalized information decomposition. And then we can take the generalized information decomposition and re-extract the partial information decomposition from it, with these weird caveats, right? And so this kind of ouroboros sort of thing, where we have all of these different information decompositions that are fundamentally built out of each other, I think is really getting at something fundamental about how we can think about part-whole relationships in multivariate systems. The last thing that I haven't been able to do is incorporate the integrated information decomposition.
So this was proposed by Pedro Mediano and Fernando Rosas and colleagues in England (I'm not sure where everybody is now), and they provided a very elegant generalization of the PID that allows you to have multiple sources and multiple targets. I would really like to find a way to incorporate that into this PID, PED, GID framework for a truly grand unified theory of multivariate information decomposition. I'm still working on that, and I'm open to collaborating with anybody who's also interested in taking that project on. And then finally, let's talk about predictive coding, because this is an active inference stream. I assume everybody is familiar with the basic idea of predictive coding: that the brain is acting as a kind of Bayesian inference engine, right? It has a world model, it has some set of beliefs about the world, and it updates those beliefs as it navigates through the world and receives sensory stimuli, okay? And one of the things I'm really interested in is how this predictive information about the environment might be redundantly or synergistically distributed over different sensory channels, right? We don't just have vision; we have vision, we have hearing, we have touch, we have proprioception, and there are other animals that have senses that are totally alien to us. So when we take in all of that sensory information and build a world model, we can't just treat every one of these incoming information streams as independent of every other one. We have to learn these complex higher-order interactions between different incoming streams of sensory information, and we use that to build a really rich world model, okay?
And so a great example that I first heard from Andrea Luppi, who is a friend and collaborator of mine: he was talking to me about stereoscopic depth perception as an example of synergy between two incoming sensory channels, right? If you cover one eye, you lose the ability to perceive depth. So there's redundant information that both eyes get simultaneously; there's unique information at the corners of your visual field; and then finally you have synergy, which is this emergent perception of depth, this quale of receding distance that you only get when you're getting information from the right eye and the left eye simultaneously, right? So from a cognitive modeling standpoint, our ability to extract higher-order information from multiple incoming sensory channels is clearly very relevant to our perception and our behavior in the world. And so I'm tempted to propose a hypothesis of maximum synergy. If there are two different channels that are totally redundant, an agent that only has some minimal number of calories it can burn keeping itself alive isn't going to want to spend calories perceiving both senses independently, right? All the information you could get from one, you can get from the other, so why spend resources on both? But in contrast, if there's synergistic information in X1 and X2 that you can only get when both are present, well, then having channels to sense both of them becomes essential, right?
And if it's truly synergistic, then having just X1 on its own or just X2 on its own may not be giving you any information; if there's synergy in the environment, then you need to spend calories maintaining both channels. So my guess is that there's probably some kind of evolutionary pressure that says agents with limited resources in complex environments probably want to maximize the number of synergies they are sensitive to when building their world model, while simultaneously minimizing the redundancies, right? Redundancies are basically just wasted calories, while the synergies are potentially very informative higher-order interactions that are well worth the calories. And this is about as far as I've gotten on this hypothesis, but I think it would be something really interesting to explore down the line, maybe either in very simple animal models or in some kind of in silico minimally cognitive (not really conscious) agent kind of approach. And again, that's something that I would love to talk to really anybody about. So that's the end of my spiel here. The paper was published in PLOS ONE a couple of months ago; here is the link, and I'll send around the slides afterwards if anybody wants; I can send them to Dan, hopefully. And yeah, that's it. This was work that I largely did on my own, so I don't have the usual acknowledgement slides, but let's jump right to questions. I'd love to get feedback from anybody on the work, what you think about it, or what your own thoughts might be. Awesome. Thank you for the very, very clear, great presentation. Lots of places to jump in.
Could you speak a little bit about redundancy and synergy with respect to fragility and other system properties? Because having redundancy, as you brought up in those last slides, might cost more energy, so in the short term it might appear like, well, why do we need redundant information? But then there are component failures or other kinds of perturbations to the system; what's redundant in one moment might come to be not redundant. So how do you think about these informational properties of systems or networks with respect to other system properties? That's a great question, and I'd be curious who asked it, because I actually just put up a preprint about exactly that topic, so I don't know if they're aware of it. So yeah, that's a really good question: redundancy and synergy, how do they relate to things like fragility and robustness? I don't have slides for this, so you're getting just the verbal TLDR. I just did this project where we took very simple Boolean networks and evolved them either to be very redundant or very synergistic. And since these are Boolean networks, there are a lot of ways to characterize their dynamics, so I asked: what are the dynamics that come along with evolution for redundancy or evolution for synergy? And what I found was that the redundant Boolean networks were extremely stable. You could perturb them and they would almost always fall back to one of a very small number of attractors. They had very restricted state spaces; they were kind of crystallized. In contrast, the synergistic networks were basically chaotic. It was very hard to find ways in which they did not just appear to be chaotic systems. And I'm using chaos in the technical sense here: sensitive dependence on initial conditions, very large state spaces, lots of attractors, small perturbations sending them off onto very different possible futures.
And the punchline then was that if we took these two different kinds of networks and asked about their capacity to integrate information, using a measure of integrated information from integrated information theory, what I found was that the synergistic networks could integrate lots of information, whereas the redundant networks were very stable but could not integrate information at all. So at the extremes, you could be very stable and totally unable to integrate information, or you could be hilariously unstable with lots of integrated information, right? And so the last thing I tried was evolving networks for the TSE complexity that I mentioned before, this balance of integration and segregation. And what I found was that the complex networks were able to split the difference: they were more stable than the purely synergistic networks, they weren't as chaotic, but they also had more capacity to integrate information than the crystallized networks, right? So we proposed this kind of multi-way trade-off, where redundancy brings stability but restricts your ability to integrate information, whereas synergy brings integrative capacity but is destabilizing or fragilizing. And then you can balance those two things with this idea of complexity that brings together integration and segregation in a kind of modular structure. That's been sitting under review at Chaos for like the last six months; I have no idea what they're doing over there, but hopefully it will be in press before the end of my postdoc. And there's a lot of work that I want to do trying to build on that as well. So, great question. That's awesome.
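To make the Boolean-network machinery being discussed concrete, here is a minimal sketch (my own toy construction, not the evolved networks from the preprint) of how one characterizes the attractor structure of a small random Boolean network by exhaustively sweeping its state space:

```python
import random

random.seed(0)
N, K = 5, 2  # 5 nodes, each driven by 2 randomly chosen inputs

# Each node gets K random input nodes and a random Boolean lookup table.
inputs = [random.sample(range(N), K) for _ in range(N)]
tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]

def step(state):
    """Synchronously update every node from its lookup table."""
    return tuple(
        tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
        for i in range(N)
    )

def find_attractor(state):
    """Iterate until a state repeats; return the cycle as a frozenset."""
    seen = {}
    while state not in seen:
        seen[state] = len(seen)
        state = step(state)
    cycle_start = seen[state]
    return frozenset(s for s, t in seen.items() if t >= cycle_start)

# Sweep the whole 2^N state space and collect the distinct attractors.
attractors = {find_attractor(tuple((s >> i) & 1 for i in range(N)))
              for s in range(2 ** N)}
print(len(attractors), sorted(len(a) for a in attractors))
```

Counting attractors and cycle lengths like this is what underlies claims such as "the redundant networks fall back to a very small number of attractors"; exhaustive sweeps only work for small N, after which one samples initial conditions instead.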
I mean, you've approached it from very informational first principles and recovered a lot of these topological and dynamical features of complex systems, and how they have to trade off against these multiple properties of systems-ness that sometimes might be directly contradictory. A question in the live chat: Celeste wrote, how might the GID be related to network controllability? Oh, that's a really good question, and I'll be honest, I have no idea. With network control theory, I'm a consumer, not a producer, you know? I'll see these papers, like Parker Singleton had a great one looking at network control theory and LSD and psilocybin, and think, oh, that's really cool, but I don't know enough about the theory to really get at it. My guess, just off the cuff, would be that redundancy would make the network perhaps less controllable, because it always wants to fall back into its main attractor, whereas synergy probably lowers the energy needed to push you onto a different part of the configuration space, but is maybe less stable, maybe more fragile. So I think this is a great case where some simple toy models, maybe some analysis of fMRI data, could go a long way toward answering a really interesting question. But I can't say anything for certain beyond those very hand-wavy predictions. I'd love to see it done, though. Cool. Yeah, one thing that you brought up at the end, with the two eyes looking at an object and getting depth, really made me think about the multiple scales of redundant and synergistic information. Like, you have two photoreceptors that are right next to each other.
So just from a first-pass correlational view, it would seem like they're going to be highly correlated sensors; I mean, they're in a sensor array on the retina. And yet the differences between retinal receptors are also used to improve signal-to-noise characteristics at multiple steps in the relay of visual processing. Then you have the two eyes and depth perception. And then there's also cross-modal synergistic information: maybe if you just saw the video of somebody, or only heard what they had to say, it might convey certain information, and yet putting them together is very important for understanding how different kinds of multimodal experiences convey information about real-world things. And for cognitive modeling of biological systems, or the design of cognitive systems, these kinds of redundancies and synergies would basically be ignored at one's peril. Because you can always describe the syntactic flow of information and say, this is how much data at each moment we're pushing through this connectivity. But that might be misleading, or fragile, or over-designed or under-designed, with respect to the relatively obvious, intuitive ways in which we fuse streams of information to reduce our action-oriented uncertainty about the world. Yeah, with this idea that at different scales you could have different balances of redundancy and synergy, I think there's a whole universe of interesting work that could be done there. And my guess is that somewhere there is a system with every combination of micro-scale and macro-scale redundancies and synergies, and a different context in which each of those would be useful. I think it would be really cool to think about. Another kind of connection is those lattices: even just going from two variables to three, we could see that kind of explosion of ways to parse the information.
And then it made me think about where time comes in, and the foraging or attentional path of a given cognitive entity. Because while you're able to represent all the possible partitions, and you as a mathematician can pick that up and handle it, from an attentional-networks-in-the-brain perspective we might ask: which node on this lattice should we be looking at? How should we blend the indicators from sources one and two to the exclusion of three, or two and three but not one? Those are different attentional modes you can have. And you could have a uniform prior across that lattice and just say, well, we're going to blur across all of them. But once you start saying, oh, I like this one, and this one's irrelevant, and these two are the same, then you're getting into the business of reshaping the portfolio across the lattice. How do we deal with that, outside of treating these systems purely symbolically, if we actually want to be using certain partitions and identifying useful decomposed components? How do you think about that? Well, the first thing is, when you mention time, I again want to shout out Pedro and Fernando and Andrea's work on integrated information decomposition. I introduced it as this multiple-sources, multiple-targets framework, but they have this incredibly cool application of it where they say, okay, all the sources are the states of the elements at time T, and all the targets are the states of the elements at time T plus one, right? So now you're decomposing the information that the past discloses about the future. You can ask what redundant information in the past discloses about the synergies in the future, and what information in the synergies of the past stays synergistic in the future.
And so there's a whole universe of temporal information dynamics that can be brought to bear here that I just didn't really have time to talk about. On the topic of which atoms we care about: one of the things I think a lot about is that these lattices explode super-exponentially, right? We don't know how big the lattice for a ten-element system is, because it's given by the tenth Dedekind number, and that's so large that nobody has ever computed it, right? So finding ways to make this information accessible to us as modelers is an outstanding problem as well, which I think is getting at what you're saying. And the other thing is that as the lattice gets bigger, the particular atoms become hilariously arcane, right? It's like, okay, what information could be learned by learning X1 and X2, and X9 and X10, or X1 and X2 and X3, and X9, or X1 and X2, and X7? You end up with these incredibly long chains of logical dependencies that, my guess is, almost certainly don't mean anything. So again, figuring out some way to extract or decide what the meaningful dependencies are, not just what all of the dependencies are: as a mathematician, I like completeness, I like having the whole pie, but outside of that, the question of how we figure out what information is relevant to us, that I think is a standing problem. There's a lot of space there for creative attacks on the question; I've taken my own swing at it, other people have as well, and I think there's still a lot of work to be done in that space.
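The "super-exponential explosion" can be seen directly by brute force. This sketch (my own, purely illustrative) counts antichains of subsets of an n-element set, which is the n-th Dedekind number; the PID redundancy lattice over n sources has that many atoms minus two (dropping the empty antichain and the one containing the empty set, which matches the 4 atoms for two sources and 18 for three mentioned earlier):

```python
from itertools import combinations

def antichains(n):
    """Count antichains in the powerset of {0..n-1} by brute force.
    Only feasible for tiny n: the answer is the n-th Dedekind number."""
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    count = 0
    for r in range(len(subsets) + 1):
        for family in combinations(subsets, r):
            # An antichain: no member is a proper subset of another.
            if all(not (a < b or b < a)
                   for a, b in combinations(family, 2)):
                count += 1
    return count

for n in range(4):
    print(n, antichains(n))  # Dedekind numbers: 2, 3, 6, 20
```

Already at n = 4 the brute force enumerates 2^16 families, and the closed sequence grows so fast that, as noted in the talk, the value for n = 10 has never been computed.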
And it's very exciting to see what people will come up with in the next five or ten years. Yeah, a few thoughts on that. First, what you said there: even with ten retinal receptors, or an insect eye with multiple ommatidia, once you get to ten, enumerating the lattice is not even practically accomplishable. And I think that must, in a way, speak to the relevance of coarse-graining and nesting of models, because you couldn't have ten people in a room and have every one of those combinations matter; there has to be some clustering and nesting. And then what you said about the specific atoms being hilariously arcane reminded me of random forest modeling; you also had that kind of logical framework. So it's sort of like: tell me either your favorite cereal and what color of car you would want, or this and that. And the concept is, yes, those are drawn from a massive state space of possible thresholds or choices, yet through statistics and iteration, some of those thresholds may identify critical decision points within an empirical state space, which is probably going to live on manifolds or subspaces that don't require the full unpacking of the lattice. There might be whole territories of the lattice that you can just average over. But the question is how nicely those are distributed on the whole lattice. If they're distributed randomly, then there may not be many ways to work with it. But if they're all on one side, then a random forest with a simple first question might bring one into a useful area. Yeah, that's a really interesting idea. I'd never thought about the link to random forests before, but I'm going to write that down and take a stab at it. I like that a lot. Cool, yeah. I guess one of the interesting questions too is: information for whom, and about what?
In the math, that's generalized over, appropriately so. But when we're constructing active inference models, we're actually using the KL divergence: in the variational free energy, where the KL divergence appears in the complexity-minus-accuracy formulation, or in the prospective, expected free energy setting, thinking about the divergence between what we prefer for observations and how we think different courses of action are going to play out. With those divergences, it gets very specific what we're talking about the reduction with respect to. So how do you take something that's generalized across observers and types of variables, and bring it from being an analytical result into being something like "the air conditioner is reducing the KL divergence about preferred temperatures with respect to courses of air conditioning"? What happens between having this math and being able to use it in those settings? That's a really good question. And again, coming from a math perspective, I'm always gunning for full generality; the less immediately applied this is, the happier I am. But the GID really is just a decomposition of the KL divergence, right? All of the general stuff is sort of an added bonus. So you could have a KL divergence where your prior and your posterior are very well defined: I know exactly what my prior is, I know exactly what my posterior is, I know exactly how to interpret it. And then you can just go ahead and crank the GID handle, and you'll get the decomposition back out. So again, I approach it from the general perspective, but if you want to take it and make it specific, I don't think there's any fundamental mathematical problem with that.
It's really just a question of the extent to which you can specify exactly what your priors and posteriors are, and how you then interpret the resulting numbers, right? Because the GID itself doesn't care about the interpretation. It just says: we have one distribution, we have another distribution, and they have this structure that relates them. But if you have your interpretation, and the domain knowledge that specifies everything, then yeah, I think the generality is not necessarily a problem. How would you connect this to the concept of known unknowns and unknown unknowns? Oh boy. So the problem with information theory is that it has a very hard time with unknown unknowns, right? It kind of assumes that you have the distributions. How do I say this? I'm getting a little tongue-tied here. From the perspective of the KL divergence, all it sees is what's in the distributions, right? You plug the distributions in, you turn the math crank, and you get numbers out. There's not an obvious way, I think, to account for the possibility of meta-uncertainty. You can be uncertain about the probability of X1, or the probability of X2, but how certain are you about those certainties? That's something that, as far as I know, the vanilla information theory I'm familiar with really quite struggles with. And I'm really interested in climate science and climate change kind of stuff, and I keep running into this idea of deep uncertainty: what's the probability that your probability distribution is actually the right one?
There's, I think, some very cool work happening in that space, but it's a step or two beyond what I think can be coupled with what I've presented here. So I guess I don't know; I think that's something we've got to work on. Yeah, that's super interesting. It reminds me of learning about the mean and the variance of a distribution: okay, so we've parameterized our uncertainty about the mean and called it the variance, but then how certain are we about that uncertainty? In the hierarchical Bayesian modeling approach, you can just keep going, but the buck has to stop somewhere. Then you can go to another set of methods and pick different prior distributions for that highest prior, or take other approaches, but that's a very composite approach that doesn't necessarily return you to the simplicity of the initial question; you're just getting into higher orders. What is the probability that there was another element we were missing? But that could increase the complexity of the problem so much that even a simple question would be unnecessarily exploded, because of even a very small probability of something happening that greatly changes the state space of the initial question. And that would potentially forgo a lot of low-hanging fruit in order to cover the bases against some rare events, which still might not even be detected, because you would have just basically turned some kind of unknown unknown into a known unknown, at expensive cost, making the problem quite different. So is that something that can be resolved in an airtight way, or is that the kind of openness that just requires iterated engagement?
Yeah, I mean, my intuition is always to keep things as simple as possible, but not too simple, as the saying goes. In the paper I write a lot about the contexts in which you shouldn't use the GID; that was something I spent a lot of time on. The Kullback-Leibler divergence does have these odd limitations. For instance, anywhere that Q of X is zero, where something happens in the posterior that you didn't think could happen in the prior, the whole thing blows up, right? And so I do think that using this in a practical analytical context, not just a theoretical one, would require a lot of thought about whether this is actually the right tool for the question I want to answer. And it's entirely possible that, no, the GID or the PID is not the right tool for a system where you have changing support sets, or unknown unknowns, and that's fine; there are plenty of mathematical toys in the box for us to play with. So I guess my response would be: if you find yourself asking those questions, maybe just don't use the GID or the PID; maybe there's something else that would be a more appropriate tool for your question. Hmm, that also makes me think about, I guess, different ways to approach it, but like information or stimuli that have an informational impact: information cannot be negative per se, yet there are ways to move distributions up and down, and there are ways in which the movement of that distribution up or down would have an adaptive benefit or detriment to an organism.
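The zero-support limitation mentioned above is easy to see numerically. Here is a minimal sketch (the function name and toy distributions are my own, for illustration) of the discrete KL divergence, showing how it stays finite when the prior covers the posterior's support and blows up when it doesn't:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(P || Q) in bits.

    Terms where p(x) = 0 contribute nothing (0 * log 0 := 0), but any
    outcome with p(x) > 0 and q(x) = 0 makes the divergence infinite:
    the "posterior" P puts mass where the "prior" Q allowed none.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return float("inf")
    return float(np.sum(p[support] * np.log2(p[support] / q[support])))

# Prior covers the posterior's support: a finite answer (~0.085 bits).
print(kl_divergence([0.5, 0.25, 0.25], [1/3, 1/3, 1/3]))

# Posterior puts mass on a state the prior forbade: divergence is infinite.
print(kl_divergence([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))
```

This is exactly the "changing support sets" failure mode: no amount of re-normalizing rescues a prior that assigned probability zero to something that then happened.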
So information at this level is coming in lower in the processing; it's not necessarily a complete answer to questions about complex systems. However, it helps ground and anchor some subsequent questions about their higher-order dynamics in a way that skipping directly to inferring higher-order dynamics might not be able to realize. Yeah, the question of whether this is something we use to model these kinds of information-theoretic dependencies, or whether there is something in the brain that is actually computing something like a little GID as it tries to learn what patterns it needs to be sensitive to... there's probably nothing exactly like that happening, but the question of how organisms represent information, and potentially represent higher-order interactions, is a really interesting one. I don't know if that's exactly what you were talking about, but that's where my mind went. Ooh. Well, we'll wait a minute in case anyone in the chat has any last questions, but where and how are you going to take this forward? So I spent most of my PhD working on the mathematics of these things: I wrote about the PED, I wrote about this, I've worked on some other stuff. Now I'm really curious about where this can be applied to real data, right? Currently, at the University of Vermont, I'm working with Josh Bongard and Mike Levin on some interesting projects in evolutionary computing and evolutionary biology, to see whether these synergies exist in nature and whether they're telling us anything about the behavior or structure of organisms. I'm still interested in neuroscience, too; I have a long-standing interest in psychedelic neuroscience.
So, whether there are changes to redundancies and synergies in the brain when you're on DMT or whatever, and whether that could help explain why phenomenological consciousness changes: that's something I've worked on before as well. Really trying to find ways to get out of math-theory land and see whether this actually tells us something about nature, as opposed to just the structure of information theory, is what I'd like to do next. Although that's also expensive and requires data, whereas pure theory just needs a whiteboard and a LaTeX compiler, so it's lower overhead. Just on altered states of consciousness and all those possibilities: it's almost like asking what decompositions help one gain knowledge about themselves or about something else. It might be a decomposition where this amount in this setting and that amount in that setting provide information on a common target distribution. And that maps out an architecture that isn't "this part of the brain is connected to that one"; it's not even necessarily the architecture of the dynamic causal modeling of the neuroanatomy, but it really would be toward architectures of beliefs and how they have different equivalences or relationships. Yeah, there's a really interesting set of papers by Robin Carhart-Harris and Karl Friston looking at a free-energy or Bayesian approach to psychedelic therapy: how do our beliefs change, and how does tickling these serotonin receptors change our beliefs? I think that would be a really interesting place to try to apply something like this, if you could get meaningful Ps and Qs, right? Like, if you're looking at something like post-traumatic stress disorder, right?
You're changing your beliefs not just about the world around you, but also about your own interoception. Does the capacity to heal from trauma, or something like it, require a higher-order, synergistic change in the joint states of your beliefs about the world and yourself, one that is not simply reducible to tweaking one or tweaking the other, right? So getting into more of this cognitive kind of stuff, I think, would be really interesting. One thing I didn't mention when we were talking about future work, something I've been working on quite recently, is that the PID and the PED are currently all framed in terms of entropies and mutual information, but as long as you have a general function that satisfies certain axioms, non-negativity, monotonicity, yada yada, you can also get a lattice, and you can also do the Möbius inversion and get the decomposition. So I've been working a lot on what I'm calling structure decomposition. If you have a network, and you have a measure like the communicability, which does satisfy all of the desiderata needed to induce the decomposition, you could do a communicability decomposition, right? Looking at the influence of edges or nodes on this measure, the communicability, instead of the information. Or you could look at the shortest-path efficiency; that one also satisfies all of those requirements. And so, can we take these logical mathematical structures that were developed in the context of information and entropy and apply them to other things? I'm a networks guy by training, so I'm looking at network measures now, but how far could you push this, right?
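The lattice-plus-Möbius-inversion machinery described above is measure-agnostic, which is what makes it portable to communicability or path efficiency. As a rough sketch (the function name, lattice encoding, and cumulative values are hypothetical toy choices, not from the paper), here is the generic recursion on the Williams-Beer bivariate redundancy lattice:

```python
def mobius_inversion(cumulative, below):
    """Recover partial (atomic) terms from a cumulative measure on a lattice.

    cumulative : dict mapping each lattice node to its cumulative value
                 (e.g. a redundancy function, or any measure satisfying
                 the non-negativity and monotonicity axioms).
    below      : dict mapping each node to the set of nodes strictly
                 below it in the partial order.

    Each partial term is the cumulative value at a node minus the sum of
    all partial terms strictly below it: pi(a) = F(a) - sum_{b < a} pi(b).
    """
    pi = {}

    def solve(node):
        if node not in pi:
            pi[node] = cumulative[node] - sum(solve(b) for b in below[node])
        return pi[node]

    for node in cumulative:
        solve(node)
    return pi

# Williams-Beer bivariate lattice: {1}{2} < {1}, {2} < {12}.
below = {"{1}{2}": set(),
         "{1}": {"{1}{2}"},
         "{2}": {"{1}{2}"},
         "{12}": {"{1}{2}", "{1}", "{2}"}}

# Hypothetical cumulative values for a toy system (in bits):
cumulative = {"{1}{2}": 0.1, "{1}": 0.4, "{2}": 0.3, "{12}": 1.0}

partial = mobius_inversion(cumulative, below)
# Redundancy 0.1; unique terms 0.3 and 0.2; the top node's partial term,
# 1.0 - (0.1 + 0.3 + 0.2) = 0.4, is the synergy.
print(partial)
```

Nothing in the recursion cares whether `cumulative` came from mutual information, communicability, or shortest-path efficiency; only the axioms matter, which is the point of the generalization.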
Could you start looking at psychological constructs? Could you start looking at dynamical properties like... I'm drawing a blank on any good dynamical properties, but you get the idea of what I'm trying to say: how far can we push this logical structure to look at higher-order interactions in contexts that are not information-theoretic? Yes, that makes me feel like networks give a discretized, topological handle on families of continuous phenomena. Or the phenomena could be discrete at the node level, but we could still be talking about a continuous statistical distribution, so the node level gives us continuity and connection with empirical data and statistics, and then the math picks up with these discretized lattices. Studying a tree is like studying a tree; then you pull back and you have a model of the forest, and what could you know by studying two trees? Well, they could be giving redundant information, they could be giving synergistic information, about what? And off one goes on that inquiry. Yeah, huge potential there, I think, for a lot of really cool new ways of thinking about complex systems. Awesome. If you have any last comments, feel free to go for it; otherwise, this has been epic and very informative. All I'll say is: you can find me on Twitter, you can find me by email. I don't know how many people are on the YouTube right now, but I'm always open to collaboration and to talking to people. If any of this is interesting to you, shoot me an email, please, and I would be happy to talk about any and all of this; my inbox is open. So I look forward to hearing from folks. Thank you for that, and thank you again for joining. So, till 10.2.
See you. Bye.