Hello and welcome everyone. This is the Active Inference Livestream. It is ActInf Livestream 8.2 and it is November 17th, 2020. So welcome everyone, listeners and participants. Welcome to Team Com, everyone. We are an experiment in online team communication, learning, and practice related to Active Inference. You can find us at our website, our Twitter, our email, on our public Keybase team, or on YouTube. This is a recorded and archived livestream, so please do provide us with feedback so that we can improve on our work. All backgrounds and perspectives are welcome here in learning about these questions. And as far as video etiquette for livestreams goes, just remember to mute if there's noise in the background, raise your hand so we can hear from everyone on the stack, use respectful speech behavior, et cetera.

Today in exciting ActInf Stream 8.2, we are going to have introductions and warm-ups, and then, at whatever rate makes sense, we will welcome Shannon. We will walk through the sections of 8.2. As with 8.0 and 8.1, we are going to be discussing the paper Scaling Active Inference by Tschantz et al., 2019, and Alec, thanks so much for coming on the show today. We're going to talk about the goals and the roadmap of the paper, hopefully from the author's perspective. We'll then have a little pause just for an open Q&A, because there are a lot of different directions this paper can take one. So we'll just pause there. And then we'll continue through the figures, just two of them, and address a few different domains of follow-up questions about the paper. And then, if we want to, slash, still have time, we can go through some of the notation and math again. For the rest of 2020, we are going to be moving into papers 9, 10, and 11, so check it out on Twitter. Paper 9 is going to be about consciousness, paper 10 is about scripts and social interactions, and paper 11 is about modeling under the sophisticated active inference framework.

For the intros and warm-ups, if everyone could just introduce their name and their location, whatever else they'd like to state, especially our first-time guests, and then pass to somebody who has not spoken. So I'll start. I'm Daniel. I'm in California, and I will pass it to Alec.

Hey, everyone. Yeah, I'm Alec Tschantz. I'm based in Brighton. I'm a final-year PhD student. Thanks for having me on the stream.

We will go to Sasha.

Hi. My name is Sasha. I am also a graduate student, and I'm based out of Davis, California. I will pass it to Stephen.

Stephen, you're unmuted, but there isn't any sound. Maybe just reload, and temporarily let's just go to Shannon.

Hi, guys. I'm Shannon. I'm based at the University of California in Merced, but currently in South Dakota. I'll pass it to Blue.

Hi, I'm Blue Knight. I am an independent research consultant, and I am based out of New Mexico. I will pass it to Alex.

Hi, my name is Alex. I'm in Moscow, Russia, and I'm a researcher in Systems Management School, and I pass it to Ivan.

Hi, my name is Ivan. I'm in Russia, Moscow, and I pass it to Stephen. Stephen, are you here?

I'm going to try again. Does that work? Yes. Hi, I'm Stephen. I'm based in Toronto. I'm actually studying a practice-based PhD through Canterbury Christ Church University in the UK, and I'm pleased to be here.

Hi, I'm Mel Andrews, and I'm in Cincinnati doing a PhD.

Cool. A lot of different stages and areas, so it will be a great discussion to bring to the technical side with this paper.
For the warm-up questions, people can just raise their hand as they'd like to speak, and I will put up the first two questions. They are: for today's discussion, what is something that you are excited about? And then the second question is, what is a question you're wondering about? So while people are thinking about it or raising their hand, one question I was wondering about was: how do we trade off between explore and exploit? In the two figures, we saw examples of explore or exploit, but what is required to enact something that is able to mediate between the two of them, or find a compromise between those two strategies? Anyone have any thoughts or want to raise their hand? Stephen?

Yeah, I'm also excited about the explore and the exploit side as well, to be honest, because I think that opens up a big shift in the way that we socially interact, in the way that we organize ourselves. So trying to look at a real foundational way to approach that is going to be quite helpful, I think, in a lot of fields of practice.

Alec, and then anyone else who raises their hand.

Yeah, am I muted or unmuted?

No, you sound good.

I just wanted to comment on your comment, because I'm also very interested in that, but I guess I'll go on the record saying that I'm not convinced that Active Inference provides a solution to explore-exploit, as it's sometimes said to; I don't think it does. I think it recasts it, so now it's not a question of balancing exploration and exploitation; they're both subsumed into this different objective function. But in practice, there isn't a sort of magic balancing act that you get out of this. So I think it's a good way to recast exploration and exploitation, but certainly not a solution to the problem.

Awesome. Blue?

So I kind of have a tangential question, and thank you, Alec, for your comments, because they play right into what I think is missing. It's very clear that if the environment is homogeneous, there's no real need for exploration, because the agent is not going to gain any additional progress by exploring the environment. But I just wonder where autonomy factors in, right? Some agents have more or less control over their environment, and there's no real room for this, or not that I've seen anyway, in the Active Inference framework: where do the differing levels of autonomy of the agent play into the need to explore versus exploit?

Cool, good question. Stephen?

Yeah, I think that point about how to look at the agent starts to bring in the enactive component a lot more, I think, and the generative model that's kind of sitting inside the person, and existential kind of sensemaking. So it's like pushing this back from a real-world perspective and saying, okay, maybe we need to work back the other way, from complex, multisensory experiences towards active inference, as well as using active inference to do modeling of maybe very clean, low-dimensional approaches. Should we also be coming back from the other direction? And I've got an activity called conceptual action sociometry, an activity where people move around a space to understand how their orientation is during a moment of encounter. And I've asked people to unpack it in terms of the three different areas of active inference: the sort of pragmatic gain, epistemic foraging, and risk mitigation or salience.
And it's actually really interesting when they break it down that way and start to think about "what was I trying to do," but from that kind of high-dimensional synthesis of being in a really immersive moment. And people who don't know anything about active inference actually were quite intuitively seeing a lot of shifts in those three different areas of focus, depending on the type of event or moment that they were reflecting on, because they were going back to an event and thinking about the focus of their attention at that particular moment in time, and how much they were bringing these three areas in. And then I suppose from that you could infer something about the exploration, the sort of desire to learn, or the exploitation gain. But I agree, I think it does depend on the moment that's in question. So there are a few things this is tapping into which I'm really interested in.

Cool. And to tie that together with the recasting idea from Alec: it's kind of like nature and nurture, explore and exploit. People don't know about these binaries, and they're not natural kinds, until they're taught them. And then the resolution to these kinds of dichotomies or binaries, false or not, is not some parameter that varies from zero to one that trades off. For example, when Evelyn Fox Keller was moving beyond nature and nurture, it's "the mirage of a space between": it's about reconsidering the situation so that there doesn't have to be something like a bipartite split. So that's definitely a direction that we will try to take it, because it's a really cool idea.

The second, or I guess the last, warm-up question is on a related note. What would be an interesting control system, or any system, to model with active inference? We're going to go into the mountain car and a hopper, and I guess a pendulum, and I think it'll be fun to hear about those. But we've heard about the social and the spatial, and there are questions that we've raised about biology and communication. Anything else that somebody wants to bring up? Yeah, Mel, please.

Yeah, so, what I'm excited about: I'm really excited about this kind of work. I'm a philosopher, but with this sort of stuff, the active inference FEP stuff, I'm like, please stop theorizing, let's just play with this. So I'm really excited about this kind of work. And the question that's motivating me is really figuring out what the boundaries of these various models are: what the boundaries of active inference are relative to the free energy principle, and then relative to, e.g., predictive coding models and that sort of thing. Someone brought this up recently. There's a biologist at my university doing really kind of groundbreaking work on perception and cognition in, I guess, butterflies, and mostly jumping spiders. So the Morehouse lab, Nate Morehouse, is doing this. For example, the jumping spiders have very elaborate mating dances and things, and they're signaling to each other. And they're doing stuff like the evolution of color perception in these jumping spiders. It's really fascinating work. And Nate mentioned recently that the lab is doing some simulation work on how you evolve an additional color percept.
So, if you have certain genes and proteins for receiving certain colors, you can kind of shift the probability distribution of the range of color that those pick up, right? That's something that evolution can modulate. But the evolution of an entirely different, an entirely new color percept is something different. And so they're doing some in silico work on that. And I thought, yeah, that's exactly what active inference people should be doing. That's what we should be doing. That's what we should be striving for.

Great, thanks for that very interesting summary. Sasha, and then anyone else.

Thank you, Mel, for bringing up the jumping spiders. They have incredible eyes; you should all look it up. But I'm interested in this from a developmental perspective, and what active inference can contribute to how systems develop and communicate, and specifically humans: how our in utero environment and the activity of that process shapes the development of the human. And on a broader scale, how this can play into teaching, and how active inference can be applied in the classroom at different educational levels, but in a way that sets up the environment such that agents are encouraged to explore, instead of being dragged through the material as students often are. So that's my curiosity at this time.

Cool. Yes, a lot of Vygotsky fans slash colleagues through time on this discussion. Any other comments? Really great points there about the evolution and the development of perception. And then, at the level where we can at least experience or enact agency: can we choose to see a color as different? Can you look at that ambiguous pattern and choose to see it one way or another? Can we understand many of these perceptual questions as being related to, maybe not simply explore-exploit, but something with an element of multi-scale agency? I think there are a lot of ways that could play out in people's lives, while also recognizing that we're not talking about just metaphors or narratives here. We're also talking about a specific framework, with the paper that we're going to discuss now. So let's talk about those details, and then maybe we can dip back to the other systems; but let's think about how this paper, and also our special time with Alec, will help us understand these broader questions, at least where we want to go after this short discussion.

The paper: Scaling Active Inference. We read the goal last time. Alec, maybe, however you want to and for however long: what were you setting out to do, or how did the collaboration come to be? What were you asking before? How did you get to this paper and this goal?

So, I guess the historical lead-up to the paper was that I was very interested in reinforcement learning and machine learning, and also very interested in active inference, so there's a natural overlap there. But in terms of the motivation, I think the two communities don't really speak to each other as much as they should. So much of the work that's happening in reinforcement learning, especially model-based reinforcement learning, has direct analogies to the work that's being developed in the active inference community: stuff like the use of variational methods, variational autoencoders, the use of dynamics models, the use of trajectory-based planning, or just planning in general.
The two programs have a lot of similarities. The use of intrinsic objectives, these information gain terms, is widespread in both, as is the use of belief-based planning. So all the things that we touched on in the paper. In my mind, when I started out, the aim was really just to construct an agent using machinery that would be common and relatable to the machine learning crowd, but that was very much consistent with active inference. And I guess a secondary goal was also to put together an initial attempt at looking at what needs to be put in to get some of these ideas to scale. So with a particular focus, you know, this wasn't a biologically plausible suggestion, but looking at some of the objective functions, the expected free energy, at scale. I think that's of interest, rather than where it's commonly looked at, in these small discrete T-mazes.

It's really helpful, with a lot of the terms that we've heard, to link together the ideas of variational inference with the belief-based and the trajectory-based planning. Those are all really cool things. I just have a question about the "at scale" part. One aspect was that you introduced the continuous state space, as opposed to the discrete. But is there anything else that's meant by "at scale," or what does it mean to scale in this context?

Well, I mean, scale is almost defined in relation to what is currently common in the active inference literature, where your entire world is often described by four or five states. This paper presents a framework that can be applied to observations of dimension 10,000 or 100,000, where the state space or the observation space is much larger. So that's what I meant by scale: putting forward a framework that could be applied to high-dimensional tasks, like the ones that we can see here. And there's also a second thing, which is not really captured in the word scale, which is the complexity of the dynamics. Some of these tasks actually have quite a small state space, four or five dimensions, but the complexity of the dynamics requires large models with a large number of parameters. So there's a scaling aspect in the state space, and a scaling in the complexity of the dynamics.

Very interesting stuff. At any point people can just raise their hand, but I think we can keep walking through these areas. Last time we did walk through the different areas in the roadmap: how the paper introduces active inference and the current state, then walks through the specifics of the model, shows some proof-of-concept experiments related to previous work, and then discusses and concludes with some of the sentiments Alec is describing. I also found it interesting: there's the scaling of dimensionality, then there's the introduction of continuous state spaces, not just discrete. And then often machine learning people might use the word scaling to mean the number of observations, the size of the data set. That relates to the compute scaling relationship: when you double the data set input, is it the same amount of training time? Is it twice as long? Four times as long? 4,000 times as long? There's also scaling in terms of understanding in the world; people talk about scaling in innovation and entrepreneurship, and how to scale a solution. So there are a lot of parallel meanings, and it's a fun and concise title, because you know that the finding isn't going to be in the title.
There isn't just a simple way to say, oh, this protein does this in this species. We're talking about developments in a modeling framework, but we also want to be specific about what those advances are and how they relate to systems. So maybe just on the roadmap, Alec, or anyone else: anything to say about it, or why was it arranged this way? Personally, I was just wondering why the previous work is at the end. Is that a difference in academic conventions, or is that just because it was more results-oriented?

Yeah, I mean, you don't get a lot of leeway in machine learning. Previous work can either go after the introduction or at the end. But it's always an introduction, some methods presenting a model, normally quite short, then the experiments, then a previous work section and a very short discussion; that's kind of standard across the board. The reason I chose to put previous work at the end was just because I discuss active inference, which naturally leads in from the introduction, and that introduces a little bit of maths, which then naturally flows into the model. I felt like a previous work section might have broken up that flow a little bit, but it was more or less arbitrary.

Cool, makes sense. I think we can walk through the next sections. So right now we're in the open pause for questions. I have a few written down, just big questions, because through our conversation we're trying to be a bridge between those who might be hearing about active inference for the first time, coming from machine learning and thinking about how cool methods, perhaps with philosophical implications, might be utilized, and the many people who are on the active inference side of things, or the enactivism side, or the philosophy side, for whom the details of the machine learning might be new. So how do we come from where we are, and think about what questions and what kinds of fun things to talk about will make that bridge? Stephen first, and then anyone who wants to.

I'm just curious, because I know that the scaling question has maybe been looked at more from a philosophical perspective in the active inference community. What this paper really helped me do was break that fear of questioning these models, because people like yourself and Anil are saying, look, there's a challenge here in the scaling; this stuff is being used at these low-dimensional spaces. And basically it showed a gap in the literature in terms of how to action that, which kind of helped me, because it gave me a bit of permission to say, okay, it's okay, there's a gap there, and it's not that I just don't understand everything; it's just not easy to scale. And I think this paper was the first one to explicitly say that, because all the other papers tend to say, yes, we know how to scale, but in a philosophical way. This is like, okay, well, here is the problem from a practice perspective.

Cool. Just one note on that, and then anyone else can raise their hand. I really love this idea of the gap in the literature. There are a billion gaps in the literature; it's about which ones are salient and fundable and relevant. And then when we find that gap, it's often very niche. Like in a PhD: I thought, wow, this data set has been collected from this ant species but not this other ant species.
There's a gap in the literature: we don't have this data set on this ant species. And then seeing that as an opportunity, and something that's okay even if it's not your traditional field of study, because you're at the edge, at the gap, with us talking to the author and talking through the equations. This is what it looks like to look across the gap and try to connect, because somebody else has identified it as a research question. Or maybe you have. Alec?

Yeah, I just wanted to comment on that. I entirely agree with what you're saying, but to come, maybe, to the defense of the existing literature on active inference. I think it's viewed a little bit, or maybe this is just me interpreting what they're saying, but maybe the problems that they're testing their agents in, for instance something like the T-maze, are actually viewed as quite complex in a different dimension. So a lot of these reinforcement learning tasks, stuff like Atari, most of the stuff where you're pushing to get state-of-the-art results with huge amounts of compute, DeepMind levels of compute, you can often get by with just model-free reinforcement learning that doesn't require any notion of beliefs or any notion of proper epistemics. So you can get away with undirected exploration. What I think the active inference community has tried to do is look at simple tasks, such as the T-maze, where, well, you can solve them without beliefs, but they're far, far more amenable to being solved with a belief-based scheme, with this directed exploration and uncertainty reduction, to show the kind of complexity the framework can cope with. But then you could also respond by saying that this is also thought about a lot in reinforcement learning, and there is the question of whether the kind of process theories they put forward will apply when you've really got this scale.

Yeah, let me add on to that, because it's a really great point. A lot of times, what's perceived as cutting-edge or modern or advanced research in machine learning, I mean, go check the lay media, is going to be about this many graphical processors, or this size of data set, or this accuracy, or this level of skill using massive data. It's very much on the performance end, on eking out a little bit more performance from increasingly large data sets with increasingly large types of models. But it's actually, in some sense, a local exploration: it's locally exploring certain frameworks and ways of doing machine learning, which is just computational statistics. And what this paper does is, in a way, return to simplicity. It's a return to a slightly different way of conceptualizing some of the parameters and how they're related. And the big-data "just train it bigger, train it better" mindset is kind of like thinking that we can do X without a belief: we can train on Go just by watching Go, we can learn the laws of physics just by watching physics, we can learn language just by watching regularities in human language. These approaches, there's merit to them. This isn't just about one way being better.
It's just that what is being done in this paper is to take the free energy principle and active inference, which previously had made these kinds of, at the very least philosophically tantalizing, claims, like the relevance of a belief-guided, trajectory-based optimization and search, and take these very fascinating ideas a little bit out of the sandbox into the next level of the playground, where now we can actually start to compete, or at least compare and contrast, directly with the kinds of benchmark algorithms that are used. So maybe one day, you know, active inference Go, active inference chess. But today we have the simple control theory problems, which are one level closer to the kinds of use cases that are happening today; one level closer than the T-mazes or the three-state decisions that we did, a mouse or a hawk or a cat, that type of stuff.

Cool. The questions that I put up are just about how we can continue to deepen our understanding, which we will move on from. But just: what's something you wondered about? Always stick with that, and anyone can raise their hands. What's something you learned about while studying the paper, whether it was something that they wrote specifically, or something where you went down a rabbit hole and started studying? That definitely happened for me. And then lastly, what's something you're motivated to do more of, or learn about, now that you've read this paper and had this discussion? Yeah, Mel, thanks, and then anyone else.

I guess I just wanted to jump on that. I'm used to seeing people try to compare reinforcement learning with active inference and related approaches, and it's nice to see something that's more of a synthesis there.

Yes. So, if there are any other thoughts, feel free, but we'll look again over the experiments, and then we'll work in and out of broader topics. Let's try to understand what was really done, because this is also the point of contact with people who might be very familiar with machine learning, optimization, and control theory, but for whom it might be the first time hearing about active inference. So where's the common train stop where we're on board with the machine learning community? They're interested in these kinds of tasks; they frame it as explore-exploit; and as we've been talking about, maybe there's a bit of a reimagination or reconceptualization of this relationship between explore and exploit, or what these variables are. That's what we want to get to at the end, so let's for sure remember that. But for now, let's think about being in common with benchmarking, different data sets, and different machine learning algorithms. So maybe we could talk about explore and exploit, how these communities think about them, and maybe your observations about how they're different. Alec, first in terms of the example that you chose to highlight as explore: maybe tell us about the mountain car, and when you tell me, I'll go to figure one. What is this? What does it have to do with exploring? How does the machine learning community think about it? How does active inference apply? What is happening here?

Yeah, so mountain car is quite an interesting one. It seems so simple: when you do it in a fully observed environment, there are two states and one action. This is the continuous version, so it's continuous actions.
So, you know, it's like the simplest task that machine learning people present in new papers, but it's actually one of the hardest, so, you know, vanilla DQN doesn't stand a chance of solving this.

What is DQN, sorry?

Deep Q-networks. That's taking the old idea of Q-learning, which was kind of the biggest idea in reinforcement learning before this whole deep phase. It's the one from the Atari work, the first DeepMind Nature paper. I can't remember the details, but it was kind of the birth of deep reinforcement learning.

Great. So that one has challenges on here; then just continue.

I mean, basically the reason is because you only get a reward when you reach that flag up at the top. And before that, it takes about 180 steps, 180 actions that you have to string together, before you get any notion of reward. So pure reward-based schemes really struggle with this. They essentially have to do more or less random actions until they get up the hill. So yeah, it's a struggle. And that's why it kind of exemplifies exploration: you need to care about learning about the dynamics, or some other intrinsic quantity, if you want a chance at finding out how to solve this problem.

Cool, very interesting. Blue, this is really reminding me of a lot of what we've talked about with explore-exploit, a lot of the perspectives you've brought about exploring: how and when and where does it matter. So pretty interesting to hear, Alec, how you phrased that. And also interesting to hear that it's 180 actions in sequence. So this is not like a game like Connect 4; this is like walking, you know, who knows how many degrees of freedom there are, but these are many actions that have to get strung together in long sequences. And then you said something about how it has to learn something intrinsic, for example the relationship between its velocity, position, and policy, a little more nuanced nexus of action, rather than just simply learning whether, you know, go means fast and stop means slow. Stephen, and then anyone else.

Can I just ask a question about that, when you say stringing together actions? Because I kind of had this feeling that it was a case of: by there being a desire to explore in itself, through free energy, the car realizes that it can go further up the other side of the hill, because it just has a reason for going there, from trying to minimize free energy around exploration. And in doing so, it gets that extra momentum to get up the other side of the hill. I haven't really gone into the detail, so I'm just wondering how this kind of sequence of steps, and maybe just finding yourself far enough up the other hill to get enough potential energy to go down and make it to the flag, how those two relate.

Great question. Alec, maybe if you have a thought on that.

Yeah, so I think your intuition is mostly correct. How it plays out is that this agent is evaluating different sequences of actions. Some of those actions are going to keep it where it's already been, which is at the bottom of this hill. And then some of the actions are going to take it into a place where it's uncertain about the outcome. And that uncertainty is essentially valued by the agent: it wants to resolve it. So it's going to say, okay, what happens if I do a little... because it has to go left up the hill. I should have mentioned that before.
It has to go left up the hill; it has to actually go away from the flag to gain momentum to get around. So if it's never gone left up the hill and then accelerated, it doesn't know what's going to happen. It does that; that's valuable to it in terms of epistemics. And then it realizes, oh, wow, left up the hill works.

Yeah, let me link that to the question of the sequence of actions. Imagine if you were going to do a control theory optimization on shooting a bow and arrow. Shooting a bow is a complex motor movement; it probably uses many joints. You could do different speeds, different ratios. It's going to be dependent on the bow, just like this is going to be dependent on the slope and the car. So it's an enacted affordance that you're trying to develop an action sequence for. And your action sequence in a game is often just one step: drop the Connect 4 token here; I'll reevaluate after I see what they do. But if you're going to be evaluating an action sequence in the world prospectively, it often has depth. It's like getting out of my chair and shooting the bow. The state spaces are multi-joint, so it's multi-dimensional, and it's continuous. That's why the multi-dimensional and the continuous are so important, and then the depth through time is so important. Because you can't just be doing a get-out-of-my-chair or a shoot-the-bow-and-arrow short-term, one-step optimization. There's no opponent for you to then take a look at their move, like in a chess, checkers, Go paradigm, or even a video game paradigm to some extent. This is something about planning in an enormous state space, and thinking really constructively about these intrinsic relationships. So in the bow example, you might be attuning to the tension between the proprioception in your shoulder and how tense the bow is, to know if you're at the end of your range of movement, or the bow is; and if you trained on one that was half size and then you go to a larger one, or if your arm is hurting that day: all these differences, differences in our abilities, become enacted in the relationships that we're learning about the intrinsic variables. So in the depth through time and intrinsic variables, if you learn that, hey, if I actually reverse and start accelerating downhill, a hundred time steps later it's just better; like, if I go to sleep at this time, a hundred time steps later it's just better. These allow for very nonlinear policies to be selected, because there can be temporal depth that's learned as a function of the actual physics of the setting, not just counterfactuals like, you know, if 13 moves down the line this person does this with my rook. That is still in a very if-then context, and this is taking it into a totally different domain, with proactive action selection policies.

Yep. So that's why this is such an interesting paper and approach, and, again, a branching-off point. That's why it's foundational in the machine learning: because, just like Alec described, if deep Q-learning cannot accomplish this task, then yeah, it's all great to beat a human at chess or Go; but if that algorithm, or even that architecture, can't defeat this challenge, then we're developing something that's incredibly specific. Which is great, we should have good map routing algorithms and things like that. But we're definitely going hyper-local down a rabbit hole if it can't solve this but it can solve Go. I don't know if that is the case.
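To make the mechanics of that concrete, here is a minimal sketch of the kind of model-based planning loop being described: sample candidate action sequences, roll them through an ensemble of dynamics models, and score each by expected reward plus an epistemic bonus (ensemble disagreement as a cheap proxy for expected information gain). Everything here, the toy linear "models," the reward threshold, the random-shooting optimizer, is invented for illustration; the papers discussed use trained neural network models and a more refined sequence optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "ensemble" of learned dynamics models. Each member predicts the
# next state from (state, action); here they are randomly perturbed linear
# maps, standing in for trained neural networks.
def make_model():
    A = np.eye(2) + 0.01 * rng.normal(size=(2, 2))
    B = 0.05 * rng.normal(size=(2,))
    return lambda s, a: A @ s + B * a

ensemble = [make_model() for _ in range(5)]

def reward(state):
    # Sparse, mountain-car-like reward: only reaching position 0.45 pays off.
    return float(state[0] > 0.45)

def score_sequence(state, actions, beta=1.0):
    # Score = expected reward (extrinsic value) plus ensemble disagreement
    # (epistemic value: a rough stand-in for expected information gain).
    total = 0.0
    states = [state.copy() for _ in ensemble]
    for a in actions:
        states = [f(s, a) for f, s in zip(ensemble, states)]
        total += np.mean([reward(s) for s in states])       # extrinsic
        total += beta * np.stack(states).var(axis=0).sum()  # epistemic
    return total

def plan(state, horizon=20, n_candidates=200):
    # Random-shooting planner: sample candidate action sequences, roll each
    # through the ensemble, and execute the first action of the best one.
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    scores = [score_sequence(state, seq) for seq in candidates]
    return candidates[int(np.argmax(scores))][0]

print(plan(np.array([-0.5, 0.0])))
```

Because the epistemic term is part of the same score as the reward, sequences that push into never-visited regions (like going left up the hill) get selected even before any reward has ever been observed.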
A machine learning person would be really welcome to help fill us in on some of these details, but that's the kind of stuff that is interesting and broached in this topic. So, just to see what that looks like empirically: the first hundred epochs are plotted in terms of the state space coverage, which here is just the position on the x-axis and the velocity on the y-axis, and the flag is at 0.5, right, Alec? Like, the flag represents being at a position of 0.5. But yeah, continue from here. Anything you want to add on figure one? What does this mean? What is happening here?

I guess, no, your description is great. The flag is... not 0.5? What is it? Oh, yeah, 0.5. Yeah. But I just want to self-deprecate a bit and say that these results, I mean, they're not bad results, but we had a follow-up paper where we got this all working properly, called Reinforcement Learning through Active Inference, which is very similar, but we use a slightly different... I can discuss the differences, I think they're interesting if we want, but the results are so much more impressive. So it's kind of both hitting on this and also advertising that work. So in that work, oh yeah, if you can actually show the figure, then we can check: there it can solve the mountain car in a single trial. And I have also the sort of state space plots that look a bit better. So here's the single one: the first time it goes in, it just solves it straight away, which is much nicer. Whereas with the one in the paper that we're looking at today, it takes a while before it's being solved every time. And we tried some harder tasks in that one, like the half cheetah and ant maze coverage.

Got to love it.

Yeah, I guess that was just a caveat that I wanted to highlight, that it can do a lot better than this. But I'll be happy to discuss the differences, because it's just one part of the architecture.

Sure. Yeah, what takes us from here to what we just peeked into in that paper? What was the one thing that you added or changed?

Yeah. So, when I wrote the original one: to get this kind of, what Karl calls parameter exploration, or parameter information gain, you're essentially trying to reduce uncertainty not about what's out there in the world, but about your own model. Yeah, beliefs about your model. In a sense, it's knowing what you don't know. And to get that, you have to have a distribution over your model. In this case, our model is a neural network, so you need a distribution over your network. There are two ways in the literature to do this. In the first paper, Scaling Active Inference, we tried Bayesian neural networks. They're maybe more principled, you know: you're actually using variational inference to estimate this distribution over each of your parameters. But in practice, they don't work nearly as well as what we tried in the next paper, which is called the ensemble approach. The idea there is you just take, in this case, 25 dynamics models and train them on different batches of the data. And that's kind of like a proxy for a nonparametric Bayesian posterior over the dynamics model. People are still working out how close it is to proper Bayesian beliefs, but you definitely get a notion of uncertainty there. And there are lots of interesting, principled reasons as to why ensemble models would work better than Bayesian neural networks.
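A rough illustration of that ensemble idea, a minimal sketch in which bootstrapped polynomial regressors stand in for the 25 neural dynamics models; the disagreement (variance) across members is the uncertainty signal. All data and sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dynamics data: next_state = f(state) + noise, observed only on part of
# the state space, so uncertainty should be high where there is no data.
X = rng.uniform(-1.0, 0.0, size=200)
Y = np.sin(3.0 * X) + 0.05 * rng.normal(size=X.shape)

def fit_member(degree=5):
    # Each ensemble member is fit on its own bootstrap resample of the data
    # (polynomial regression standing in for a neural dynamics model).
    idx = rng.integers(0, len(X), size=len(X))
    return np.polynomial.polynomial.polyfit(X[idx], Y[idx], degree)

ensemble = [fit_member() for _ in range(25)]

def predict(x):
    # Ensemble mean is the prediction; ensemble variance is the uncertainty
    # proxy: the "do the members disagree here?" signal.
    preds = np.stack([np.polynomial.polynomial.polyval(x, c) for c in ensemble])
    return preds.mean(axis=0), preds.var(axis=0)

mean, var = predict(np.array([-0.5, 0.5]))
print(var)  # far larger variance at 0.5, where the agent has never been
```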
It's also easier in practice to estimate stuff like information gain. So that just turned out to be a hundred times better. And in the machine learning community, people generally find more success with deep ensembles, for both uncertainty calibration and directed exploration.

That's very cool; thanks for the great explanation there. So, just to rehash that, or say it a little differently: the approach taken in this Scaling Active Inference paper to estimating some of these essential parameters was done with a Bayesian neural network, which Alec just described. It's one way to do it. It might be a way that interfaces pretty cleanly with a lot of other software packages or approaches, but it's the skeleton of the model, and that's the diagram. And then we think about how we can improve it. One direction is more analytical, or more principled, as Alec described it: that's the fully Bayesian approach, specifying every little hidden state, and that approach can provide some interesting avenues at times. But an approach that is also relatively easy to implement, from a programmatic interface level, and also makes a relative minimum of assumptions about the specific mechanics of the system, is this ensemble or deep ensemble approach. The "deep" adjective just means you're going to have a multi-level neural network with hidden layers; it's going to be deep and modern, but it's kind of just an adjective. The ensemble approach describes having a bunch of different models that are going to be trained up on the same or different parts of the data set. And then you could look at the average of the ensemble, or some other weighted combination of the ensemble's performance. So it's almost like bridging the gap from individual testing, that's the single-model testing, to the ensemble approach: it's like we're going to have a classroom, with a group of nine all working on the problem independently, and the best answer will push forward, or we'll average. And then even another level beyond that is true emergence, which is actually what the ants are doing, where the ensemble works as an ensemble, in a way that is itself shaped by development and learning and evolution. So right now the ensemble modeling is still sort of: if you just split it up into many parts, you can cover more state space. You might be able to train the models in parallel. You might make sure that no single model over-generalizes. There are so many benefits that come simply from batching and ensemble modeling, beyond just asking what the best single model is, or the best parameter range in one type of model. The ensemble can consist of a single type, with different parameters between the different ensemble mates, or they can be heterogeneous in some aspect. So, really cool: you went with what was implementable and a common point of departure with the machine learning community, and the Bayesian neural network points in one direction, towards a more analytical, more principled, fully Bayesian approach, and then on the other side to modern approaches in machine learning, like deep ensemble learning.

Cool. Let's look at figure two and what is conveyed here. So this is the hopper task. We'll go Blue first. Go ahead. Sorry, I didn't see that. Can you back up to the last figure? Let's do Blue, and then Mel, and then anyone else.
So I just wanted to ask, and this is kind of related to the next figure, but: when you did the hopper task and the pendulum, I know that that was only based on the extrinsic value part of the free energy equation, so you left off the exploration, because exploration probably had no value in that problem, I'm assuming. But I was wondering here, on the exploration problem, and I keep saying exploration before exploit anyway: did you ever think about leaving off the extrinsic part of the equation and just using the information gain as a reward? Can you structure it that way, and would it be different? Like, if the objective of the agent was just to gain more information about the environment, do you think that it would have succeeded in climbing the hill, because it would get to any part of the environment? How do you think that would work?

Yeah, not in this paper, but it does. So that other paper I mentioned, Reinforcement Learning through Active Inference: because it solves it in the very first trial, it literally can't be because of reward. We tested it without reward as well to confirm this. It hasn't ever experienced a reward and it still solves the task, so it's exactly what you're specifying: it's purely trying to reach all the parts of the state space. If you just leave it with exploration, what you tend to find is that it will solve it for, say, the first ten trials, while the peaks of the hill are still the most interesting states, the most extreme dynamics with the most variance. But then after a while, the top of the hill is no more interesting than the bottom of the hill, because it's been explored a bit. So with the rewards on, it kind of slowly transfers from solving it because of exploration to solving it because it knows how to get reward.

Awesome, thank you. I didn't see that follow-up paper, and I'm going to go read it right after this.

I know, we need to do active inference after-parties with the next paper. But Mel, and then anyone else.

Yeah, just what you were describing with stringing your bow, something like this: I think that's what really got me excited about active inference and the free energy principle in the first place, the idea that with something like a Towers of Hanoi-style problem... are people familiar with that? It's like, you've got three sticks, and you've got discs of various sizes on the sticks, and you've got to get them in ascending order, right? And the idea is that the puzzle requires that you go backwards before you can go forward. And if you're just minimizing or maximizing some function on a single level, you've got like a one-dimensional optimization, and you're not going to be able to solve a problem that requires that you backtrack in order to make progress. And something like active inference and the free energy principle, where we've got temporal depth, where we've got hierarchical depth in the model, is really equipped to solve these kinds of problems in a way that a lot of traditional learning approaches are not.

I totally agree. And I think there's a quantitative and a qualitative side. At the quantitative level, there are control policies where we need computers to look beyond some locally non-favorable states to get around them. So we want to do these quantitative policies; that's what this paper is about. But then there's the qualitative, and really the philosophical, level.
How do we come to grips with processes where, yeah, we're not always strictly walking a staircase directly to the top of the mountain? Or it might not seem like we can do it at all initially, if we just look. There's so much there with how we think about challenges and about exploring. One actual link there, because Blue asked about what would happen if you just let it go wild on exploring. And Alec, what you said is that the learning gets it to the top pretty quickly, because in that sense it's similar to the model that also prioritizes reward. But then after getting to the top, it spends a lot of its time learning on the most extremely variant areas of parameter space. So it's so much like curiosity-driven learning, where a lot of times curiosity-driven learning with no reward or scaffolding ends up learning a ton, shocking amounts, but also spending a lot of time in the most extreme ranges of the space. And that doesn't just play out in, like, the online gutter; even in the literature, we see a lot of the attention being spent on the extreme, hyperbolic positions. And then the middle ground, where it's like, yeah, it's kind of balanced and we can work on it together, that is not as extreme a viewpoint, and so people spend fewer learning and attentional resources on these kinds of projects. So there are a lot of parallels that we can quantitatively model, but that also help us think about how we can say: no, you're just trying to make me explore the top of the hill, I get it, but the flag's on the other side. I've heard about the other side of the hill; how are we going to enact a policy to get us to the flag together? That's a little bit better than "your hill is worse than my hill," for example. How these kinds of things could be ported onto human decision-making is a cool area. Any other questions on figure one? Otherwise, let's talk about figure two a little bit.

Well, just one quick piece, picking up on what Blue said there that I thought was quite interesting. There's this idea of exploring the environment in different ways, but say there's exploration, or information gain, of personal preference. So for instance, maybe the car is a bit more of a "I always like to feel what it's like to go slightly up a slope and turn to the right; that feels cool," right? So then you have this kind of, it's not necessarily... it's like an information gain about having fun. Say the car was like, hey, I really like to do this type of thing, because cars like me like doing that, you know, if it was a living animal. So it ties into what is exploration of the world, and then what is exploration of just being an entity that likes to do certain things, and that could end up taking you there in another way, or together they help.

Great. Alec?

Yeah, I just want to comment on that, because I totally agree, and I think it's one of the most promising kinds of directions or perspectives that active inference gives. We wrote about this notion in another paper that I can link afterwards: I don't really know what to call it, but goal-directed exploration that you derive from something like expected free energy. And that means, well, it's just too inefficient to have everlasting exploration with no constraints, you know, and it's also too inefficient to just have exploitation.
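For reference, the objective Alec is pointing at here, the expected free energy of a policy, is commonly written (in one standard form; sign conventions vary across papers) as a single score with both terms built in:

```latex
% Expected free energy of a policy \pi (one common decomposition;
% sign conventions differ between papers):
G(\pi)
  = \underbrace{-\,\mathbb{E}_{Q(o \mid \pi)}\!\left[\log P(o \mid C)\right]}_{\text{extrinsic (pragmatic) value}}
  \;-\;
  \underbrace{\mathbb{E}_{Q(o \mid \pi)}\!\left[ D_{\mathrm{KL}}\!\big(Q(s \mid o, \pi)\,\|\,Q(s \mid \pi)\big)\right]}_{\text{epistemic value (expected information gain)}}
```

Minimizing G(π) scores every candidate action sequence on both terms at once, so exploration is automatically geared toward the preferences encoded in C, rather than being a separate bonus that has to be scheduled; that is the "shading" of each action that comes up next.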
What you need is some objective function, like expected free energy or model evidence, that contains these two things, so that you're not selecting actions just to explore or just to exploit: each action is kind of shaded with both of these things. And that way you're getting an exploration that's geared towards your goals, and it greatly constrains the type of exploration. And, you know, that also fits with our kind of experience of life, maybe, or maybe not; but I think from an engineering perspective it's crucial, because in real-world tasks there's just too much to explore, and you need to prioritize that exploration.

Yep, that's so beautiful. And even another layer is the ensemble of mountain cars. So now imagine: we all see different things, and we don't know the way to get to the top of the mountain. We don't know the policy, we don't even know what the endpoint is, we don't know if the one we can see close by is the best one, or if there's a way better one further away. And then everybody is who they are, and they all have their own landscapes. And then, through the ensemble, modeling collectively, with or without information sharing of whatever kind, the ensemble, as we're seeing, gets better performance. There isn't just one best policy for mountain car; there are so many different ways to ascend the mountain, and then you open it up with what the objective functions and the policies and the goals could be. It really is a great space. Shannon, and then anyone else.

Hi, thanks. I was just thinking about your ensemble of cars here, and comparing this to flocking behaviors in birds, or even people who are foraging together. So maybe this entire flock is minimizing its free energy by getting closer to reward, which is food, but any individual bird in the flock is just following local rules, about how close to fly to other agents, or when to follow them, or when to break off and go away from the rest of the flock, or drag the flock along with them. And I wonder if every single bird is having its own little mountain car model, or if there's just one coarse model for the entire flock, as a mountain car finding its food.

Yep. Well, one angle on the foraging and affordances: imagine that ensemble of human foragers, but they're not exactly the same size, or they don't have exactly the same preferences, or they have slightly different vision. Some people see closer or further; they see color or they don't see color. And all these differences are what allow the ensemble to explore and exploit, especially with information sharing. Oh hey, I was over under this tree and I found this, but somebody else wouldn't have found it. So we can all use the ensemble to take advantage of our, you know, differences, of the nest mates or the flock, or cognitive diversity. And then on the question that you raised, about whether there is a single layer being enacted by the flock, or a little mountain car, as you said, for each bird: from a modeler's perspective, if it turns out that we can explain variance in the bird's trajectory by putting a linear regression on it, it doesn't say it's linear regression that the bird is doing, just that it helped us explain variance in the world. So similarly, the low bar here is that we can use this kind of model, just like we could use another type of control theory model or action policy selection model, to explain variance in the real world.
Especially because we see organisms succeeding; in fact, successful systems are really the only ones that we do see. This helps us explain those successful systems, as opposed to, like, funny little GIFs of, you know, a robot flailing. And then, whether there's one layer that's being enacted and the group is purely epiphenomenal, with no downward causation, no influence of the group states on the lower-level states: there might be some systems like that. There might be other systems where there's an analytical solution, a well-defined solution, at the bird and the flock level. There might be another where it's well defined at one level, but then, because of interactions and emergence, it isn't as defined, or isn't defined by the same class of model, at a higher or more coarse-grained level. Anyone else? Great, thanks.

Cool. Let's look at figure two. Similarly to what you did for the mountain car, which was such a helpful direction to take: what can you tell us about the Hopper-v2, or the inverted pendulum, which I don't have here, but those were the two tasks. Maybe more on the hopper or the pendulum, whichever one. What are they about? What will machine learning people recognize or know these models as signifying?

So hopper isn't an exploration task; it's quite a dense reward that you get. You kind of get rewarded or penalized for each action. But it's slightly more high-dimensional. It says two-dimensional there, but that's just the label; that's not what I mean when I say dimensions. I mean how many observations it receives, which is more than two; it might be like 16, I'm not sure. And it's just generally the dynamics that you've got to learn. You know, you've got maybe four actions, which are real-valued numbers between minus four and four, and you've got to learn how those change 16 variables over the course of however long you're planning. So it's a much harder task to learn. I guess that's why it was included: it's generally regarded as a bit of a hard task in machine learning.

Cool. And so what is happening in figure two? What can we say? How is it different from DDPG? And what actually is DDPG, for reference?

DDPG is one of these model-free algorithms; it stands for deep deterministic policy gradients.

So what is that one doing? And then what is active inference doing differently that maybe enables it to have such better performance?

So, what is DDPG doing: it's just a model-free reinforcement learning algorithm. You have a policy that maps from states to actions, and you essentially do a lot of math from something like the Bellman equation to get to an update for your policy parameters. The reason that ours is doing so much better, and the reason that model-based reinforcement learning in general does so much better, is because it's essentially a planning algorithm. What's the best way to describe this? Yeah: due to the fact that it's model-based, it can learn from every single bit of data that it receives, that is, from each state transition as well as the reward signals. Whereas this model-free reinforcement learning is simply learning from the single bit of information that it gets each time step, which is the corresponding reward, or 32 bits or however your rewards are represented. So, I mean, eventually DDPG, after enough epochs, would probably asymptote around or higher than model-based. That's a kind of general pattern you get with model-based and model-free reinforcement learning: model-based is far more sample efficient.
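A caricature of that sample-efficiency contrast, with every detail invented for illustration: per environment step, the model-based learner gets a full supervised training pair, while the model-free learner gets only a scalar bootstrapped value target.

```python
import numpy as np

# One environment step produces a transition: (state, action, reward, next_state).
s, a, r, s_next = np.array([0.1, -0.2]), 0.5, 0.0, np.array([0.12, -0.15])

# Model-based: every transition is a supervised training pair for the
# dynamics model (and reward model), informative even when the reward is zero.
dynamics_input = np.append(s, a)   # regress next_state on (state, action)
dynamics_target = s_next

# Model-free (DDPG-style): the learning signal is a scalar bootstrapped value
# target; with sparse rewards this is mostly zero early in training.
gamma = 0.99
critic = lambda state: 0.0         # stand-in for a learned value function
td_target = r + gamma * critic(s_next)
print(dynamics_target, td_target)
```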
But model-free takes a hell of a lot more samples, and then asymptotes, you know, levels out, at a higher reward.

Let me also add a layer there, with a few other aspects from machine learning and the idea of the ruggedness of the landscape. If you're doing this task, which was described as a dense task, another way of thinking about it is that you're trying to keep something upright. So it's very obvious whether you're succeeding or failing, and in the big landscape it's really easy to tell the difference between succeeding and failing. That being said, in policy planning, especially when you have four variables of control that are projecting out to like 16 potentially nonlinear and connected outcomes, there may be many strategic mappings. There might be many policies that help you keep the pendulum up: for example, going a little bit back and forth with a certain speed, or going a little further with a different rhythm. So there might be many, many different ways, even for a single joint of control, many policy sequences through time, and there might be some learned relationships that help you stay in that yes-no area. Now, if you have a model-free reinforcement learner, it means it's learning basically the raw connection, model-free, so-called, between the reward and the policy. And so it may spend a lot of its time exploring a policy locally, because it's like: in the spotlight, it's working; then it goes to a slightly different area, it's not working. Where do we go outside of the spotlight? It's all dark; no idea; model-free. In a deep generative model, it's not like this spotlight-in-or-out reinforcement. We're learning these intrinsic relationships, which actually helps us get a grasp of the maps that are going to guide us to search the territory a little bit more exhaustively. But then that part at the end, which was so interesting: the model-free sometimes gets to higher final performance in some contexts. That can be because a truly model-free search can result in some wacky combination that wouldn't necessarily have been approached, that just uniquely, potentially not in a resilient way, but uniquely allows some performance on a task. And that reminds me of the evolutionary computation, where the goal will be to travel a distance, and then some agents will actually go down the walking road, and those start slow but they can walk forever, and then other ones just fall. So it's kind of like hacking to get over a barrier, without necessarily a deeper understanding of the ecology, because it's so blindly pursuing just performance. So there's a lot of stuff there, but this is a really illustrative example, and it does highlight a lot of the differences between the model-free performance of the state of the art, as well as what active inference could potentially bring. Any thoughts on figure two?

Now we'll just have some general areas. Basically, the first area is implications. I know we've probably talked about some of them, but three areas that I thought about were robotics, resources and allocation, and then rugged landscapes, this one probably in the Southwest. So any thoughts on these? We've almost touched on a few related ideas, but if anyone wants to speak to one of these areas of possible implication, whether they do or don't see what an implication could be, or what another domain of implication would be? Yeah, Alec, go ahead.
So I, along with some other people, am becoming increasingly interested in whether, so I'm very invested in this idea of a kind of Bayesian brain, and active inference as part of that, but whether some of these ideas from Bayesian machine learning that are used in papers like this, stuff like amortization, might also be employed in nervous systems. So, you know, we've had a few very, very big proposals for how the brain might implement Bayesian inference: stuff like population coding, predictive coding, and the process theory that's most commonly associated with active inference, in terms of message passing. And amortization is another real possibility for how the brain implements some type of inference. So I think it could have implications for understanding neuroscience. Wow. Did not expect that: message passing and predictive processing, predictive coding type models can be understood alongside amortization models as alternate implementations or mechanisms of the Bayesian brain, as specific testable hypotheses. So this is where the rubber hits the road with the modeling, and with showing the deep mathematical isomorphisms between these different kinds of relationships. Like, there might be some paper, I'm sure there is, where message passing is shown to be equivalent to Bayesian networks. And then that allows us to bridge to big areas of literature. And so if we do a lot of investigation into the Bayesian brain, and then we find out, wait, there are actually multiple ways that it could be implemented in different systems, whether that's through message passing, so maybe that's more applicable to a computer network, maybe it's also going to be implemented by policy-planning ensembles, so then, whoa, what else is doing this kind of Bayesian-like processing? There are so many cool directions there. Great topic, because it also touches robotics, or at least the question of implementation of action and selection of policy, from sensors and with actuators; resources, in terms of informational attention or whatever they may be; and then rugged landscapes, which is just everything that is policy selection under uncertainty for any kind of system that wants to stay alive. Let's go. Oh, sorry, Sasha or Blue? We'll hear from you after, Blue. Sasha, then Blue, please. Thank you. You were muted, but yeah, good. So I just had the same reaction that you did, like, Bayesian brain, amortization, wow. I was just wondering, Alec, if you could maybe unpack that in a very simple way, or elaborate more on that. Yeah, sure. So the best way... So, I mean, amortization, as it's realized in stuff like variational autoencoders but also other schemes, and we can maybe discuss a bit more together what defines amortization, is just an instance, a way of learning a generative model, and, well, more precisely, of doing inference. The key defining feature for me, and I don't think this has ever been properly defined in the literature, is this notion of having an encoder. How that would play out in the brain, how it could play out in the brain, is just a kind of feed-forward mapping, a feed-forward part of, say, the visual cortex, that maps from some data, or a lower part of the hierarchy, to some posterior parameters higher up in the hierarchy. Whereas when you have stuff like population coding or predictive coding, there's not this notion of quickly, in a feed-forward manner, mapping from data to parameters; you've got this kind of iterative procedure that slowly and sequentially updates those posterior beliefs based on the data, based on some learning.
So it's another route towards Bayesian inference that's far, far faster and doesn't require recurrent processing, and that speaks well to some things we know about the brain. It's not all of the brain; the brain definitely does have recurrent processing, but a lot can be done in a feed-forward manner very quickly. And the second thing is that the parameters of the encoder that you're optimizing are optimized over the entire data set, whereas in something like predictive coding, it's optimized individually: the posterior parameters of your beliefs are optimized based on the current data point. And I think that might speak to some of the generalization capabilities of this kind of inference. And that's something that we're looking at. So I hope that kind of answers your question. No, definitely, thank you so much. Totally epic to connect that feed-forward encoder model to the machine learning side. So instead of doing, like, a back-and-forth expectation maximization where you're just updating back and forth, or especially, like, a backpropagation type thing where there are very, very complex interactions back and forth, very complex interactions in how parameters are trained, this allows us to kind of train on the fly and just learn as we go. Now, to the second point, which is the usage of the entire data set versus point by point: this is like we're learning on the fly, but we're learning on the fly in the state space that we want to be learning about, with respect to the entire data set that we have access to. Which is a little bit different from, like, trailing your finger along a time series and updating based upon that data point, what you believe, and what you've recently seen. That's a kind of point-by-point way to use a whole data set to update parameters, versus this variational approach, which actually enables almost a simultaneous utilization of the entire state space. I'm not sure if that's totally correct, but those are just the two parts that hit me about the two sides that you mentioned, Alec. Steven, and then anyone else. So I've not heard of this amortization, so it's quite interesting. It's almost like in reverse, then, is it? Like you'd have a big, complex space of knowing, and you quickly discount stuff, and the process is about discounting away rather than building up the model. Kind of that idea? I'm not sure if I've understood exactly, but maybe we could get into that a bit more. Alec, how would you define amortized inference? Like, we see it here in the paper, but how would you define it? You mentioned some benefits, and we've talked about them a little bit, but in this kind of new way of thinking about it, what does it mean? Or what would it mean for the brain to do it? What happens programmatically? I mean, so this is what I was getting at earlier: it doesn't really have a clear definition. I'd try to define it in two ways. One is how it's generally used. If you see amortized inference in machine learning, you should generally think of an encoder network. So you've got a neural network that takes in your data, and it will output literal values for your parameters, the parameters that would normally be optimized in traditional variational methods. Here they're just spat out by the network. And then you have some other learning method which updates that encoder for you, and also your generative model.
But in terms of what amortization actually means, going back to the word, I think it comes from economics. I think it's this idea, which comes into the second point that I made, that you're sharing the parameters of that encoder across all of your data. So you're amortizing the cost of inference, is how I think about it. I could be very wrong. So that just means you kind of don't reset with each new observation; you're not trying to optimize with respect to each new observation and then discarding information. You're just sharing that encoder, the encoder which captures the relationships in the data and the parameters, across your entire lifetime. So you're amortizing the cost of inference, is how I think about it. That's my very quick take. So is it like, as opposed to just discounting data, it's discounting ways of chunking up the data, or ways of analyzing it, if that makes sense? So you could have lots of different models, and you say, okay, I'm going to apply them in a more efficient way over time as you start to infer. It's almost inferring what models should be deployed. Is that the kind of idea, and you start to take away the models which are less useful? In a way, I guess. Another way people sometimes refer to it is "learning to infer," which might be what you're getting at. So in normal inference, you're just doing inference, but here you're learning how to do inference. So the learning, yeah, maybe that's what you were getting at with your kind of model selection. Cool, thanks. Yeah, that makes sense. I don't know, but I'm trying to find my way through. There are a few dimensions here, so it's really helpful to unpack it. There's the learning how to learn, that meta-learning element. But also, and this is really important in the paper, the number of parameters remains constant with respect to the size of the data. So if I'm trying to estimate the mean and the variance of a normal distribution, I know that I want to squeeze whatever data set I have down into two numbers. And so if I have three numbers coming in the pipeline, I still want a mean and a variance. If a billion numbers come in the pipeline, I still want a mean and a variance. And so it turns out that by specifying ahead of time exactly the size of the outcome, you can scale a lot better, because you know that no matter what you put in, it's going to be distilled down very rapidly. Whereas with another method, the result might be: okay, we put in a data set and then we find out how many principal components explain it best. That might computationally scale in a way that is very disadvantageous with respect to the size of the data, whereas this method avoids that. And then, as a sort of corollary there, it's happening through a single forward pass of a network, which can be set up ab initio, sort of de novo from the beginning, and/or updated in a learning fashion. And so there are a few dimensions in there, and I really look forward to, you know, seeing what you continue to do with amortized inference as that concept becomes a bit more formalized, because it seems really powerful. And then, if it also turns out that it's an implementation of Bayesian statistics or the Bayesian brain, then it just taps into the entire message-passing, compute-graph, Bayesian-net framework, which has already been among the most helpful for machine learning and a lot of other areas. Very, very cool.
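As a rough illustration of the points above, here is a minimal sketch of an amortized encoder. The names and sizes are illustrative assumptions, and in practice the weights would be trained, for example by gradient descent on variational free energy over the whole data set rather than left random as here.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, HIDDEN, LATENT = 16, 32, 2   # sizes are arbitrary, for illustration

# Encoder parameters: fixed in number, shared (amortized) across all data.
W1 = rng.normal(0.0, 0.1, (HIDDEN, OBS_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (2 * LATENT, HIDDEN))   # outputs mean and log-variance
b2 = np.zeros(2 * LATENT)

def encode(obs):
    """Single feed-forward pass: observation -> approximate posterior params."""
    h = np.tanh(W1 @ obs + b1)
    out = W2 @ h + b2
    mu, log_var = out[:LATENT], out[LATENT:]
    return mu, np.exp(log_var)

# Inference on any number of observations reuses the same fixed parameters;
# there is no per-datum optimization loop, unlike iterative schemes such as
# predictive coding.
for obs in rng.normal(size=(1_000, OBS_DIM)):
    mu, var = encode(obs)
```

The scaling point is visible directly: the parameter count (`W1`, `b1`, `W2`, `b2`) is fixed in advance, and inference over a thousand or a billion observations is just repeated forward passes through the same network.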
So, I guess, next steps, which is really the last thing that we'll talk about for the last few minutes. I just thought of a few areas to go into for next steps, so Alec, definitely you take the first pass, but the areas could be: computationally, what does that mean for hardware and software; from a math perspective, what could be analytically shown, or what relationships would be good to know; from an applications perspective, what robots need this kind of software update; and then just educationally, how do we come to, you know, direct our attention so that we understand these concepts as well. Or, what did you start working on after this paper, I guess. Oh yeah, no worries, no worries. I guess, on this line of thinking, this line of work, the stuff that I'm still thinking about is what was mentioned right at the beginning, which is the balance of exploitation and exploration, and whether the active inference perspective buys you anything in practice. Because, you know, we've got this nice objective functional, expected free energy, and we can optimize actions with respect to it, and it does contain exploration and exploitation, and it motivates exploration and exploitation from first principles, but in practice, in any model, you're going to be fine-tuning the balance between those. You've got a bit more leeway to do it, because now you can change the shape of your prior beliefs, for instance, and change a few other parameters that are going to lead to that trade-off, rather than just having some scalar weight. So some of the stuff that I've been looking at is learning prior beliefs in order to facilitate exploration and exploitation. Because, to summarize, yeah, the balance between exploration and exploitation is probably one of the big unsolved questions in control, reinforcement learning, neuroscience, et cetera, and in making these machines work as well as we'd like them to. And active inference offers a new route to try and solve it, but I don't think it has been solved. Awesome. That's really exciting. There are just so many aspects to the control theory question when there are a lot of solutions that could work, and maybe it's even unclear what it would look like to work, especially as we think about some of these systems that go beyond merely keeping a pendulum standing. What if it's giving someone a massage? Well, that's a little bit of a different question; it's relational. Or, how are you going to exercise in a way that's comfortable for you? These are control theory questions that are literally about action sequences. And so when we start thinking about how we're going to apply it to different systems, it helps to have a framework that is at least moving in the direction of what you said: reconsidering the explore-exploit trade-off by doing things like reconsidering the shape of your priors. So how can you step into a framework where explore-exploit is a dimension that could be calculated or described or kind of summarized, but isn't simply the underlying framework of the model? We're going to get exploratory behavior, and we're going to get so-called exploitative or narrowly-searching behavior; that is not being ruled out by what we're discussing. We're talking about another way that optimization could proceed that doesn't simply use explore versus exploit, or a coefficient weighting of these two through time or statically, to outline its whole learning approach.
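For reference, one standard way of writing the expected free energy of a policy shows how the exploitative and explorative terms Alec mentions fall out of a single objective. Notation varies across papers; this is a common convention, not a quotation from the paper under discussion:

```latex
G(\pi) \;=\;
-\underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\ln P(o \mid C)\big]}_{\text{extrinsic (exploitative) value}}
\;-\;
\underbrace{\mathbb{E}_{Q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big(Q(s \mid o, \pi)\,\big\|\,Q(s \mid \pi)\big)\Big]}_{\text{epistemic (explorative) value}}
```

Reshaping the prior preferences P(o|C), as Alec describes, changes the extrinsic term and thereby shifts the balance without introducing an explicit trade-off coefficient.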
So Stephen first, and then anyone else if they want to give any last thoughts, but this has been a great conversation. So Stephen, and then anyone else. I mean, also, if you go into the real-world environment, you've got this explore-exploit situation, but in high-stakes situations you've also got, they often talk about risk mitigation versus gain optimization, if you're in a kind of conflict or, you know, a legal situation. Because it may be a question of which one, and in a way, gain optimization is a bit like exploit. It depends which way you're going; it could be an information gain, but whatever you're trying to do, those two aspects sort of come into play depending on the state of the situation. You know, explore-exploit is less relevant when you're standing on the edge of a cliff playing a game, you know, because there's a risk mitigation issue. So anyway, I thought that might be interesting. I just thought about, you know, learning action policies, but if you're learning an action policy where you could die, if you're rock climbing or something, it's going to change how you learn. And so when you're trying to minimize the risk of a failed attempt, or minimize the number of opportunities you need to see it successfully performed, or you can only learn it by observing it, these things may contain a ton of information about how real systems learn where failure is not an option. Mel, and then anyone else who wants to close it out. Yeah, sorry. This is, I guess, a sort of ignorant question, but I thought some of the beauty of these approaches, for instance in FEP approaches, is that they're sort of doing a kind of multi-level, dynamic, Occam's razor type thing. So they're maximizing predictability while also kind of minimizing the complexity of what they're doing. And so I guess what I don't quite understand is how the amortization of the inference process works, but if you're fixing the params, doesn't that kind of constrain the ability of these approaches to do that kind of Occam's razor type thing? Not really, because the parameters of your encoder aren't the free energy parameters; they're mapping to the free energy parameters. So, you know, free energy just says that you've got to maximize the accuracy, or the likelihood, while minimizing the complexity. So as long as your encoder maps to the part of belief space that is maximally accurate and also minimally complex, then you're still minimizing complexity while maximizing accuracy. And then obviously, what we haven't discussed is the learning scheme: the way you learn your encoder parameters is such that they output something that does conform to minimal variational free energy. Yes. So you're not actually placing additional constraints on the parameters of the actual... Not on the parameters of the encoder; on the output of the encoder. Yeah. I guess that's a good point. So the encoder, I guess a subtlety with amortization is that the encoder is kind of not part of, you shouldn't think of it as part of, your generative model. It's almost like a tool that can map to the belief space of your generative model. Just on a closing note there, that really returns us to this dual instrumentalism: is it what the system is doing, or is it how we're looking at it? These questions are very rich, and so it's been a great conversation.
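For Mel's Occam's razor point, the relevant identity is the standard complexity/accuracy decomposition of variational free energy; on Alec's account, the encoder's outputs are trained against this same quantity, so the razor survives amortization. This is the textbook formulation, not a quotation from the paper:

```latex
F \;=\;
\underbrace{D_{\mathrm{KL}}\big(Q(s)\,\big\|\,P(s)\big)}_{\text{complexity}}
\;-\;
\underbrace{\mathbb{E}_{Q(s)}\big[\ln P(o \mid s)\big]}_{\text{accuracy}}
```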
Really, thanks everyone for participating. This was such a helpful discussion, and I think, while re-listening, we'll all pick out some questions for ourselves to follow up on and some curious things to learn about, because there were so many good ideas brought up. So for people who are on live, if you check your calendar event, you will see a feedback form, which would be helpful; and anyone else who's listening or watching, please provide us with feedback, suggestions, or questions. But other than that, just stay in touch, and everyone, awesome work on this helpful discussion, and we'll see you soon.