All right. Hello, everyone, and welcome to Active Inference Lab Livestream number 24.2. Today is June 29th, 2021, and we're looking forward to this follow-up or jump-off discussion with some of the authors as well as other lab participants on this awesome paper, An Empirical Evaluation of Active Inference in Multi-Armed Bandits. Welcome to the Active Inference Lab. We are a participatory online lab that is communicating, learning, and practicing applied active inference. You can find out more about us at the links on this slide. This is a recorded and archived livestream, so please provide us with feedback so that we can keep improving our work. All backgrounds and perspectives are welcome here, and we'll be following good video etiquette for livestreams. It's great that there are so many on the call, and I'm sure we'll have many questions and opportunities to raise a hand, as well as for those in the live chat to ask questions whenever they feel like it. We're closing out June with this .2 discussion on paper number 24. Last week we also spoke with Sarah and Dmitri and others. That was very informative, and now we're in the .2, where we're able to take it to a few different places and raise a few questions that were asked over the last week. The goal is really just to learn and discuss this paper and related topics. We have a few cool related areas that we're going to be going into. We can just begin with an introduction round: each person can say hello or check in, say anything that they're thinking about today, and then pass it to somebody who hasn't spoken yet. I'm Daniel. I'm a postdoc in California, and I'll pass it first to Blue. Hi, I'm Blue. I am an independent research consultant based out of New Mexico, and I will pass it to Ryan. Hi, I'm Ryan Smith. I'm an investigator at the Laureate Institute for Brain Research, and I'll pass to Sarah. Hi, I'm Sarah. I'm a postdoc at the Technical University in Dresden, and I'm looking forward to the discussions today. And I pass on to Dmitri. Hi, everyone. I'm Dmitri Markovic, also a postdoc at the Technical University of Dresden, Chair of Neuroimaging. And I pass it to Dave. Any topic? Actually, no. I'm Douglas; my background is basic psych and cybernetic learning. And I'll pass this to Steven, please. Hello, I'm Steven. I'm based in Toronto. I'm doing a practice-based PhD around social topographies and immersive experiences. And I think I'll pass it back to Daniel, if I'm not mistaken. Yep, great. Thanks for the cool intro round. So, Sarah prepared this slide, which we just want to start off with. So, go for it, Sarah. How does this set the intention and the mood for what we're going to be going into today? Wait, how can I make it so that I see it big again? No, okay. So, after the discussion last week and also the questions we got, we thought it would be a good idea to just make a short overview slide about what we understand as active inference. As cognitive neuroscientists, as you said in the discussion before, we are often or mainly interested in this action-perception loop that an agent, or we as living agents, are in with our environment. And that's what you see in this loop on the right. On the left side of the loop, you see the environment, which may have hidden states. These are the states that you may know from Markov decision processes in reinforcement learning, but they may also be more abstract things that aren't observable to an agent, like context, volatility parameters, anything that describes the environment.
And these hidden states may also evolve with time. Such an environment generates an observation that the agent then integrates together with its prior beliefs in order to infer the hidden states of the environment, which is what we call perception. Then you can use this information to plan ahead and select an action accordingly, and in turn the action may change the environment the agent is in. Concretely, how the agent side of this works in active inference is that you specify a probabilistic generative model that contains the hidden states and your a priori knowledge or your assumptions about how these hidden states relate to each other, which states may follow upon which states, and so on. It contains observation generation rules. It contains actions or policies, which in active inference are also treated as hidden variables that need to be inferred. And additionally, the generative model may contain its own parameters as random variables, so that they can be inferred and therewith learned. Then you specify a probability distribution of approximate beliefs. In theory, you could also solve for the beliefs of the generative model analytically, but that becomes very computationally expensive really quickly, or even analytically intractable. So what you do instead is formulate these approximate beliefs, which are supposed to be of a simpler form. For example, in your approximate beliefs you could assume that all hidden variables are independent random variables, which makes the beliefs very easy to calculate, but may not properly capture the temporal dynamics, for example, of your environment. This independence assumption is also called the mean field approximation. But you could also say, no, for some hidden variables it's very important that they are allowed to covary and be dependent on each other, and then you would have pairwise dependencies in these approximate beliefs, which is called the Bethe approximation. You may have different approximations for different variables; some may vary together, others are independent. Then you plug the generative model and your approximate beliefs into the free energy, and you find your approximate beliefs at the minimum of this free energy. This way you can form beliefs about hidden states of the environment, which really can be anything: your location, the volatility of the environment, the context. You infer beliefs about policies, which is a probability distribution over actions from which you can then choose. And you can even form beliefs about parameters of the model, and therewith update your knowledge of the model in each time step. In the end you can use all this inferred knowledge to make a decision. We as cognitive neuroscientists are often interested in this whole loop, but what Dmitri showed last week is also that in principle this can be really modular, right? In our paper, for example, perception was the same for all the different agents, whereas Dmitri then plugged in different action selection algorithms, one corresponding to the expected free energy in active inference, and another well-known action selection algorithm, too. So you can really play around with this framework and use its modular nature. You could, on the other hand, use the same action selection algorithm and compare different learning algorithms, which is something we're also often interested in in our work. And yeah, with that, that's it from me for now.
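To make the perception step concrete, here is a minimal Python sketch of Bayesian belief updating for a discrete generative model; the matrices A and B are illustrative choices made up for this example, not taken from the paper or its code.

```python
import numpy as np

# Minimal sketch (not from the paper): one perception step for a discrete
# generative model with two hidden states and two possible observations.
A = np.array([[0.9, 0.2],    # A[o, h] = p(o | h); row 0 is p(o=0 | h)
              [0.1, 0.8]])
B = np.array([[0.7, 0.3],    # B[h', h] = p(h_t = h' | h_{t-1} = h)
              [0.3, 0.7]])

def perceive(prior, o):
    """Combine the prediction B @ prior with the likelihood of observation o.

    For a single time step this exact Bayesian update is also what minimizing
    the variational free energy recovers when q(h) is unconstrained.
    """
    pred = B @ prior              # predicted belief about the new hidden state
    post = A[o] * pred            # weight the prediction by the likelihood
    return post / post.sum()      # normalize to a proper belief distribution

belief = np.array([0.5, 0.5])
for o in [0, 0, 1]:               # a short stream of observations
    belief = perceive(belief, o)
print(belief)                     # posterior belief over the hidden states
```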
It's very interesting, and anyone can raise their hand, but I think the part that spoke to me there was just that it's a framework. So it's kind of like the framework of a desktop computer: you can take out the memory or change out the hard drive. We can talk about changing the environmental dynamics with respect to the hidden states and how they evolve through time. We can talk about changing perception. But we can also lock in several of the components and then explore how different learning rules influence system behavior. That's what we explored in this multi-armed bandit paper, but it could be the same learning and policy selection rule and then a different perception rule, and those might map onto very subtle differences in the decision making of agents that we care about. What's the difference between somebody who has blurry vision but knows the language, versus clear vision but who is semantically unsure about what the symbols are representing? Yes, Stephen, and then anyone else who raises their hand. Yeah, this is really interesting and helpful to see your thinking here. I'd just be curious how action states, or the idea of different types of states alongside these hidden states, are treated in your thinking. Because there could be the action states that are imagined, and the action states which are part of the external environment that you're somehow immersed in. So in the end, yeah, as actions are also treated as hidden variables, I think they are very separate from the hidden states, for example, because the space of actions determines what an agent can choose, and it cannot choose a state. In terms of other hidden variables: for example, if I think location versus context, then in my head the generative model becomes a hierarchical model instantly. And in the end, that's also what I meant when I said you plug in your a priori knowledge or your assumptions about the environment, because different types of hidden states will have different relationships to each other, and then you really end up with a completely different generative model. Just to follow up there, Sarah, you said that when location and context are differentiated, it's a hierarchical model. So what does that look like on this layout? You mean in this figure here? That's the setting where location and context are identical, and so it's a non-hierarchical model, versus a case where they are differentiated, and then it is a hierarchical model. For example, if I want to model myself walking through my flat, that wouldn't be a hierarchical model; each location, maybe the different rooms, counts as a state that I infer based upon the curtains I see, for example. But then if I want to introduce another person's flat, the transition dynamics may be different for their flat. For example, in my flat I can go from the living room into the kitchen, but in my friend's flat I cannot make that state transition. And that's actually something I'm working with currently, where the underlying decision process is very similar, but an agent needs to infer which context it's in, in order to then load the correct transition dynamics. That reminds me of the SPM textbook, how it talks about hidden states being just the states that a process is in that generates outcomes. So we have a clear distinction: the hidden states aren't directly observed, but there is this hidden state, like which flat am I in, and that is going to change the transition dynamics between the rooms, while the observations are what come to the agent.
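Sarah's flat example can be sketched in a few lines: the hidden context selects which transition matrix applies, and observing which room-to-room moves actually occur is evidence about the context. The room layout, labels, and probabilities below are made up for illustration.

```python
import numpy as np

# Hypothetical sketch of the "which flat am I in" example.
# Rooms indexed 0 = living room, 1 = kitchen, 2 = bedroom.
# B[context][next_room, prev_room] = p(next_room | prev_room, context);
# columns sum to one.
B = {
    # my flat: the living room connects to the kitchen
    "my_flat":      np.array([[0.5, 0.5, 0.0],
                              [0.5, 0.5, 0.0],
                              [0.0, 0.0, 1.0]]),
    # friend's flat: that transition is impossible; rooms connect differently
    "friends_flat": np.array([[0.5, 0.0, 0.5],
                              [0.0, 1.0, 0.0],
                              [0.5, 0.0, 0.5]]),
}

def update_context_belief(p_context, prev_room, room):
    """Infer the context from which room-to-room transitions actually occur."""
    posterior = {c: p * B[c][room, prev_room] for c, p in p_context.items()}
    z = sum(posterior.values())
    return {c: p / z for c, p in posterior.items()}

p_context = {"my_flat": 0.5, "friends_flat": 0.5}
# observing a living-room -> kitchen move is evidence for "my_flat",
# where that transition is possible
p_context = update_context_belief(p_context, prev_room=0, room=1)
print(p_context)
```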
And then also there's learning, which we don't have here, but that would reflect the prior beliefs being updated through time. What about this planning stage? What is happening in this planning module? That's a good question, actually, nowadays. This figure is from my previous paper; nowadays I lump perception and planning together, because they're really not that easily separable. In this case, what I meant is: perception is really inferring the past, up until the current hidden states, for which I had the observations that belong to each state; whereas planning is inferring future hidden states and future observations, and potentially future rewards. But in the end it's all one model, and it's a chain to go through the states. So the differentiation only partly makes sense, I think; it's rather one inference problem. Interesting. Thanks. Blue, and then anyone else? So, just thinking about metacognition and where that might fit into this model. I think Dmitri actually published a paper on metacognition recently. Metacontrol, actually, right. But one can stack multiple agents on top of each other, and imagine a kind of higher-level agent controlling what the bottom-level agent is doing. And so this would be like a deep active inference model. In one of the recent papers we're also exploring this in a kind of metacontrol approach for describing cognitive control. So if there's some kind of downward causation, does that then inhibit what's going on at the lower agent level? Yeah, so basically you can see that actions on the higher level are defining priors on the lower level and selecting the policy space, so what the lower-level agent can actually do. This is now just an illustration of separating maybe higher-level cognition from the executive part, which is moving the arms or muscles of the body. It's interesting, that's kind of like a virtual machine or a hypervisor, that there are these emulated agents. You know, what would my better self do? That's a classic metacognition question. And then also it was just interesting that, you know, Blue, you asked about metacognition, and then Dmitri, you went right to metacontrol. That really speaks to the way in which we're thinking about cognition as control. And that's part of a control-theoretic perspective, which is that cognition, planning as inference, is about action selection. But then one piece that potentially differs in active inference, relative to the way that control theory is often discussed, is the insight of perceptual control theory, which is that the planning as inference is being done in service of the expected observations. So we're planning and acting in order to control our perceptions, and that's happening as part of this integrated loop. So it's kind of cool that there's this good regulator theorem in terms of cybernetics, and then there actually is an emulated regulator agent. Pretty cool. Where does that play in with, like, our thought? What does that relate to in our actual experience of thought, or does this happen all at the sub-personal scale? Well, I mean, how I view it is that it's really a problem of separating different timescales, and a certain uncertainty associated with different timescales.
What you can do now, for example in the next couple of seconds, is very different from what you can do in the next couple of months. And potentially your long-term goals and plans have an impact on what you're doing currently. And there needs to be a way to resolve this uncertainty on different levels of representation. And right, from active inference, one solution for that is just that you stack agents on top of each other, representing increasingly longer timescales. Interesting. And I mean, we can see each agent as something which is separated from its environment through Markov blankets, so basically through the actions it takes and the observations it makes. I mean, you can also separate your brain into multiple nested blankets, right, where one part of the brain just informs and sends information to another part and sends signals as actions. And we know this from experimental research; a lot of this is hierarchically constructed, both temporal and spatial hierarchies. Thanks, Dmitri. Stephen, and then anyone else. Yeah, would I be correct that between perception and planning you've kind of got this more conscious belief awareness being available, and the prior beliefs, once you get from the observations to perceptions, might be below consciousness, in the sense that they might be part of the visual system or whatever? And I was just wondering how you see that transition from below awareness to phenomenological consciousness playing out with these types of models. I know sometimes, like, Ryan Smith uses some semi-Markovian processes which start with some self-reports about beliefs, so it's working at a level which is kind of building on top of the dynamics that might be going on in the biology. So I was just wondering how you see those different dynamics playing out, and where you fit your models with some other models, if they were to play with each other. If I may say so, I think this relates back to Dmitri's statement about the hierarchy of timescales. Because, at least for me, my conscious thought is on a slower timescale than my visual perception is. And I think my conscious thought is maybe rather responsible for the slower timescales, where it helps me to narrate through the slow Markov decision process, whereas it would not be useful if I had to critically think about every inference I make on my retinal image, for example. It's interesting you said there that the conscious thought narrates, because first off we've talked about narrative and active inference, but also, just as you described how there was this hidden state of which flat I'm in, and that sets the transition probabilities of the rooms: maybe you have a narrative that sets up which policies you're transitioning between. Like in baseball, there's a story to the alternation of the innings, and then it's like, okay, we're going to be enacting this policy because we're in this phase. So there's a higher-level narrative that helps connect when policy transitions might plausibly be adaptive, and that is almost by definition at a slower timescale than the play-by-play, because the play-by-play is going to be a lot more like motor control, the eyes moving.
And then as you slow down, you get slower transition dynamics that are more and more narrative in nature, because they narrate different phases of action, while the sub-actions are very rapid. I mean, in a sense it also has to do with the level of abstraction, right? If I see a leaf, it's just a leaf; it has a very concrete meaning in terms of my physical environment, whereas my study program, for example, is something a lot more abstract. Such abstract states are a lot harder to grasp, and you maybe need a narrative more for them. Ryan, and then anyone else? Oh, I mean, I was just going to say, you know, like when we went through my and Chris's paper on consciousness a couple of months ago or whatever: the question of how conscious processes relate to active inference, or really any related kind of computational model, is a complicated one, in part because what the word consciousness is used to mean can be different in different contexts. I don't personally think that... yeah, I mean, the question of which processes are conscious or unconscious in and of themselves doesn't really follow directly from active inference, right? All we're doing is modeling things at different timescales and their relationships to one another. I mean, in the model of visual consciousness in that paper that we published a few months ago, the idea is very similar to what other people were saying, this idea about different timescales. But here the idea was something more like: you have a particular level in a hierarchy that operates over a sufficiently long timescale that you can do the kind of goal-directed things that consciousness allows, including generating verbal reports, which themselves are very extended, integrative policies. But that also requires that it integrates information from enough different local sources, right, to do that. Beyond that, talking about consciousness as itself the slower-timescale process can also, I think, be a little bit subtle. Because what a lot of people mean by consciousness has to do with the subjective character of the moment-to-moment phenomenological aspects of experience, and that's clearly not happening on a slow timescale, right? The moment-to-moment changes in conscious perceptual experience are fast. So what we're doing at this kind of higher narrative level that I think you're talking about is something more like integrating evidence for those over time, and using them to come up with certain sorts of longer-timescale policies. And, you know, the way we've talked about it previously is just that you can have these effects depending on the precision of the interactions between the first and the second level in a model. If that precision is low, then the lower level can, to a certain extent, just kind of operate semi-autonomously, in which case the higher level doesn't really need to have all that much influence, or know that much about what's going on at the lower level. So you probably have a kind of selection process where the precision gets turned up or down at different times and for different hidden states.
So the second level can kind of selectively become aware of, and start integrating evidence with respect to, certain lower-level processes, which simultaneously allows this kind of top-down control over those first-level processes. So with the perceptual phenomenology stuff, it's probably more about the moment-to-moment updates between lower-level perceptual processes and these higher-level, slower-timescale processes, when the precision is sufficient for those updates to the second level to happen. But anyway, that's more long-winded than I meant it to be; mainly I'm just trying to point out that it's subtle, it depends a lot on what you mean by consciousness, and it's not specific to active inference necessarily. Nice. Great. I mean, the difficulty in talking about consciousness is that we don't know what it is doing, right? What kind of problem is consciousness solving? What would a hyper-intelligent organism without consciousness be like? Yeah, I think there are definitely a number of ideas in the literature; it's not like it's completely pinned down, right? I mean, there are things, for instance, like requiring working memory maintenance: extended working memory maintenance beyond a specific timescale looks like it can't happen unconsciously, at least in current experiments. Or any kind of multi-step mental process: for instance, you can get unconscious priming effects where people can do something like two plus two, but they can't unconsciously do two plus two minus four, right? So anything that requires holding this kind of intermediate result in mind, to do a further operation on it, is something that, at least thus far, nobody has been able to show experimentally that people can do unconsciously. Ryan, would you think that consciousness is like a zero-one state, I mean, from that perspective, right? For example, Giulio Tononi kind of sees this as a continuum, like consciousness is really a continuum of states, where you can just have higher levels and lower levels of consciousness. Well, so again, this gets back to what you're using consciousness to mean, right? So in consciousness research there's a distinction between levels of consciousness and contents of consciousness. Now, content of consciousness, I think most people agree, is fairly binary. I'm actually not totally sure what the IIT crowd would say; I'd have to look back at some of the more recent stuff there. But in terms of neuroscience, there's very well-characterized work on ignition events that are nonlinear, all-or-none sorts of processes, where if something becomes conscious, you get this global, nonlinear, broad activation widely across large-scale networks. Whereas if you don't pass this ignition threshold, you still get the local perceptual activation, in, say, visual cortex or whatever sensory cortex, but it's just kind of linear, and it doesn't percolate up or pass a threshold to this large-scale, more all-or-none kind of thing where the information becomes broadly accessible throughout the rest of the system.
So that aspect, I think, has a fairly binary character to it. That's different from the levels of consciousness, which would be something more like being in a predisposition to represent conscious contents, which would be like a continuum from, say, coma to alert awareness. And that probably just has to do with the state of processing that allows for the kind of dynamics that support this all-or-none stuff, the stuff that allows for selective contents. But anyway, all of this is going pretty far afield from active inference and your paper now, so I don't want to detract from it. It's fun stuff and it's important. So we'll go to a more applied question, and then following this kind of round on an applied question, we're going to go into a few code walk/talk-throughs, as well as learn a little bit about a few approximations and some other work by Sarah. So Lars asked us a question on Twitter, which anyone is always welcome to do, and wrote: I would love to hear any thoughts on how this work on the multi-armed bandit might be related to real-world problem spaces or applications. How might active inference's improvements in multi-armed bandit tasks translate to improving how some problems or decisions are currently solved? We talked a little bit about this in the .0 and .1, but I'll go to any of the authors for a first take, and then everyone else is welcome to give any thoughts or ask a follow-up question. So when you present the work and somebody goes to this kind of obvious applied active inference question, what is your thought? Well, I mean, as I said, multi-armed bandits are applied to a super wide range of real-world applications. So I opened a survey on practical applications of multi-armed bandits and contextual bandits, by Djallel Bouneffouf and Irina Rish, right; this is one of the recent surveys I found, from 2019, on arXiv. And they list, for example, a couple of domains where bandits are applied: healthcare, finance, dynamic pricing, recommender systems, influence maximization, dialogue systems, telecommunications, anomaly detection. And just by looking at that list, I would first try things out with active inference based bandits in non-stationary problems. From what I see, dynamic pricing and recommender systems are in this domain, but one can also imagine that telecommunication systems would be a non-stationary problem: it's kind of searching for the fastest routing path and similar, where different routes can change over time, and then you constantly have to explore different channels for passing along the information more optimally. These are the kinds of practical domains where one could first try active inference based bandits, in non-stationary problems, as I said. For the other things I listed, this is a bit more difficult; one would first have to see if there is a way to deal with this kind of bad asymptotic behavior, basically the overly optimistic information search. Awesome. Sarah, any thoughts on that, or anyone else? Stephen.
Yeah, could you just repeat that last bit? You said something about isotropic behavior, or some type of behavior I didn't quite catch. Asymptotic, right. So, in stationary bandits, people are interested in the asymptotic behavior, after infinitely many actions, so to say, and in how different algorithms scale there, whether they converge to a good solution or not. And this is what we find in the stationary problems: active inference is too optimistic, in a way; it converges to a solution too fast, at least the way we defined it. And that's why it doesn't have good asymptotic behavior. So for a stationary problem, where you would like to find a good solution with high probability, the active inference algorithm is not the one to go for. At least... again, we have some ideas how to change this, but just based on the expected free energy, this is not something which behaves nicely. So you almost have to do a hack on it to keep bringing it back away from premature convergence? Yeah, exactly. It kind of just gets stuck prematurely in a solution which the agent believes is a good one. And the reason for this one can see immediately from the term which drives the exploration, which is the expected information gain. In reinforcement learning this is called an exploration bonus or something like that, and this doesn't increase with time. All those previous algorithms, like the upper confidence bound, UCB, reinforcement learning based algorithms, have a bound which increases with time, so if you are not sampling from one arm, this bound becomes bigger, and your algorithm is kind of forced to switch at some point. And this is not happening here. So either one will need a different generative model, or a different way to introduce more randomness into the behavior. One aspect of it, in figure one or figure two in the paper, was that on the top left there, the variance across active inference agents was increasing. So it wasn't that every instance was slightly degrading in performance; it was actually, as you wrote there, that a small percentage of the ensemble did not find the accurate solution and was overconfident in its estimate. So it does suggest a few ways in which, maybe, when the variance of a few parallel instances of active inference starts diverging, that could be a warning sign that some of them are getting too confident too early. And then also it's interesting how you explored the learning rate, or lambda, through time, and just said that, okay, there's no simple answer here, but it's an area of future work for sure. Yeah, I mean, the problem with parallel runs is that in practical applications you don't have that, how do you say, that advantage; it's confined to simulations, where I can just run any number of simulations and see how the thing behaves. In practice, when you're solving the problem, you just have one trajectory, and then you have to provide assurance that this sample, this sequence of actions, will behave better than random, or that there is some probability of converging over the long run. And this is something which you don't get with active inference in stationary problems, at least. In non-stationary problems we don't see this issue anymore, because of, basically, the generative model itself: the agent believes that things will change over time, so the less you're exploring, the more your uncertainty rises on the arms which you haven't observed.
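The contrast Dmitri draws can be made concrete: a UCB-style exploration bonus for an unsampled arm grows with total time t, while the expected information gain of one more pull, under Beta beliefs that are no longer being updated, stays constant. A rough numerical sketch (not the paper's code; the bonus formula is the textbook UCB1 term, and the information gain is computed by quadrature):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.integrate import quad

def bern_entropy(p):
    """Entropy of a Bernoulli(p) outcome, in nats."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_info_gain(a, b):
    """Mutual information between the next outcome and the arm parameter
    under Beta(a, b) beliefs: H(predictive) - E_theta[H(Bernoulli(theta))]."""
    p_bar = a / (a + b)
    cond = quad(lambda th: bern_entropy(th) * beta_dist.pdf(th, a, b), 0, 1)[0]
    return bern_entropy(p_bar) - cond

def ucb_bonus(t, n_pulls):
    """UCB1-style exploration bonus: grows with t if the arm is not pulled."""
    return np.sqrt(2 * np.log(t) / n_pulls)

# an arm pulled 10 times early on and then ignored:
for t in [100, 1_000, 10_000]:
    print(t, ucb_bonus(t, n_pulls=10))   # keeps growing as time passes
print(expected_info_gain(7, 5))          # fixed while beliefs stay Beta(7, 5)
```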
This is also what helps it shift between arms, and it's also very efficient at extracting information from them, because the agent picks up really fast which arms were not sampled, and that's aligned with its beliefs. I wonder if this also might reflect the difference between optimizing the model for practical application, in, I don't know, a computer simulation program, and the way that organisms behave. It may be that we prematurely shut off and let things become a habit, but that may have downsides: maybe organisms, just to minimize the use of energy, prematurely converge on something inaccurate, and so, maybe like gambling, it could show a fragility at times. So there are maybe two ways that it's being applied. Yeah, I mean, I would say that organisms are never exposed to a stationary environment, right? So in a way, if you are not doing that, you are suboptimal, because you are living in an environment which changes constantly. In a way you can exploit this by creating situations where, well, people behave weirdly and have gambling issues and stuff, but right, I don't think that's a disadvantage for what we evolved to do, actually. That we are very good at doing: yeah, finding good solutions reasonably fast. Yeah, sorry, I was just curious, because this is something that we've run into with some of our empirical work. So, like, when trying to model change point detection tasks, for example, using active inference: same thing, the thing becomes too confident too quickly. But I wondered, because in a lot of other empirical work, in neuroscience especially, it's pretty clear that people don't just kind of learn and then unlearn the reward probabilities, or whatever the environmental statistics are, when there are abrupt changes. What people do instead is infer that there's some new hidden cause, right, some new latent context. And then under that new context you basically just have really small-magnitude concentration parameters, and then you just build up your beliefs anew under that new context. So there's that kind of approach, which would require either having something hierarchical, or having some kind of different hidden state factor that would correspond to context, right, and it seems like you could do it either of those ways. Then the other thing to do would be to have some kind of dynamically adjusted learning rate, like in the HGF, the hierarchical Gaussian filter. Or, really, something more like... so, recently, when we were updating some of our tutorial paper, after talking with Karl, it seemed like it would be a good idea to also include this kind of forgetting rate parameter, as opposed to just the standard learning rate parameter, which is just a scalar parameter on the actual concentration parameters, right, prior to adding on the counts. That's not dynamic, but at least it can act as something like an implicit prior on volatility that can prevent the thing from becoming too confident too quickly.
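A minimal sketch of the forgetting-rate idea Ryan describes, with hypothetical parameter names (omega is my label, not necessarily what SPM or the tutorial paper calls it): decay the concentration parameters before adding the new observation counts, so the effective evidence saturates instead of growing without bound.

```python
import numpy as np

# Sketch of a forgetting rate on Dirichlet concentration parameters.
# With forgetting, the counts saturate near 1 / (1 - omega), acting like
# an implicit prior on volatility; without it they grow without bound.
def update_counts(alpha, obs_counts, omega=0.95):
    """alpha: concentration parameters; omega: forgetting rate (my naming)."""
    return omega * alpha + obs_counts

alpha = np.ones(2)                        # flat prior over two outcomes
for _ in range(10_000):                   # always observing outcome 0
    alpha = update_counts(alpha, np.array([1.0, 0.0]))
print(alpha)                              # roughly [20., ~0.], not [10001., 1.]
```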
But yeah, I guess I just wondered if you guys have thought about, or played around at all with, something like that, something like inferring new latent contexts, as opposed to the thing becoming too confident and having to spend a bunch of time overwriting its old beliefs, which happens way too slowly. I mean, in the end, my current model is actually based on such a hierarchical model as you described, where instead of unlearning all the action-outcome contingencies, my agent opens up a new context when it has inferred that the context changed. And I think when we were looking for proper learning algorithms for the paper, we most definitely looked into contextual learning, and in the stationary case there are no contexts; that's why we didn't introduce it there. And I think the algorithm we used for the non-stationary bandits also has a forgetting rate, just like you say, a forgetting rate on the concentration parameters, which I think is another reason why this algorithm worked well in the non-stationary case, or better than in the stationary case. And in theory, I think we also tried some version of this where we have both contexts and forgetting parameters. In the end, for the randomly moving bandits, we had also thought about having another layer to this hierarchical model that infers the volatility to adjust the learning rate, but then your learning model becomes very abstract pretty quickly. So I guess what I still don't completely understand is: even in the stationary case, if your forgetting rate is sufficiently high, all that happens is that the magnitude of your concentration parameters is never going to build up to too high a value, right? So does that not still help with the overconfidence issue? I would have thought it would. Well, we don't have a forgetting parameter in the stationary case, at least; this shuts off because we assume the agent believes it's in a stationary environment. Yeah, I guess you could still put an agent in a stationary environment, but it seems like it could be plausible that it has that kind of belief. Yeah, I agree. I mean, that's one way one could try, right, to resolve the exploration problem in the stationary case: just simply give the agent the wrong beliefs. We used the same learning, the same Bayesian belief updating algorithm, for all the different action selection methods that we compared, and then it's a weird interplay between the learning model that we chose for the stationary case and the action selection rule. And I mean, Thompson sampling, which uses the same learning, does not have this issue. Yeah, and that, I think, is also interesting. So, to the second part of the question: how would that translate to improving how some problems or decisions are currently solved? I'm hearing a few things. We talked about speeding up computation relative to other approaches, but that's not a solved problem.
For example, if there's a ten times speed-up, but then you have to run ten agents in parallel to get a good ensemble estimate, it's a wash. So, first, potentially speeding up the computational requirements for certain challenges. A second would be that it might be possible to more rapidly lock in to dynamically changing regimes, and to avoid some of the pathologies of model fitting in multi-armed bandit contexts. And a third way that it could translate to improvements would be that it might reveal some hidden similarities between these different problems and settings. We already know that they're somewhat similar, because we can apply a multi-armed bandit to healthcare, finance, recommendation systems, etc. So we know that there are a lot of problems involving data with similar enough structures that a similar kind of general algorithm can be applied. But it could be interesting, once we have them on the common grounding of active inference, to say: actually, the structure of the decision making is similar across these two settings, or the telecommunications routing and the logistics routing are similar in this unexpected way. So maybe insights relating to what kinds of tweaks could improve an agent's performance, like what we're talking about here with the categorical hidden states or the learning and forgetting tweaks, could be implemented in active inference and then more easily transferred across different domains. So I hope that conveyed some of our thoughts on this question to Lars. Anyone else? Yes, I mean, the one issue there is that in different domains you will in principle have different generative models. Although the problem is the same multi-armed bandit, you would need a different representation of the environment, and this is where the challenge potentially comes in. If you can represent it as a multi-armed bandit problem, you can choose whatever action selection algorithm you find best, and in the non-stationary situation, at least for what we investigated, this seems to work well. This doesn't necessarily mean that it generalizes; one will still have to try out different things just to make sure. But in the end, the bigger challenge is: okay, what is a good generative model for this dynamic problem which I have? And one can go at that with many different things. For example, what Ryan also mentioned is kind of open-ended contextual learning; in an environment where you don't know anything about what's going on, you can just use a non-parametric generative model, like a Dirichlet process or a Gaussian process, where you try to learn even what the model itself should be. Awesome. So let's go to this little sub-discussion on one of your previous papers, Sarah, and then we're going to turn to some notebooks and walkthroughs of the bandit project. Hopefully this will be informative, because, first off, belief propagation and message passing and these types of approximations are of interest to the lab and the community, and also we're seeing a few faces that we can ascend active inference mountain with. We have Ryan with a matrix-based MATLAB approach; we will walk through a Python-based approach to the bandit in just a few minutes; and then this is a slightly different approach, based upon the Bethe approximation. So, Sarah, anything you'd like to describe? I'm sure this will be new to many people, so it will be helpful to convey what you were working on here.
Sure. So in figure two, the upper one, what you see is the generative model of a partially observable Markov decision process, where in the upper row, with the unfilled circles, are the hidden states, as in the previous slide, and below are the observations. And the whole dynamics of which states follow upon each other is determined by the policy pi on the left side. Then what people often do is assume, in this q that we saw two slides before, the mean field approximation, which means that, for example, here we have a bunch of hidden states h_t, h_{t+1}, and so on, and if you assume the mean field approximation, the approximate belief distribution would be just q(h_t) times q(h_{t+1}), and so on and so forth. And therewith, implicitly, you essentially treat all the hidden states as independent in your approximate beliefs, and the dependencies will be averaged out. In figure three, you actually see the inverted model, and with m you see the messages that are being passed between nodes. So it doesn't matter what approximation you use; in the end you can calculate your beliefs with some sort of message passing algorithm. Except that now, if you choose the mean field approximation and you estimate all hidden states separately, what we found is that they may actually not fit very well to each other. For example, when my agent predicted how it would go through the grid under a certain policy, it actually often predicted it would jump and go to places that don't adhere to the state transitions, therewith of course leading to decreased goal-reaching success. And so, instead of doing this mean field approximation, we assume the Bethe approximation, where instead of having q(h_t) times q(h_{t+1}) and so on, you have pairwise joint distributions: you have q(h_t, h_{t+1}) times q(h_{t+1}, h_{t+2}), and so on. And then, doing the math, you can show that if you assume this type of approximation and you plug it into the free energy you want to minimize, the belief propagation message passing algorithm comes out. So you can use the belief propagation message passing algorithm to calculate beliefs, and this is actually exact on graphs without loops. And then you get a more appropriate joint representation, in this case of temporally dependent hidden variables. But in the end, what I do nowadays is that I also have a hierarchical model, where now, for example, the parameters of the Markov decision process are context dependent. So I think about which variables belong together, and then I apply the Bethe approximation in those parts; but for some slower-varying variables, like the context, I just use a mean field approximation, because it varies on a different timescale anyway. Thanks for that breakdown. What is message passing; who are the messages being passed between? And does that reflect a variant on active inference, or is it the same exact active inference model that can be calculated through message passing or through other mechanisms? Correct me if I'm wrong, but I think in the end all active inference agents do message passing; which type of message passing algorithm you get depends on the approximation.
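In symbols, the two factorizations Sarah contrasts look roughly like this, written for a chain of hidden states; the Bethe form below is the standard one for tree-structured graphs, and her paper's notation may differ in detail:

```latex
% Mean field: all hidden states independent in the approximate posterior
q(h_{1:T}) \;=\; \prod_{t=1}^{T} q(h_t)

% Bethe (on a chain): pairwise joints, corrected by the overlapping singletons
q(h_{1:T}) \;=\; \frac{\prod_{t=1}^{T-1} q(h_t, h_{t+1})}{\prod_{t=2}^{T-1} q(h_t)}
```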
For the Bethe approximation, it's the belief propagation message passing; I think there are equivalences to the sum-product algorithm and so on, you just get different messages. But okay, what are the messages? Essentially, each node in the graph, so each hidden variable, sends each other hidden variable that it's connected to a message about which state it should be in. So h_t would say: hey, we're currently here, then I think in the next step we should be there. And vice versa, h_{t+1} can send a message back that says: hey, we want to be there next; where should we be now? So these variables essentially send each other messages about what they should be, so that they are in agreement with each other. And that's how you get, essentially, a probability distribution over which state you think you're in and will be in. Great. Dmitri, any thoughts on message passing, or where do you see message passing fitting into Bayesian statistics and a few other topics? I mean, as Sarah said, I would say all the algorithms are message passing. So when you talk about the mean field approximation, this would traditionally be the variational message passing algorithm, as it's called. Here, this belief propagation is a message passing algorithm based on marginal probabilities, whereas in variational message passing you have expectations of the log of conditional probabilities, right? These are kind of the differences, what you're getting and what you're losing. And we have another paper, with Thomas Parr and Karl, Neuronal message passing using mean-field, Bethe, and marginal approximations, where we contrast the different ways. So whoever is interested in this topic can look into it for a bit more unpacked discussion of similarities and differences. In practice, the difficulty with the mean field approximation is that, for dynamical problems, for decision making, it's not really a good approximation. This is not something one would use, and implementation-wise, in active inference, the mean field approximation is not used on the dynamical level. What they're using in MATLAB, for example, is this marginal approximation. It's still a kind of gradient-based method, but it's computed slightly differently. Yeah, sorry, I just wanted to wrap up: basically, the more complex the problem is, the more uncertainty you have on the state transitions, the more difficulties you will have with the mean field and marginal approximations. Basically, the Bethe approximation is the only one which corresponds to actually exact inference on non-cyclic graphs. So this is the theoretical solution; you know that you can be exact, under specific conditions, on the marginals. I can only warmly recommend the paper by Yedidia, Freeman, and Weiss, I think from 2001, called Understanding Belief Propagation and its Generalizations. I find it very didactic; it helped me a lot. They explain in detail how variational inference is also connected to message passing algorithms. Interesting. So, just to capture one interesting thing you said there about the gradients: with a gradient method, it's sort of like your model is at a given spot on the landscape, and then it checks the local slope and moves in the direction of the gradient. So we've talked a lot about gradient-based methods, the straight line versus the iso-contours; message passing kind of breaks that down into a process where, at each tick of the model, messages are passed back and forth between the variables.
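As a concrete illustration of these messages, here is a minimal numpy sketch of belief propagation on a chain of hidden states (a hidden Markov model), where forward and backward messages combine into exact posterior marginals on this loop-free graph; the matrices are illustrative, not taken from the notebooks:

```python
import numpy as np

# Belief propagation on a chain: forward messages carry "where we've been",
# backward messages "where we must end up"; their product gives the exact
# marginals because a chain has no loops.
A = np.array([[0.9, 0.2], [0.1, 0.8]])    # A[o, h] = p(o | h)
B = np.array([[0.7, 0.3], [0.3, 0.7]])    # B[h', h] = p(h_t = h' | h_{t-1} = h)
obs = [0, 0, 1, 1]
T, S = len(obs), 2

fwd = np.zeros((T, S))
bwd = np.ones((T, S))
fwd[0] = A[obs[0]] * np.array([0.5, 0.5])  # flat initial-state prior
for t in range(1, T):                      # forward messages, left to right
    fwd[t] = A[obs[t]] * (B @ fwd[t - 1])
for t in range(T - 2, -1, -1):             # backward messages, right to left
    bwd[t] = B.T @ (A[obs[t + 1]] * bwd[t + 1])

marginals = fwd * bwd
marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)                           # posterior p(h_t | o_{1:T}) per step
```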
That is both computationally tractable, and it's also been shown through some work of BIASlab, Bert de Vries and others, that for specific categories of Bayesian graphs, message passing algorithms are basically equivalent on Forney-style factor graphs, the FFGs, which we're going to be learning about in the future. So this topologically puts you more in touch with predictive processing, for example the messages that neurons are passing to each other. It's one thing to say, well, it's as if the neurons are messaging each other and that's doing a gradient descent; it's another thing to actually say we have a message passing scheme for modeling how these message passing agents do inference. So there are a few points of contact there that I think are pretty important, and it's also interesting that you brought up that early paper, so we'll check that one out. Stephen, did you have a question? I think it's been sort of covered, really. Cool area, though. So thanks, Sarah, for sharing that. Just one last question on this before we go to the bandit code and walkthrough: where are these approaches heading? Are they converging, and are they going to weave together more closely, or is one of them like an umbrella over the other, such that work will continue mainly under the generalized form? Where are these different approaches to implementing active inference that we're talking about heading? Is more development happening on the message passing approximation, or on other modes of breaking down active inference? I'm not sure I have the overview of the hundreds of papers published every month in active inference to say something like that. As far as our motivation here goes, we just want to have good enough inference approximations in different situations. So in a way, for me, the more important question is what a good representation of different tasks and environments would be, rather than what the best kind of inference algorithm to use is. Because, depending again on the environment you're working in, in a dynamic context it's difficult to get a lot of advantage from just improving slightly on the inference performance, simply because there's lots of uncertainty and things change all the time anyway. And for this multi-armed bandit paper we also tested lots of different algorithms; one of the notebooks in the repository just lists the different things we tried out. But in the end, one doesn't see any reason why one algorithm would be specifically way better than another; they're very similar, subtle differences. Great point that there's a lot of work on the comparability of different approximations and different algorithms, but actually it might be more beneficial, for a given application, to focus more on how you're specifying the generative model, and making sure that it really captures essential features of the environment. It's like: okay, let's just roll with active inference and spend our attention on the generative process and the generative model, rather than trying to finesse a potentially grossly inferior generative model with some better approximation; there might be limited returns there. So, Steven, and then we'll go to looking at some of the bandit code.
Yeah, this is a kind of general question related to that. I often think about deontic-cue-type ways of making inferences, sort of deductive thinking and such like, and I'm wondering whether message passing is more present in those types of models, because it's something that's been detected in the environment, and decisions are being made based on that. Then you've got your kind of inductive, where you're trying to narrow the gap toward a goal, and you've got your abductive, where you're trying to build something up and infer, from a landscape, what is out there, so to speak. And I'm wondering if I'm correct in thinking that message passing is more used when you have a particular deductive reasoning approach, and inductive or abductive ones would be different. Interesting question, about how those different modes and types of logic are connected to message passing. One thought, which might be on or off base: we're thinking a lot about how variables in a model are like nodes, and then there are edges connecting the nodes that reflect the relationships between those variables. And message passing is just one way to describe, as a model updates through time, which information is being passed between the variables. So it doesn't say anything about the mode of operation of an agent, which might be engaging in different kinds of logics. And I think that's a really excellent question: how do we break out of the known, with respect to how our algorithms update? It does actually touch upon this mean field approximation. For example, if you think that through all time, past, present, future, there's some stationarity of the hidden state, then the mean field approximation will work. But if there's going to be a change in the state, then the mean field approximation is potentially going to give a misleading outcome. In any case, message passing is just describing the mechanics of how the model updates, and which information between variables is connected, but importantly also which variables are not connected: the observation at one time point doesn't influence the observation at a different time point directly, but there could be a specified path of message passing. Let's look a little bit at the bandit code. So the link goes to the GitHub of dimarkov; you have the right name to work in this area. We have a few of these notebooks up. Should we look at the overall notebooks folder, or do you have a sense of which of the notebooks might be interesting to walk through on the record, and go to a few, or do you want to jump to a first one? I think just this first one, to think about this, maybe. Oh, go ahead. Yeah, there are a couple of things. I mean, there are a couple of notebooks which are not immediately relevant for the paper. So we can focus just on the things which are part of the paper, or I can also just generally talk about these other things, which were part of the process of thinking about the problem. Okay, how about before we even start: how do you, as a researcher working in this area, keep those separate, the paper-specific developments versus your overall developments? Do you find that you're in an overall development mode and then you dip into specifying a paper, or do you pursue the paper and find that you have more general insights while you're working through the problems?
Well, I mean, I pursue the paper, but there are lots of branching paths on the way. I have to figure out what's potentially interesting, what's relevant. So part of this code is just exploration of topics which were interesting for me but which turned out not to be so important in the end, maybe for some other paper, and part of it focuses exactly on the comparison in the multi-armed bandit setting and discussing that. So in a way, it's difficult to combine lots of potentially unrelated things into one paper; one always has to make some choices in the end. I knew it would be a both-type of question, because it's something about researchers: we're often interested in general questions, but we need to deliver on specific research projects with a defined scope and conclusions as well. So it's just cool to see that this repository holds a little bit of both. So, for example, this notebook which you opened first, expected free energy comparison: this was just my contemplation of different ways you can define the expected free energy. Typically people will think about the expected free energy in terms of expectations over outcomes. But, for example, another question is: okay, but why not compute it in terms of expectations over states? And there is a relation between these two, right; one is an upper bound on the other. So basically you see this last relation: there is S of pi, there is G of pi, and there is I of pi. And basically, G of pi would be the expected free energy in terms of expectations over latent states, and this is an upper bound on what would be the expected surprise, for me in analogy to the variational free energy being a bound on the marginal log likelihood, or surprise. And then there is the I of pi, which is just the KL divergence between posterior and prior, which gives you something else again. And one can also think: well, we can select policies, or make decision making algorithms, based on these quantities, and what happens when you use one or another? So this is something which I was just testing out for myself, and I'm still not clear what to think about it; that's why I don't have a paper. This is very interesting. It's all conditioned on the policy pi, and then we're approaching it from the top and from the bottom, so the free energy is being sandwiched in between these other approximations. And then you wrote here that the minima of I of pi and G of pi match, but S is giving a different minimum. So what was curious about that to you? Well, at least in this example, right; I just built up a simple example, and one can see that the optimal policy, or the minima of these different quantities, are different. And one can probably also think of different examples where this relation will not hold, which is what I just wrote as a comment there. But this is more of a practical question: if you would then build an agent, which one of these quantities should you use? They can all effectively be seen as expectations over future surprise, with different bounds on that expectation as an approximate quantity. So the question is which, and I also found that in different problems, depending on how I formulate the problem, one of these, well, not algorithms, but objective functions, let's call them, works better.
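For orientation, one common way of writing the decomposition behind these quantities is below; sign conventions and the exact definitions of S(π), G(π), and I(π) vary across papers and may differ from the notebook's, so treat this as a reference point rather than a transcription of it:

```latex
% One common convention: the expected free energy of a policy decomposes into
% expected surprise (pragmatic value) minus expected information gain
% (epistemic value); since the information gain is non-negative, the expected
% surprise term upper-bounds G(\pi).
G(\pi) \;=\;
\underbrace{-\,\mathbb{E}_{Q(o \mid \pi)}\!\big[\ln \tilde{P}(o)\big]}_{\text{expected surprise}}
\;-\;
\underbrace{\mathbb{E}_{Q(o \mid \pi)}\!\Big[D_{\mathrm{KL}}\big(Q(s \mid o, \pi)\,\big\Vert\, Q(s \mid \pi)\big)\Big]}_{\text{expected information gain}}
```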
Very cool; it's really an exploration of different things. We'll have you back on this topic when you're in a different phase. Do you want to look at the inference algorithms comparison? Yeah, so, inference algorithms. As I said, this notebook just lays out some of the things one considers from the literature when thinking about approximate inference problems in changing environments. One can think of different ways to solve this. What we use here is a representation which comes from change point models, as an approximation for the task, which is a good approximation for this kind of switching bandit, which is what we wanted to show here. For example, this relates to the question Blue had last week. This plot shows how the probability of generating a one, let's call that a reward, changes over time on one arm of the switching bandit. And this is the switching concept: whenever a switch occurs, we just sample a new probability for each arm (a minimal version of this generative process is sketched in code below). So this is a setting with non-stationary difficulty: the difference between the best arm and the second best varies over time. You could instead have a drifting dynamic, for which a different generative process model would be better. But then I can also say, well, we can use any generative model for any of these problems, and just see which does inference better, and whether there are any differences. So can you use different representations for different underlying environmental dynamics, with the model misspecified in a way, but still doing reasonably well on the inference part? When you look at the results, the posterior expectations you get over time, the different approaches lead to very similar results in the end; there is no strong advantage or disadvantage of one or another. And that's the reason we just picked the simplest, most efficient algorithm, because then it's much easier to scale to more arms and more time steps. Okay, very interesting. This is a pretty thorough walkthrough of the hierarchical variational model. Yeah, exactly. It describes the generative model and some of the steps one needs to take to get to the posteriors, and the implementation of the algorithm is also unpacked there. But besides what we used in the paper, we also implemented some other approaches, a non-variational kind of Bayesian inference, which seems to be quite good; on average it performs better, it's a more optimal representation. This comes from, I think, a recent paper on multi-armed bandits. Do you remember it, Sarah? Unfortunately not; I would have to look. Well, we cite the paper anyway. Yeah, a pretty recent paper, and one can see that it does a slightly better job, so that would be, I guess, this algorithm you're showing now. Yeah.
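As flagged above, a minimal sketch of the switching-bandit generative process; this is illustrative only, not the repository's implementation, and the arm count, trial count, and change probability are made-up values:

```python
# A minimal switching bandit (illustrative): each arm pays a reward with
# some probability, and whenever a switch occurs, with a fixed change
# probability per trial, new reward probabilities are sampled for all arms.
import numpy as np

rng = np.random.default_rng(0)

def simulate_switching_bandit(n_arms=4, n_trials=1000, change_prob=0.05):
    """Generate reward probabilities and outcomes for a switching bandit."""
    probs = rng.uniform(size=n_arms)        # initial reward probability per arm
    prob_history, rewards = [], []
    for t in range(n_trials):
        if rng.uniform() < change_prob:     # a switch: resample every arm
            probs = rng.uniform(size=n_arms)
        prob_history.append(probs.copy())
        arm = rng.integers(n_arms)          # placeholder choice (random agent)
        rewards.append(float(rng.uniform() < probs[arm]))
    return np.array(prob_history), np.array(rewards)

prob_history, rewards = simulate_switching_bandit()
print(prob_history.shape, rewards.mean())
```

Because all arms are resampled at each change point, the gap between the best and second-best arm, the difficulty, varies over time by construction.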
What are the lines representing, the red, blue, and green? The green is the posterior expectation of the reward probability, so a perfect algorithm would just match green to blue, and the red is the change point inference, the posterior probability that a change occurred at that specific moment of time (a minimal version of such a filter is sketched after this exchange). And as you can see, this process is quite noisy, in the sense that there are lots of small errors, slight jumps in places where a change didn't necessarily occur. You know what this reminds me of? The blue is some hidden true price of an asset, the green is the market tracking that price or value, and the red is like buy and sell orders on the market that represent the underlying situation changing. Well, markets, yes; one can see a market as doing some kind of inference on the true value of the price. That would be a kind of distributed inference problem. Yep. It's the whole "no one knows the price of a pencil, but maybe the person making the eraser knows when the price of rubber changes a little bit." But the colors are just a coincidence. Yeah. Okay, and then here we see something. Yes, here, if I remember correctly, this would be the Hierarchical Gaussian Filter applied to the same problem. This is work from Mathys et al., Christoph Mathys. In his first paper he actually applied it to inferring the price of an asset, when he introduced it, but it has since found lots of applications in cognitive neuroscience, in understanding how people adjust to volatile environments. What we have here is that the change probability is constant over time, but you can also think of environments where the change probability itself changes over time, so that you need to adjust to those changes as well. That would be one straightforward extension of this multi-armed bandit problem: different types of dynamics, and then testing out more complex generative models or approximations to deal with that problem. And so here we are just comparing how this algorithm tracks the underlying value in different environments. I have a question about what action does. Here the underlying generative process is stochastic, but the actions don't change the process; choosing a different slot machine isn't changing the probabilities of the slot machines. So is this almost a bit more of a niche construction setting, or can it just be put directly into the model that certain actions actually change aspects of the underlying generative process? Or is that feedback between action and future hidden states another module that has to be constructed in? Well, for what we have implemented here, that would definitely need an extension. That lives more in the general representation, the implementation of active inference found for example in SPM, which helps you deal with these more extended problems. Here we are really working with the simplest algorithm, keeping it very efficient and compact so that it can scale easily. The more general you make it, trying to capture many different problems, the more difficult it is to scale.
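Returning to the red line: here is a minimal sketch of the kind of change-point filtering that could produce it, assuming a fixed hazard rate and a discretized reward probability (again illustrative, not the notebook's algorithm). At each trial the filter both updates its estimate of the reward probability and reports the posterior probability that a switch just occurred.

```python
# A fixed-hazard change-point filter for Bernoulli rewards (illustrative).
import numpy as np

grid = np.linspace(0.01, 0.99, 99)       # discretised reward probabilities
prior = np.ones_like(grid) / grid.size   # uniform prior over the grid

def filter_step(belief, outcome, hazard=0.05):
    """One update: mixture prediction, Bayesian update, change posterior."""
    # With probability `hazard` the reward probability was just resampled
    # (reset to the prior); otherwise it stayed where it was.
    predicted = (1 - hazard) * belief + hazard * prior
    likelihood = grid if outcome else (1 - grid)
    posterior = likelihood * predicted
    posterior /= posterior.sum()
    # Posterior probability that a change occurred on this trial (red line).
    p_stay = (1 - hazard) * np.sum(likelihood * belief)
    p_change = hazard * np.sum(likelihood * prior)
    return posterior, p_change / (p_stay + p_change)

belief = prior.copy()
for outcome in [1, 1, 1, 1, 0, 0, 0, 0]:  # toy run with an apparent switch
    belief, p_change = filter_step(belief, outcome)
    print(f"estimate {grid @ belief:.2f}  p(change) {p_change:.2f}")
```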
It's like inference all the way up or all the way down, and it's a thing we'll return to many times: it's all well and good to track the value of what you're interested in, but then the uncertainty on that, and the uncertainty on your uncertainty about that, can get you into an infinite recursion. So do you just go quick and dirty, with a simple idea of how variance and higher-order uncertainties propagate, or does one fully specify all the possible ways that uncertainty can exist across multiple levels, which can lead to an explosion of the computational requirements really fast? Well, I don't think that hierarchy is necessarily the issue. It's more that if you start assuming that your actions change the state transitions, so that effectively your actions modify the state transition matrices, then you also have to think about the planning problem. It's no longer a simple action selection problem; it becomes a planning problem, and introducing such an environment makes things more complicated, because then it matters which states you are in at different moments of time. Because we've seen in Markov decision processes that the policy pi plugs into B, which is the transition matrix between hidden states (see the sketch below). So that's actions changing the way in which hidden states are inferred to evolve through time, whereas one-step decision making doesn't have to be done in that same way. Yeah, and I think at least the original implementation in SPM doesn't scale very well for this type of problem, and different groups and people have started exploring Monte Carlo tree search and other methods, which allow you to figure out a potentially best policy in very complex, high-dimensional problems. But as long as you're in the domain of cognitive and behavioral neuroscience, you can get away with this by just making your task reasonably simple.
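A minimal sketch of that point about pi plugging into B: the chosen action indexes the transition matrix, so evaluating a policy means rolling beliefs forward under that action sequence. The matrices below are made up for illustration.

```python
# Action-indexed transition matrices in a toy two-state MDP (illustrative).
import numpy as np

# B[a][i, j] = p(s_t = i | s_{t-1} = j, action = a): one matrix per action.
B = np.array([
    [[0.9, 0.1],
     [0.1, 0.9]],   # action 0: tends to leave the state as it is
    [[0.1, 0.9],
     [0.9, 0.1]],   # action 1: tends to flip the state
])

def rollout(belief, policy):
    """Propagate a belief over hidden states under a sequence of actions."""
    for a in policy:
        belief = B[a] @ belief
    return belief

print(rollout(np.array([1.0, 0.0]), policy=[0, 1, 1]))
```

The number of such rollouts grows exponentially with planning depth, which is exactly why methods like Monte Carlo tree search become attractive for deeper problems.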
Yes, but even as Ryan pointed out earlier, real humans, even when you control the experiment, or you think that you're introducing a gradual change in a parameter, might actually be cognitively doing a different type of inference. Humans are problematic. I don't like them. Oh, humans. We should do experiments with robots. Well, maybe that will be it. We brought up logistics, planning, motor behavior, exploration-exploitation spatially; those are areas where maybe having a defined digital twin for some robotics would help, and then we go from in silico to robotics to starting to introduce the element of the human and the unknown. Steven? Sorry, just one more thing: in behavioral experiments you always have this problem of convincing yourself that the model you're using is something which reasonably well represents what humans are doing, and you can be quite certain that this is not what they're doing exactly; it's just an approximation of all the complexity we have in our non-parametric representation of the world. There's this issue where you can get people to perform very well on a task through lots of training, but then it's like, okay, is that really what I want experimentally? Am I asking how humans learn to do this task well, or am I actually interested in what people are doing when they solve any task, how they represent the environment, how time is represented, how that's incorporated in their decision model? I don't know how other people feel, maybe Ryan can comment on this, but for me it's always a difficulty to deal with; you're never certain whether you're doing the right thing by simply imposing some task on people. Yes. Thank you. Steven? Yeah, so talking about this problem with humans: I'm interested in the precision parameter you talk about; it helps determine whether someone's going to explore or exploit in a context. And I'm interested in how exploration can itself become pragmatic: having precision about the usefulness of something exploratory, like art or an experience of some sort, and how that meaningfulness in the future could offset a pragmatic gain in the near term. So I'm interested in the way precision fits in with that, and the evolution of that. They even talk about that a bit with the affective charge, with Casper Hesp; affective charge is about whether your precision-weighted expectation has been violated or not. It's not necessarily whether something is good or bad; it's whether your prediction of how well you could expect something to happen suddenly got violated, and that amplifies everything. I'm interested in this because I'm trying to create immersive experiences for theater and places like that. I was just wondering what your thoughts are on how that precision parameter fits in with that dynamic, and how it could be extended, or whether there are other parameters that can fit in there as well. I'm not quite sure that I understand the question. Are you asking, like, is precision always relevant? Yeah, can you stack precisions with the pragmatic? Does that make sense? Say you've got low precision, but you have high precision over the fact that exploring the low-precision thing would be useful, so the two things stack on top of each other as a sort of pragmatic-epistemic gain. Yes. So in this other paper on meta-control, we are playing with it a bit differently. We are saying that you can control your exploration tendencies if you learn over time that exploration is bad. And this is where the stacking of different levels of the hierarchy, a higher-level agent controlling the lower level, comes into play. If this higher level observes over time that the agent is not performing well, that it's not reaching its goals efficiently, then it punishes exploration, and the agent learns to be more exploitative in that specific setting. And the other way around: we can make a setting where exploration is always beneficial, and then the agent learns to behave that way. One can assume that in real situations this is something people learn: if they observe that, depending on what they do, they gain more or less, they will adjust their tendencies and associate this with different contexts. You could almost have, it's not just precision, you could have another generative model managing those. Yeah, exactly. You can have another, higher-level representation which controls priors or precisions on the lower level, and thereby induces different behaviors.
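A crude sketch of that stacking, loosely inspired by the meta-control idea just described and not the paper's implementation: a higher level watches whether exploratory choices have been paying off and nudges the precision (inverse temperature) of the lower level's softmax action selection up or down accordingly. All values below are illustrative.

```python
# Higher level adjusting the lower level's precision (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

def select_action(values, precision):
    """Softmax action selection; higher precision means greedier choices."""
    p = np.exp(precision * (values - values.max()))  # stabilised softmax
    p /= p.sum()
    return rng.choice(len(values), p=p)

values = np.array([0.2, 0.5, 0.8])   # made-up action values
precision = 1.0
greedy_returns, explore_returns = [], []
for t in range(500):
    a = select_action(values, precision)
    r = float(rng.uniform() < values[a])             # Bernoulli reward
    (greedy_returns if a == values.argmax() else explore_returns).append(r)
    # Higher level: if exploiting has recently paid more than exploring,
    # raise precision (punish exploration); otherwise lower it.
    if len(greedy_returns) > 10 and len(explore_returns) > 10:
        gap = np.mean(greedy_returns[-10:]) - np.mean(explore_returns[-10:])
        precision = float(np.clip(precision + 0.1 * np.sign(gap), 0.1, 10.0))

print(f"final precision: {precision:.1f}")
```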
To connect that to ostensive cues: it's almost like if you had a sound or a cue that said, okay, now we're in a brainstorming period; okay, now we're going to drill down. And by alternating between a brainstorming period and drilling down, we're going to have the right kind of outcomes, giving a cue for when one should be in a more exploratory mode versus working more with what is already known. But this looks like an interesting paper, this meta-control of the exploration-exploitation dilemma, and it frames things in terms of a hierarchy of timescales, whereas often it's framed instantaneously, in terms of what maneuver would be best, whether one instantly prefers exploration or exploitation, rather than through deep time. The interaction with reviewers on that paper is what motivated this one. It was like, okay, I really need something to explain and unpack some of the similarities and differences between active inference and everything else around it. Nice. Well, if there are any other closing thoughts; this has been an awesome set of discussions, I think. We'll have a lot to think about, and hopefully people will work through these notebooks. Keep an eye out for when the final paper is published. So you got this one fully published, while the multi-armed bandit paper is still not out. It's still under review, so we're almost finished. Yeah, hopefully so. Cool; it's in the first revision round currently. Well, to anyone who's on here: any other thoughts or questions, or what do we take moving forward? I'm just thinking about the different tabs I could be having a regime of attention on, as a multi-armed bandit, you know? Do I check something I haven't checked in a while because I'm uncertain about it? Or should I pull back to my higher level and say, you know what, it doesn't even matter if I'm uncertain about that tab; I should just stick with this one. I'm sure there'll be some fun there. Yeah, Dimitri? Well, it's a very general problem, as I said; most problems are multi-armed bandit problems, so it's not surprising, really. Awesome. Well, Sarah and Dimitri, thanks so much. It was great to have your engagement from the .0 on through; it really made it awesome for the lab. So thanks, everyone, for watching, and until next time. Thank you for having us here. Thank you. Yep, it's been great. Until next time. Bye. Bye-bye. Bye.