 Hello, everyone. Welcome to the Active Inference Lab and to the Model Stream 2.0. Today's going to be a really awesome discussion, so let's just introduce ourselves and get right into it. I'm Daniel Friedman. I'm a postdoctoral researcher in Davis, California. Ryan? Yep. So I'm Ryan Smith. I'm an associate investigator at the Laureate Institute for Brain Research. Hi, I'm Christopher. I'm a PhD student at the University of Cambridge. I'm Max Murphy, and I'll be starting my postdoctoral fellowship at Carnegie Mellon in hopefully a week. Awesome. Thanks to Ryan and Christopher, the authors, or two of the three authors of this paper for joining today and for Max for lending your expertise as well. So this is the second Model Stream in a multi-part series that is going to be highlighting perspectives and addressing questions related to applied active inference. And so right now, we're looking at the tutorial paper of Smith et al. A step-by-step tutorial on active inference and its application to empirical data. The link you can find in the video description. So if you have any questions or thoughts or comments during the live stream, type it in the YouTube live chat and we'll be monitoring that. And then if you have any questions after the live stream, leave a comment on the video or get in touch with us another way and we'll try to address it in a future session. And if you want to learn more or if you want to participate, go to activeinference.org. All right, so today in our second session, we're going to be picking up with a brief overview and summary of where we're at, where we've been. We're going to go through a few clarifying questions that have been raised in the last week. And then we're going to dive into the specific examples and into the code and do some code walkthrough. So that will be really exciting. All right, so I'll just throw out the first opening question, which is, who is this session for? 
What kinds of tools and skills would you say are required, and which kinds of tools or skills might be helpful? Okay, well, I would say that the session, and really the entire tutorial, is mainly for people who are just starting out: people who are interested in active inference but maybe don't have a lot of background in mathematics, especially people coming from neuroscience and psychology backgrounds, or people who do some computational modeling, but maybe simpler reinforcement learning models, things like that, where active inference is notoriously a bit more of a black box; it's harder to break into. A lot of people think that the models are described in a way that's kind of opaque. And so the idea was to give people the background needed, and the actual example code, that could be adapted to build your own models, do your own simulation studies, build models of task studies, and be able to fit models to data. In other words, to enable a person, start to finish, to use active inference models to model data in empirical studies and actually use them in their own research. So as far as tools necessary, the main thing right now is going to be having some minimal understanding of, or motivation to learn, MATLAB. In the tutorial code, we have seven or eight different tutorial scripts at this point, but the main one is the one that I'll be focusing on today, and it's very heavily commented. We did try to describe, line by line, each thing that's going on. Honestly, there are enough comments that they'd almost be a second paper in and of themselves. 
So we're hoping that even someone who doesn't have a ton of background in MATLAB will be able to bootstrap their way up. But you definitely need to have a little bit, or be motivated to learn at least a little bit. Anything else to add on that, Christopher or Max? No, I think that covers it well. Just one note on the code there. So you mentioned it's in MATLAB. Now, what about other programming languages or backgrounds, like Python? What other languages are people in the active inference space developing in, and how could we include more computer languages, as well as natural human languages? So, at present, everything is almost exclusively in MATLAB, unfortunately. And the reason for that is that a lot of this is actually built off of SPM scripts that Karl Friston originally wrote in relation to neuroimaging approaches, so things like dynamic causal modeling. And so a lot of the current scripts for doing active inference call on a broad range of other SPM scripts that are used for a lot of different purposes, for instance, in dealing with large matrices in neuroimaging. For that reason, and because Karl has primarily worked in MATLAB in relation to SPM, everything has been in SPM, and specifically in what's called the DEM toolbox in SPM12, which stands for dynamic expectation maximization. There's a ton of active inference scripts, as well as a ton of other modeling scripts for, say, continuous models and other sorts of things in there, and we just make use of a couple of those, the main ones, not the ones they call on. That being said, there are people who are trying at the moment to translate some of this stuff into Python. Alec Tschantz, for example, is currently in the process of trying to put together Python versions of some of the main simulation and inversion scripts. They're very much in beta at the moment, as I understand them. 
But I know that that's something that's in progress. Cool. So it sounds like this project has a history of being developed in MATLAB and the SPM toolkit, because Karl Friston's research and collaborations were drawing heavily on MATLAB. However, in the modern development context, there are a lot of open-source projects developing in Python. So that's definitely an area we hope to be expanding, and also a great opportunity for people who are experienced in Python or MATLAB and curious about active inference; this is the opening to help contribute to something that could be really exciting. So, just one last warm-up introduction question. What kinds of projects or questions or investigations do you think this could apply to, for somebody who has heard of scientific modeling, but not active inference modeling? What do you think are some low-hanging fruit, or really straightforward applications, where you think it could be applied? So, I guess there are a couple of questions here. The main one is: given that the history of computational modeling has largely been in the reinforcement learning literature, or in others like drift diffusion modeling, one of the major questions is that active inference models seem really complicated relative to some of these simpler, traditional reinforcement learning models. So what is it that active inference actually adds? What can you do using an active inference model in a study that you maybe couldn't do, or couldn't do as naturally, using standard sorts of existing reinforcement learning models? And here, the answer that I would want to give is that there are a few parts to this. 
One is that because the value or cost function in active inference is expected free energy, as opposed to a reward function, and because the expected free energy has a reward component and an information-value component, it's quite a bit more natural in active inference models to model explore-exploit tasks: tasks where the agent has to somehow intrinsically assign value to seeking information as a means of knowing how to maximize reward eventually. And there are actually two different types of directed or goal-driven exploration in active inference. One is what we just call parameter exploration. The idea here, and this is probably the closest thing to reinforcement learning, is that the agent might not start out knowing what the reward probabilities are if they choose one action versus another. So they will choose actions that will help them best learn what the reward probabilities are before just reliably, continuously selecting the same option, because they need to know which one is the most rewarding first. So that's called parameter exploration. That, like I said, has analogs in reinforcement learning, but standard models anyway don't have a very specific information drive to seek that out. There are some things, like information bonus terms, in reinforcement learning that can do something like that, but they're kind of tacked on; they have a bit more of an ad hoc feeling to them, I guess, whereas this emerges naturally from expected free energy in active inference. There's also something that's a little more unique to active inference, which is called state exploration. That's more where you're choosing to move to states that you expect will give you the most precise observations about what state you're in. 
So this would be the case where you don't know whether you're in the green room or the red room, so you should go turn on a light, so you'll know which room you're in. Right. That also falls out of expected free energy very naturally, but not out of a lot of other traditional computational modeling approaches. There's also, sorry, I could go on for a while about this, but there's also something in reinforcement learning models that has to do with explore-exploit called random exploration, where people essentially start behaving more randomly if they're uncertain, or if they know they have to make a bunch more future choices. But those values are typically fixed in reinforcement learning models, whereas the expected free energy precision parameter in active inference is actually a dynamically updated version of that. It changes so as to keep driving the amount of randomness in behavior that's expected given past observations. And so that's kind of nice. The last thing I would say is that active inference has a very specific neural process theory associated with it, so it makes very specific predictions about the neural responses you ought to see in studies, whereas most other models don't. So that's what I would say. Thanks for the answer. Any thoughts on that or any of the questions, Christopher or Max? I would just say, very broadly speaking, active inference is really great for modeling decision making under uncertainty, and in particular various types of perceptual decision making in discrete state spaces. So, Ryan and I have some previous work where we show that you can reproduce a whole lot of classic visual awareness neural correlates, in the classic visual awareness paradigms, for example, using a hierarchical model. And the reason we can do that is because active inference has such a detailed and rich process theory attached to it. 
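Ryan's point about reward plus information value can be sketched numerically. Below is a minimal NumPy toy with my own illustrative numbers and function names (this is not the tutorial's MATLAB code): expected free energy is scored as risk, the divergence of predicted outcomes from preferences, plus ambiguity, the expected entropy of the likelihood. An option that yields informative observations gets a lower G even when preferences are completely flat, which is the sense in which information-seeking "emerges naturally" rather than being tacked on.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions (assumes no zeros)."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical 2-state, 2-outcome setup (illustrative numbers only).
# A[o, s] is P(outcome o | state s).
A_informative = np.array([[0.9, 0.1],
                          [0.1, 0.9]])   # observations disambiguate the state
A_ambiguous = np.array([[0.5, 0.5],
                        [0.5, 0.5]])     # observations carry no information

C = np.array([0.5, 0.5])                 # flat preferences: no reward difference

def expected_free_energy(A, qs, C):
    qo = A @ qs                          # predicted outcomes under this option
    risk = kl(qo, C)                     # divergence from preferred outcomes
    H = -np.sum(A * np.log(A), axis=0)   # entropy of P(o | s) for each state
    ambiguity = float(H @ qs)            # expected observation uncertainty
    return risk + ambiguity

qs = np.array([0.5, 0.5])                # agent is uncertain about the state
G_hint = expected_free_energy(A_informative, qs, C)
G_stay = expected_free_energy(A_ambiguous, qs, C)
# G_hint < G_stay: the informative option is preferred despite flat preferences
```

With flat preferences the risk term is zero for both options, so the whole difference comes from ambiguity: the epistemic drive is built into G itself rather than added as a bonus term.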
And so, for me, I think what's really unique about it is this: there are lots of different ways of providing unified explanations, but one way you might want to provide a unifying explanation is to have one unifying computational architecture, where you can draw all of these different things into one computational paradigm and then understand how they're related through the different parameters that are being modulated, or being fit, when you build models, for example. So I'm not aware of any other modeling paradigm, really, where you could model things as diverse as a basic explore-exploit task, or phasic dopamine dynamics, and then something that goes all the way up to visual awareness, or even planning and Bellman-optimal planning. So to me, that's what's so special about this: if you want to unify things under one umbrella, this really does it. Yeah, one last thing I'll say, and this is just a practical nice thing, is that in most other modeling approaches, you actually have to change the equations to fit them to a specific task, whereas in active inference, technically, you don't. The actual update equations are always identical. All you have to do is write down the appropriate generative model. So it's nice because it's really generic, and once you have the generative model specified, everything else kind of just comes for free. And just to jump in off of that generalizability: the reason I'm interested is that I'm somebody who studies movement, a totally different area of neuroscience than decision making and uncertainty modeling. 
But at the same time, what interests me about this framework is exactly that: it seems that I can start out with a structured generative model that is consistent even across domains, and then potentially apply it to my own neuroscientific research. The main questions that I have, and things that I'll be interested to hear about moving forward, would be: we talked about using it both for parameter exploration and state exploration, but in the empirical context of the experiments being performed, it might be the case that I'm uncertain about certain parameters, or uncertain about the state, and I may be uncertain about both of those things simultaneously. So finding a way to design my experiments so that I impose certainty on either the state or the parameters seems to me an important part of using this model. No, it's a great point. Actually, Chris and I were just talking about this yesterday or so: how to design an optimal task that includes, but can disambiguate, problems involving state and parameter exploration simultaneously. Technically, the example model that we have for the tutorial is able to do that, but not in an incredibly interesting way; just in a way that shows how you could do it. Very interesting point about experimental design, because for the kind of statistical techniques like a t-test, we know the statistical power calculations, or we might know the kinds of experimental designs that lead to good, clean t-tests, but it's a great point that the same question arises for these models. And just to highlight this generalizable and also generative nature: we talked about a lot of different areas of research, a lot of different frameworks, but the parts that stood out to me were, Christopher, you said decision making under uncertainty, and action and perception. 
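Ryan's earlier point that the update equations stay the same once the generative model is written down can be illustrated with a tiny sketch. This is a hedged NumPy toy with made-up numbers (not the SPM/DEM code): a single-timestep state inference that only ever needs A (the likelihood) and D (the prior over initial states). Changing the task means changing those arrays; the update itself never changes.

```python
import numpy as np

# Toy generative model (illustrative numbers): two hidden states, two outcomes.
A = np.array([[0.8, 0.2],    # P(o = 0 | s)
              [0.2, 0.8]])   # P(o = 1 | s)
D = np.array([0.5, 0.5])     # prior over initial states

def infer_states(o, A, D):
    """Generic single-step update: posterior proportional to likelihood x prior."""
    unnorm = A[o, :] * D
    return unnorm / unnorm.sum()

posterior = infer_states(0, A, D)   # observe outcome 0
# Belief shifts toward the state that makes outcome 0 likely.
# Re-specifying A and D re-specifies the task; infer_states is untouched.
```

The full scheme in the tutorial is richer (transitions, policies, precision), but the division of labor is the same: the model lives in the arrays, the inference lives in one generic update.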
That covers a lot of different areas of neuroscience. And then we are looking for this kind of unified explanation, but rather than an explanation which might be philosophical, or based upon how much we understand or how we feel one day, we're going to look for a unified computational architecture, or computational paradigm, as you said, that will let us compare and design and develop these generative and generalizable models. So, really cool discussion. I'm just going to ask one question from the YouTube live chat, and then we'll go to Ryan with a screen share. CB asks: can active inference research help us understand the neuroscience of stimulus-independent thoughts, when there isn't any sensory input at all? Thanks for elaboration in this regard. I guess the broad answer is yes, but the way that you would do it would probably be pretty task-specific. I'm not necessarily sure how to give a really precise, detailed answer without knowing the particular task, or even just the general psychological context, that's being imagined. It's actually really easy to model, for instance, working memory tasks, where you might get a stimulus at one point, but then there's no stimulus for a really long time, and somehow the agent has to continue to maintain representations in the absence of any stimulus, to know what to do at some distant future point. There's that kind of thing, just as one example. But like I said, the way to answer the question would be different depending on the exact psychological or task context in question. 
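Ryan's working-memory example, maintaining a representation through a long stimulus-free delay, falls out of the same machinery: an identity transition matrix B carries the belief forward, while uninformative observations leave it untouched. Here is a hedged NumPy sketch with toy numbers of my own (not from the paper):

```python
import numpy as np

A_cue = np.array([[0.95, 0.05],
                  [0.05, 0.95]])   # cue period: observations identify the state
A_delay = np.array([[0.5, 0.5],
                    [0.5, 0.5]])   # delay period: observations carry no info
B = np.eye(2)                      # identity transitions: the state persists

qs = np.array([0.5, 0.5])          # flat prior over states (D)

def step(qs, A, o, B):
    """One timestep: propagate beliefs through B, then update on outcome o."""
    prior = B @ qs
    post = A[o, :] * prior
    return post / post.sum()

qs = step(qs, A_cue, 0, B)         # see the cue once
for _ in range(5):                 # long delay with uninformative input
    qs = step(qs, A_delay, 0, B)
# Belief in the cued state stays at ~0.95 throughout the delay.
```

Because the delay-period likelihood is flat, each update multiplies the belief by a constant and renormalizes, so the representation of the cue is maintained with no stimulus present, which is the stimulus-independent maintenance being described.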
Cool. So, just as in explore-exploit there are tasks that look more exploratory or more exploitative, within this action-perception spectrum we're going to be talking about some tasks that are more toward the action-oriented side, like policy planning or motor behavior, and other tasks that are more toward the perceptual side, like this visual awareness task. Now, oculomotor movement, the movement of the eye muscles, is still a behavior, and there are papers that model that, but it's almost like we're looking at the whole spectrum of explore-exploit, or action-perception, or reinforcement learning, and wondering how we can again make this a unified framework for talking about these different kinds of systems. So, shall we jump right in, Ryan, or would you add anything else? Well, just one last point to bring up in relation to this person's question. What's really important, and probably a more recent development, and this is some of the visual awareness stuff that Chris and I have done, and also some stuff that I've done previously with emotional awareness, for example, is that there's a really important sense in which you're selecting not just behavioral actions, but actually cognitive actions. So you're selecting policies that, in the neural implementation, just do something like change the functional connectivity between two brain regions. In other words, it is stimulus-independent action: they're actions that have to do with changing what you're doing cognitively, and not what you're doing behaviorally. So that's also an important part of what's been done in a few previous papers that I think relates in an important way to this kind of stimulus-independence question, but again, only if I understood it right. 
Cool, good answer, and thanks for that. So go ahead and share your screen, and however we can help, whatever you want us to do or not do, we're there. I'm really looking forward to this. Okay, so the main thing that I want to do, just to start, and this is in part because I'm assuming that not everyone who's watching today was watching last week, so I'm not going to be able to give a full sense of how we're segueing from last week to this week, but I just want to give a really rapid overview of where we're going. So first we covered this slide here, which just shows an example of how you can do perception as Bayesian inference, and this is just a mathematical example of how it works. The idea is that you see this two-dimensional image, this gray disc under observation, and you're trying to infer, based on prior expectations, whether it's three-dimensionally a concave or convex shape. In simple problems you can do that exactly by solving Bayes' theorem, so this equation up here on the upper left, but in most cases that's actually intractable. So what you do in active inference is instead define this thing F here, which is called variational free energy, and it's more or less a measure that balances the accuracy of predictions against the complexity of beliefs, essentially how much you have to change your beliefs to get a model that's accurate. So you find a set of beliefs that minimize free energy, in other words, ones that are accurate while changing as little as possible. This is just an example of how, by searching around over different Qs here, different beliefs about the probability of states, you can find the one that, when you calculate it, gives the lowest value for F 
and that's going to be the one that's as close as possible to the true answer that Bayes' theorem would give you. I'll kind of skip this one. So what you're trying to do in active inference is this: you have some true thing going on out in the world, true states and the observations that they generate, but then the brain has this model where it's trying to say, okay, what state must I be in, given that I got this observation? So the brain is trying to come up with the representation that best matches what's going on in the generative process, in other words, the thing that's truly generating the observations. And so that's the probability of o and s, so states and observations, and this π thing is policies, so it's: what's the most likely action I ought to choose, given that the world is this way and given that I prefer some observations over others? What's going to be important for what I cover today is the way that you actually set up these generative models. The way these are set up is: you have these states s, and you have these observations o, and then you have initial prior expectations that we just call D. If you solve this, if you get some observations and you say, given my generative model, what states are most likely given those observations and given my prior expectations D, then you get your optimal belief about what the states ought to be given those observations. But in active inference we do this in a way that's also dynamic, so, for instance, states at time one are going to transition into states at time two, and so you also have beliefs, this thing in B, about which states are most likely to transition into which other states. And this is one of the unique moves in active inference: policies, so the different action sequences you might choose, are just modeled as things that entail different transitions between states. So B matrix one 
for example, here, would entail choosing to move from one state to another state, whereas some different policy would entail a different B matrix, meaning you transition from one state to some other state. The choice of policy depends on G here, which is the expected free energy, which is a function of C, which is your preferences. So essentially what you're trying to do is find the policy which corresponds to a set of state transitions that will generate the observations over time that are as consistent as possible with your preferences, which corresponds to having the lowest G. And then there are some other parameters up here that can, say, add habits. So E here can give you an additional prior over policies that gives you habits to do one thing over another, and there's this beta-gamma thing that also controls how precise, or how reliable, the model expects the expected free energy estimates to be. We realize that's a lot, and we covered this more slowly last time, but what you really need to know, when we're actually putting together the code like I'm going to walk you through, is what we're going to be specifying step by step: the observations; the likelihood A, in other words, what states generate what observations with what probability; D and B, which are the priors over our initial states and beliefs about how states are going to transition; the policies, this π thing, which is going to tell us the different possible state transitions we could choose; and then C here, which is the agent's preferences, what observations it wants to get over others, as well as the habits and this beta thing, the expected free energy precision. So building a model in the code amounts to just specifying what all these things are: what the 
state space is, what the observation space looks like, what A and D and B look like, and what C looks like, and maybe E and beta. If you want to know the exact equations for solving this, we've shown them here, but again I'm not going to describe them in detail. Basically all they're doing is saying: given my beliefs about state transitions, so my priors, and given A, my likelihood, what's my most likely posterior belief going to be? Which is just Bayesian inference, right? Priors and likelihoods together give the posterior, or technically an approximate posterior. So that's what we're doing. Can I just make one comment on that slide? Yeah. So on the top left we have state inference given priors and given observations, so that's Bayesian inference, and we're going to be building on these strata to go from Bayesian inference to active inference. You go from the top left, which is just perception at one moment in time, that's static, to the bottom left, where you have states that are updating through time, given priors and given observations; you're updating your estimate through time. So the bottom-left square adds in time. Then on the top right, we add in this element of policy selection, or control theory, and then on the bottom right we get all the way up to this full active inference model, which has not just a control-theory element through policies, but some extra parameters that help us, as we're going to find out soon, like E and beta. So, just to give that one more time, because this is what's unique about active inference, and this is the skeleton of the model that everything else is going to be scaffolded onto. Thanks, Ryan; continue. Yeah, no, thank you, that was definitely a good, quick, clear summary, so I appreciate it. Okay, so then, again, just to remind people, the end goal of all of this is to learn how to put together particular generative models, right, particular sets of A matrices, D vectors, B 
matrices, C matrices, etc., for empirical, that is, behavioral, tasks that participants can do. Then you can fit these models to actual behavior to learn things about participants; for instance, you might learn what their parameters are for A or for B, or what their beta value is, etc. So that's the ultimate goal here. But the first step toward that, beyond just understanding the structure I was talking about, is to know how to actually put it together in the code. Last time, I gave a bunch of examples of past experiments that we and others have done, where we have built active inference models for tasks, fit them to data, and learned things: for instance, about differences in substance use and learning rates, or differences in beta and decision uncertainty between healthy people and people with depression and anxiety or substance use, or differences in precision estimates for interoceptive perception. So I just went through those examples of specific tasks and empirical results, to show the kind of thing you can do with this in practice, so I'm not going to go through them again. The task that we're modeling is one that's designed to be really simple, but that also showcases the major important elements of active inference models. Here what we're doing is we have a task where the agent starts in a start state, and they can directly choose the left slot machine or the right slot machine, and if they do that, they can either win four dollars or zero dollars, depending on which machine is better, the left one or the right one. But they don't know ahead of time which one's better, so they can either take a risky guess and get four dollars or zero, or they can choose to seek information first, to ask for 
this hint, and the hint will tell them which of the left one or the right one is better on that trial. If it's the left-better context, that means the left one will win 80% of the time, and if the hint says the right one's better, then the right one will pay out 80% of the time. But there's a cost to taking the hint, which is that if you take the hint and then choose a machine, you can only win two dollars instead of four. So that's what we're going to be modeling. What we need to specify for that is that there are two types of hidden states. One is whether it's the left-better or the right-better context, whether it's a left-better or a right-better trial; the other is the behavior states: choosing to move from the start state to the hint state, or choosing to move from the start state directly to the left or right slot machine. So that's two types of states. We have to specify the observations; in this case, the major observations are winning the four dollars, two dollars, or zero dollars, or actually we're just going to model "you win" or "you don't win," and I'll show you how that works. And we'll have to define the agent's preferences, essentially saying it prefers four dollars more than two dollars, and two dollars more than zero dollars, and that's going to be that C thing. So that's the general task structure. I went through the way to do this simply, outside of the code: you define your initial state priors, you define your state-outcome mapping, you define your preferences over outcomes, you define the state transitions slash actions, so B, and then you define the policies, which you define with this letter V. The final thing I'll talk about before going into the code is that, in these models, you can separate out the generative process, what's actually out in the world generating observations, from the 
generative model, which is the agent's beliefs about what's generating the observations. If we want to separate those in the code, then what we do is: big D, or big A, or big B, those are the generative process, but if you also specify a little d, or a little a, or a little b, then that's the generative model, so that's the agent's beliefs. If you specify those, then they will be separate from the generative process, and that's typically what you need to do; specifically, it is what you need to do if you want to model learning, which I'll show. Right, just two quick points. First off, someone has asked: will these slides be available, or is there any resource where we can look at these slides? Because there's a lot of great information on them. I can make them available. Almost all the figures that I'm showing, other than the actual task figure here, are in the tutorial paper, and everything I'm showing about how to actually build the models is in the code, and well explained in the code, as I'll show. But I'm also happy to provide these slides; just let me know. Like I said, the vast majority of them are literally just the figures in the paper. So, yeah, I'm happy to make them available or send them to anyone; I could even put them up as additional supplementary material for the tutorial as well. Whatever people want. Great idea, and what a fun and accessible science we can have when we can just receive questions like that and put up our slides. And then, Max, I think there was a question that you had about the B matrix, just while we're defining our terms. What would you like to say about that? All right, just real quick, for anybody who's just jumping in on the stream: the link is below the video. You can go to the paper if you're looking for that supplementary code; it's in the supplementary materials. You 
can download all of the code from there. The question I had, one thing to touch on as we walk through the code, is that I noticed we have a kind of tensorial structure for these matrices. We could also specify a transition matrix with a 2D structure, with probabilities like 0.8 and 0.2, but instead we have a tensorial structure with a fixed probability of one for transitioning from any given state to another state. I just wondered if you wanted to touch on the motivation for that. I believe it's related to how the model gets interpreted and used; I didn't know if you wanted to elaborate on that. Yeah, so technically, although we call these things the A matrix and the B matrix, they're really tensors, that is, higher-dimensional matrices. And obviously we'll go through this when we go through the code, but the reason is that you can have different types of states. There's the state of being in the left-better or the right-better context, versus the type of state that has to do with where you move to or what you choose: being in the start state versus the hint state versus the left or right slot machine choice state. Because you have two different types of states, those are called state factors. And so essentially what you need to do is specify these matrices, or, like I said, really tensors, that, again, this is for A, say what outcomes, for each type of outcome, will be generated for each combination of values of each state factor. That makes it higher dimensional, which is probably the trickiest part; I'll explain this as well as I can when we go through the code. So that's that. And also,
simultaneously, because you have two types of states here, you also have to have a B matrix, a transition matrix, for each type of state. And when the agent doesn't have control over transitions for a certain state factor, there's only going to be one of these B matrices. So, for example, here, this just says B for factor one, which is what context you're in. This is just a simple 2D matrix, although technically, for reasons in the code, you define it as a three-dimensional thing with just a one here, because the third dimension defines the number label for the action. So this is basically just saying there's only one action for state factor one, and it's an identity matrix, which just means that for each trial the context stays fixed through the whole trial. In other words, within a trial it's not going to switch on the agent: after it takes the hint, it's not suddenly going to be that the right one is better when the hint said the left one was better. This is just saying that if you start in the left-better context, you stay in the left-better context for the whole trial, and that's it. Whereas for state factor two, there are four B matrices, which correspond to four different possible actions. So here the third dimension says this is action one, which means that from any state, which are the columns, you move to state one, which is the first row. The second one says if I pick action two, so third-dimension value two, I move from any of the four states, again the columns, to state two, which is the hint state. And so forth for actions three and four: moving to choose the left machine or the right machine. So these are the four actions, which are just the four possible transitions. And then, when you specify policies, this thing in V is just saying, for each
state factor, which of those actions can I string together? So V for state factor one is just a bunch of ones, because again there's only one action: it just stays in whatever state it started in. Each column here is a policy, and each row is an action time point. So the first row is what action I could take when moving from time one to time two, and the second row is what action I could take when moving from time two to time three. Again, each of these columns just says stay at ones, because there is no possible action for changing the context. But the policy space for state factor two, again over the actions, has these five columns. The first column just says I can move to the start state and then stay in the start state; that's just action one twice. The second column says I could take action two, so look at the hint, and then take action three, which is choose the left machine. Or I could take the hint, action two, and then choose action four, which is move to the right machine. Or I can go straight to the left machine, action three, and then go back to start; or straight to the right machine, action four, and then go back to start. So in other words, there's a bunch of other possible transitions, strings that you might think of; you might, for instance, go to the left machine and then go take the hint, or something like that. But we're not allowing that. We're saying these are the allowable transitions. So these are the different policies, or action sequences, that can be chosen. So correct me if I'm wrong, but it seems that this tensorial structure kind of allows us to compress all those different combinatorial combinations that we might have.
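The transition tensors and policy array just described can be sketched concretely as follows. This is an illustrative Python/NumPy translation of the MATLAB structures from the tutorial, not the tutorial code itself; variable names are made up, and note that Python indexes from zero while the action labels in V are 1-indexed, as in MATLAB.

```python
import numpy as np

# State factor 1 (context: left-better vs right-better): one "action",
# an identity matrix -- the context never changes within a trial.
B1 = np.eye(2)[:, :, None]          # shape (2, 2, 1): rows = next state, cols = current state

# State factor 2 (behavioral state: start, hint, choose-left, choose-right):
# four controllable actions, each moving deterministically from ANY current
# state to one target state. Slice B2[:, :, a] is the matrix for action a.
B2 = np.zeros((4, 4, 4))
for action in range(4):
    B2[action, :, action] = 1.0     # row `action` is all ones: go to state `action`

# Policies: allowed action sequences (rows = time points, columns = policies),
# with 1-indexed action labels as in the MATLAB code. Factor 1 has only
# action 1; factor 2 has the five sequences described above.
V1 = np.ones((2, 5), dtype=int)
V2 = np.array([[1, 2, 2, 3, 4],     # first move: stay, hint, hint, left, right
               [1, 3, 4, 1, 1]])    # second move: stay, left, right, start, start

# Every column of every slice sums to one, so each slice is a proper
# transition distribution.
assert np.allclose(B2.sum(axis=0), 1.0)
```

Because each action deterministically maps every current state to one target state, each slice is just a row of ones, which is the compression Daniel is pointing at.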
So it basically just allows the code to be a little more efficient, is that it, at the end of the day? Yeah, that's a perfectly reasonable way to talk about it. So, okay, so that already gives you a bit of a start here. At the end of the day, what we're going to be doing is specifying each of these things: T, the number of time steps; V, the policy space; A, the likelihood, the state-outcome mapping; B, the transition probabilities or transition matrices; C, which is the preferred observations (this says preferred states here, but it should be preferred observations); and D, which would be the priors over our initial states. And then there are all these other extra little parameters that you can or cannot include; I'll talk about those later. But then we also specify this little d, which basically says the agent doesn't start out knowing the context, so it has to learn the context over trials; it has to build up beliefs about D over trials. So with that as a quick background, I'm going to move into the code. It's cool that the whole thing is stored as one object with a bunch of fields, because it's almost like you could have ensembles of these different models, or a bunch of ants in a colony. It really lends itself to being reproducible and transparent about what each parameter is, and to comparing a whole host of them alongside one another, or different kinds of models. It's really cool to see that instead of a tangle of parameters you can get lost in, this is a column of things that each has a very clear purpose. Yeah. So I should say, you do put all of these things into the structure that we usually just call little mdp, which just stands for Markov decision process. And then, when you actually run the thing, you run it through the spm_MDP_VB_X code, and we've provided a version that just appends it with the word tutorial.
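To give a feel for the single-object layout being described, here's a rough sketch, with a Python dict standing in for the MATLAB struct that gets passed to spm_MDP_VB_X_tutorial. The field names mirror the ones just listed; the ellipses mark fields built elsewhere, and this is only an analogy, not the tutorial's actual code.

```python
import numpy as np

# Illustrative stand-in for the MATLAB struct `mdp`: every piece of the
# model lives in one labeled container with a clear purpose per field.
mdp = {
    "T": 3,                                  # time steps per trial
    "V": ...,                                # allowable policies (action sequences)
    "A": ...,                                # likelihood: P(outcome | states)
    "B": ...,                                # transitions: P(next state | state, action)
    "C": ...,                                # preferences over observations
    "D": [np.array([1.0, 0.0]),              # generative process: left-better context
          np.array([1.0, 0.0, 0.0, 0.0])],   # and always begin in the start state
    "d": [np.array([0.25, 0.25])],           # generative model: flat, low-confidence prior
}
```

Keeping everything in one structure like this is what makes it easy to build ensembles of agents or compare variants side by side.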
And that makes the big MDP output, and then you can use commands like this that just generate figures showing you the outcomes of the simulation. These are example simulation results, but again, I'll go into these after we go through some of the code, because to know how to actually use this stuff, you need to understand the code. So hopefully I can make this not too monotonous; walking through code isn't always the most exciting. But the way we have this set up, like I said, is that we have comments that try their best to explain what every little command means: what clear and close all mean, what rng default means, et cetera. I'm actually going to change this to shuffle, which just means that MATLAB will use a random number generator that won't do the exact same thing each time, so we can get variable results when we run it. So we've provided several different simulation options that we can use after building the model below, which is what I'll walk through. If you set the Sim variable equal to one, that will simulate a single trial, which will reproduce figure 7 in the paper. So that would just be a single-trial simulation. If you set Sim equal to two, that will simulate multiple trials where the left context is always active. But the agent won't know that, so it has to learn that the left one is always the better one; after a while, it'll be confident and won't need to take the hint anymore. If I could just briefly interject there: would that be an example of what we talked about earlier as parameter exploration? Well, not exactly. It would be an example of learning the parameters over successive trials. And state exploration is the bigger thing here, right? Because the agent either chooses to move to the cue or not, to learn which context it is.
I agree. So yeah, the tricky thing is that when you only have two options, parameter exploration would just amount to: even though I've gotten some good results from choosing the left one, I might also try choosing the right one a few times, to be more confident in what the reward probability is for the right one. But typically, to get good dynamics like that, you're probably going to need something with more arms, like three different options instead of two. If you were also going to include the state exploration thing with the cue, it would get a little more complicated. A standard kind of paradigm for doing parameter exploration would be something where you don't have the cue and the agent just has, say, three choices, and it has to choose different ones over time until it becomes confident which one is best, and then it'll just keep choosing that one. So the parameter exploration would amount to how many times it chooses each slot machine initially, just to figure out which one is best. A standard reinforcement learning agent will do that differently than an active inference agent. And the substance use study I mentioned above actually specifically did that. But yeah. So one question here: with this Sim equals one through five, there are scenarios that you're going to be able to rapidly plug and play. Scenario one is like a static single trial, scenario two is multi-trial. So this is what's going to allow reproducible construction of the figures and understanding of how different parameter changes influence behavior. That's what we're laying out right now, kind of like the scenarios for the agent, right? Okay. Yeah, "rapidly" I don't know about, is one thing; I think Sim five would take about half an hour to run. Yeah, don't do it. Sim five we'll talk about when we talk about parameter fitting, which will be a different session.
But mainly we'll be talking about Sims one, two, and three here, which have to do with perception and learning, as opposed to the real empirical stuff, which is estimating parameters and using them for group analysis and things like that. It's just so helpful, though, how it's laid out: from the most static, straightforward perceptual inference to action in the loop. And then, once action's in the loop, we think about the simplest setting to explore it, a single unchanging trial, and then keep adding layers of ecological variability or other features into the model. So thanks for making it so stepwise. That's why it's a step-by-step tutorial. Yeah, we tried. So the last one for today: if you set Sim equal to three, that will simulate a reversal learning situation. That's where the left context is active for a certain number of trials, so the agent should build up a prior expectation that it's just the left one every time; it'll stop taking the hint and just start choosing left. But then it switches, so now the right one is the correct one, and you see basically how long it takes before the agent says, okay, I need to start taking the hint again, and then eventually becomes confident enough to just choose the right one directly, without needing to take the hint. So it's like a standard reversal learning task. Those are the three that hopefully we'll get through today. And there are several parameters that can be messed with, but the one I've focused on mainly here, at least that we can set at the beginning, is this rs1 thing, which just turns into rs further down. That stands for risk seeking, or it could also be reward seeking. More or less, the bigger this value is, the stronger the agent's preference is for winning as much money as possible.
So this is the value in the C matrix that defines preferences over winning versus losing. The bigger this number, the more the agent is going to say, okay, I care more about reward than about information, and therefore I'm just going to take the risky choice and go for one of the slot machines directly. The higher this is, the less information seeking, and the lower this is, the more information seeking the agent will be. So that's just a really simple one to show how, by tweaking parameters, you can get different results here: it lets you weight how strong the reward value component of the expected free energy is versus how strong the weight of the information seeking component is. And then kind of ignore this PEB thing down here. Basically, you can turn this on if, when you do Sim equals five, you want to actually do group-level Bayesian analyses on the parameters. It will also save the outputs, because, like Chris said, Sim five takes about half an hour to run; if you turn this thing on, it'll save the results of Sim five so you can run the PEB part without having to redo Sim five every time. We just figured that would hopefully help. I just want to jump in and point out that it's really nice in this part where, for example, with rs set there, let's say I'm designing my experiment and I need to know how many trials and how many subjects I need when I'm doing my power analysis. I can run a simulation like this, set that parameter at what I think might be a reasonable range of values from the literature, and then say, well, this is the expected difference in my groups based on what I think this range of values might be. So it's a really powerful tool to be able to plug and play a single parameter like that. Yeah, it's nice.
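The effect of the rs preference weight just described can be caricatured in a few lines. This is an illustrative Python toy, not the paper's actual expected free energy computation: the numbers, the fixed epistemic value for the hint, and the function names are all made up purely to show the trade-off.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy trade-off: suppose taking the hint carries a fixed (hypothetical)
# information gain, while gambling straight away has a pragmatic value
# that scales with the preference magnitude rs in C.
INFO_GAIN_HINT = 1.0

def p_gamble(rs, p_win=0.7):
    reward_value = rs * p_win        # pragmatic value of skipping the hint
    # probability of choosing the direct gamble over the hint
    return softmax(np.array([INFO_GAIN_HINT, reward_value]))[1]

# A more reward-seeking agent gambles directly more often.
assert p_gamble(rs=1.0) < p_gamble(rs=8.0)
```

Sweeping rs over a plausible range in a toy like this is exactly the kind of plug-and-play simulation Max is describing for power analysis.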
I mean, typically with experiments, what you do want to do first is see: hey, can tweaking different parameters in the model produce simulated behavior that looks a lot like the type of behavior I actually see in real participants? So exactly what you'd be hoping for is: there's a chunk of people that act as though they have a high risk-seeking value and another group that act as though they have a lower risk-seeking value, based on their behavior. You can fit the model to that explicitly and then ask: does the difference in the rs value between these two groups of people predict anything interesting? Does it tell you something about how well they're going to respond to drug A versus drug B, or about their general cognitive ability, or their current emotional state? You can do all that kind of stuff, which is the ultimate goal of all of this. But okay. So the first thing we need to do, at least in the order I have it set up, is to set T. T is the number of time steps in the trial. In this case, there are three time steps: remember, the agent has to start in the start state, then at the second time step it can choose the hint state, and at the third time step choose one of the machines; or it can go straight from the start state to one of the machines, but then at the third time step it'll just go back to the start state. That's how we have it set up, anyway. So you need three time steps. Then what's typically easiest is to start out by specifying the prior expectations over states. In big D, which is the generative process, for factor number one, which is in the braces here, we just specify this as a column vector: so a row, and then this apostrophe thing here means it's transposed and becomes a column.
And so we set this as one and zero, which just means that, with 100% probability, it's going to be the left-better context. Then for the second state factor, this D2 thing here, we set it so the agent always starts out in the start state: a one in that first entry, and then zeros for all the rest, because it's never going to start in the hint state or in the choose-left state or the choose-right state. We're just saying the agent will always start in the start state. But then, and this is all explained in the comments, for little d, which is the agent's actual beliefs, we say the agent has no idea what context it's in to start with. So we set these to values that are flat, where the left-better and the right-better context have the same values. But notice that we didn't set these up to add up to one, like we did for big D. The reason is that these are technically what are called Dirichlet distributions, and with Dirichlet distributions, basically, the smaller these numbers are, the less confident the agent is in its beliefs. So if these are 0.25 and 0.25, the agent believes it's a flat distribution but is really unconfident, whereas if this were 20 and 20, it would also believe it's a flat distribution, but it would be really, really confident in that belief. And to explain that better, we're going to transition over to Chris, who's going to give a more general explanation of how learning works here, since you actually do need to understand this now. So we can transition over to Chris here briefly. Yep, because usually with a probability, people would imagine it sums to one: something has to happen, the coin has to come up heads or tails, so it has to add up to 100%, add up to one, to be a classical probability. But this is a bit of a tweak on that, with some similar features but some unique features.
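To put numbers on that confidence point: the two concentration-parameter vectors just mentioned, 0.25/0.25 and 20/20, imply the same flat distribution on average but very different uncertainty. A quick Python check (illustrative; the variance formula is just the standard Dirichlet one):

```python
import numpy as np

weak   = np.array([0.25, 0.25])   # little d: flat but very unconfident
strong = np.array([20.0, 20.0])   # flat and very confident

# Both encode the same expected (flat) distribution over contexts...
assert np.allclose(weak / weak.sum(), strong / strong.sum())

# ...but the Dirichlet variance  a_i * (a0 - a_i) / (a0^2 * (a0 + 1)),
# with a0 the sum of the concentration parameters, is far larger for
# the weak counts -- i.e., those beliefs are far less confident.
def dirichlet_var(alpha):
    a0 = alpha.sum()
    return alpha * (a0 - alpha) / (a0**2 * (a0 + 1))

assert dirichlet_var(weak)[0] > dirichlet_var(strong)[0]
```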
So thanks for switching over there and helping us understand it. Yep. So can you guys see my screen? Yes, we don't see you, but we see your screen. All right, go ahead. Okay. So as Ryan was saying before, say we're looking at the A matrix: there is, technically speaking, a Dirichlet prior over it. All of our distributions, the D's, the B's, and the A's, are categorical distributions, and each has a prior over its parameter space, which is a Dirichlet distribution. Just for the people who are technically in the know, as it were: a categorical distribution is a special case of the multinomial distribution, and a Dirichlet distribution is what's called a conjugate prior for the categorical and multinomial distributions. That just means that if you multiply these two distributions together, you get out a distribution that has the same form as the prior. And that's really important if you're doing, for example, empirical Bayes, where the posterior after round one serves as the prior for the second round of inference, that kind of thing. So, just to briefly go through this: you can see my cursor, right? Yeah. Cool. Okay. This is the categorical distribution, where x is some categorical variable that occupies one of K mutually exclusive states of the world. That is to say, if I have a distribution over a grid world, my agent can't be in two places at once: it's either left or it's right, for example. And theta are just the parameters of that distribution. This function here, the thing sticking out the front, is a normalization constant; it just counts the combinatorics of the distribution and ensures that it sums to one.
Exactly the same thing here with the Dirichlet distribution, although it is a distribution over the parameter space of our likelihood. Dirichlet distributions are often colloquially called distributions over distributions, because they are a distribution over a vector that has to sum to one. And what's really beautiful about these two distributions is that when you multiply them together, and we'll just ignore the normalization constant because things get pretty messy, you end up with a distribution that has the same functional form as the prior: it's also a Dirichlet distribution. So our posterior distribution over the parameters of our likelihood is also a Dirichlet distribution. And all that changes when we go from the prior to the posterior is that we add this little alpha thing here, called a count. We essentially just add a count to whatever variable was observed. I'll go into a little more detail on the next slide. Is there anything you want to add to that, Ryan? No, I mean, at the end of the day, and Chris will cover this, I'm sure: while this might look really complicated to somebody without a lot of math background, it really just amounts to there being, like what I showed you with D, a number assigned to each entry, and when you observe something, that number gets bigger. So, for instance, if I believed I was in the left-better state the last five times, then I'll have added one five times to that left entry, so it would now be 5.25 as opposed to just 0.25. And ultimately those numbers get softmaxed, so they get normalized and turned back into a probability distribution. But like I said, at the end of the day, you're just adding counts each time you observe something.
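Ryan's "just adding counts" description can be written out in a couple of lines. This is an illustrative Python sketch of the arithmetic he describes; the tutorial itself does this inside the MATLAB routines.

```python
import numpy as np

d = np.array([0.25, 0.25])        # flat, unconfident prior over contexts

# Infer "left-better" on five successive trials: each time, add a count
# (scaled by a learning rate, here 1) to the corresponding entry.
for _ in range(5):
    d = d + np.array([1.0, 0.0])

print(d)                          # [5.25 0.25]

# Normalizing the counts recovers the implied prior probability,
# which now strongly favors the left-better context.
print(d / d.sum())
```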
Yeah, this is just like how you might be more confident about the true ratio of different colors of balls in an urn: if you had three of one color and one of the other, you're not quite sure; if you had three million and one million, you'd be very sure. And then, Max, you had a question. Yeah, so going to the other end of the spectrum: if I'm somebody coming from probability theory talking about a Dirichlet distribution, the support for that distribution is on the open interval (0, 1), and as we can see from these equations, if I go to zero, then since it's multiplying each time, there could be problems. So in practice, my question, not being very experienced with using these things, is: when we implement this, does that have much of an effect on any of the estimated parameters? No, but one thing to say is that because you always work in log space, there's basically a little numerical trick: to make sure that you never take the log of zero, we add e to the negative 16 to all the entries. Does that answer your question? Yeah, so for example, at the end of the day I might have a converged parameter estimate that apparently has a very low log likelihood. Maybe part of that could be related to there being times where I'm entering a state that might be unlikely to occur most of the time, or that wasn't observed in this particular set of experiments, but the rest of it kind of holds up. Is that maybe one way to think about low values when I converge to my parameter estimates? I'm not sure I totally understand the example, just because it's a little bit abstract. Could we give one? How about the slot machine example?
It's like you go into the casino and you've never played any of the slot machines, so you haven't had much experience; even if you're not sure which one's better, you're not sure overall. And then, as you try more of the slot machines, just like Ryan was saying, you increment up observations, so to speak, in those columns. So after you've made a thousand slot machine visits, you're a little more confident than after you'd made zero or one or a few. Is that the case, just to bring it back to the example we're using? Yeah, or I could pull the slot machine three times and only ever observe one result, never observing the complement of that result; then effectively my posterior probability distribution is one and zero. So if I end up having weird-looking numbers on my converged parameter estimates in terms of the log likelihoods, maybe that just relates to this particular property. But at the end of the day, it sounds like I'm still going to converge to the correct values using this approach; it's just a kind of insurance. Yeah, I'm honestly not sure whether there are any formal results that relate to this. But in practice, because the categorical distribution basically just ends up being a softmax function over the concentration parameters, you'll always have something that's between zero and one. And the concentration parameters will always be somewhere between zero, meaning the outcome has never been observed, and however high the counts go in whatever experiments or simulations you run. I've never encountered a problem with this, but then again, I've never really looked into possible consequences of it, so I just don't know, sorry. Ryan, do you have anything to say? I mean, the only thing I would say, and maybe I misunderstood the question.
But I mean, if it has to do with whether or not you'll converge on the true generative process, there's certainly no guarantee. There are lots of cases where things can go wrong and you'll get stuck in local minima and things like that. We actually show several examples of that in this structure learning paper, where we show how you can use active inference for structure learning, with Bayesian model expansion, state-space expansion and reduction, and things like that. There are lots of cases where you'll end up with the wrong likelihood. Yeah, so maybe in my mind the caveat would then be: going back to that example of the posterior distribution where I've only picked the machine three times, when we're designing our experiment it might be important to consider setting it up so that we're not going to end up with a sample of three. Obviously that doesn't sound like good science to anybody anyway, but it's the oversimplified case: as you add more categories, you could run into that scenario for one of your particular categories, which could cause some problems. If there are a hundred kinds of cereal and you only let people choose one time, you're not going to get a good estimate. It's kind of a sample size and experimental design question. So, just to pull back for those following along, because that's a really interesting question about what's provable and how different sample sizes relate to each other: we're just thinking about the structure by which additional observations increase the quality of this estimate of a parameter.
Yeah, and you'll see in what Chris shows you that, generally, the lower the concentration parameter values are, the more the expected free energy will be weighted toward choosing parameter-exploration policies. So if the agent has never chosen the 99th cereal before, it'll be driven to go choose that one, just because it wants to be more confident in what the actual parameters are. But anyway, Chris, go ahead. Yeah, so we actually don't have any numerical examples in the paper showing the free energy values for different parameter values; we only have the state exploration terms. Maybe that's something we could add to the paper, actually; that would be interesting. Yeah, cool. Okay. So, just to give a recap: our probability distribution over the parameters of A is a Dirichlet distribution, and we end up with a matrix where the rows are our possible observations and the columns are hidden states. And literally all that we're doing, so this little cross inside a circle is called a Kronecker tensor product. Actually, is that correct? I actually don't know. That notation is used for the Kronecker product; it's in the SPM textbook. Yeah, thank you. Okay, good; sorry, I just had a brief mind blank. Anyway, this eta thing here is just a learning rate, which multiplies this term here. So I'll just give an example, and we'll add that to our previous values of whatever these a's are. Very simply, say our posterior over states is something like 0.7 and 0.3, and we receive an observation; this is a transposed vector, so it corresponds to row two.
All that this would do is literally add values of 0.7 and 0.3 to that row of this matrix. That's all that learning really is; it's very simple. And then both of these columns will get passed through a softmax function at the end of the day. And just to give an example of what we were talking about before: because they're both softmaxed, all the softmax cares about is the difference between the two elements of the vector. So 51 and 50 give the same value in the softmax as two and one. But in one case the model is incredibly confident, and in the other case it's not confident at all. So if you had an agent that was actually doing parameter exploration, it would have no motivation whatsoever to explore the parameters in the first case of 51 and 50, but it would be quite motivated in the case of two and one, even though the likelihood distribution you end up getting out of them is the same. Okay, good. That's actually all I'd prepared, because in practice it really is that simple. So back to you, Ryan. Okay. So anyway, with that as a little background: all we're doing here is saying that to start with, the agent believes it's a flat distribution; it has no idea which one's going to be better, left or right, on each trial. That's in little d, because that's our model. So each time it believes that it was in the left-better context, it's just going to keep adding numbers to this 0.25 here, and those will be scaled by that learning rate that Chris showed. If the learning rate is one, then each time it observes the left-better context it'll just add a one, so it'll become 1.25, then 2.25, et cetera. But if the learning rate were lower, like 0.5, it would add 0.5 each time.
So it'd be 0.75, 1.25, etc. The learning rate just scales how big the counts are that get added. And the learning rate could be zero, in which case there wouldn't be any updating. So just think about what the whole range could be: from non-learning agents, to agents very slowly incorporating new information using this Dirichlet framework that Christopher just described, all the way up to ridiculously high values. I should just highlight as well, and this probably isn't a good point for discussion, just to keep things moving, but one thing worth highlighting is that there's a genuine debate, or maybe debate's too strong a word, a discussion to be had, about whether we should actually be modulating the learning rate at all. Given that the model is Bayes optimal, or approximately Bayes optimal, one would assume there should be a way of modeling things without changing the learning rate from one. In practice, I'm not sure that actually works, but anyway. Yeah, so technically a learning rate of one is optimal. But when you're fitting the model to behavior, most people are not optimal, so you typically have to estimate the learning rate for each person, and oftentimes it's quite a bit below one. But that's actually an interesting attribute. And again, it's an example of how using these generative and generalizable frameworks helps us compare people not just on who could remember more numbers or who had a faster reaction time, as far as observable characteristics; we're moving a step back and asking whether people might differ in their learning rate or in other attributes of this model. Yeah. And there is also, and again this isn't something to really get into so much, but there is effectively a learning rate that comes out of the precision of your posterior beliefs over states.
Like for instance, if your posterior beliefs over states are really imprecise, say afterwards it's 0.6, 0.4, then whatever observation you get is just going to add counts of 0.6 and 0.4, which means you're not really going to build up a more precise prior for one over the other, because you're not confident about which state you're in. Just a quick example of that: you go into the casino and you're drawn a card, and you can't read the card, or the number has rubbed off. If there's no precision at the sensory level about which card you drew, you're not going to be updating your estimate of how likely different numbers are. And if there's a perfectly legible card, so you have no error about which card it is, there's the secondary question of how much you update your internal model, but that's not a question about perceptual precision. So it's a really nice point, and it's worth understanding that you can have uncertainty about observations that is captured by this model, but even in the case of precise or totally observable scenarios, there's additional uncertainty related to learning. Yeah. Again, there are still potential problems. What tends to happen in these models, especially if the learning rate is one, is that the agent becomes too confident too quickly. Or a better way to put it: once it has observed a certain number of, say, lefts, it becomes really hard for it to unlearn that, which is different from what humans do. Humans tend to infer that there's a different context. They're like, okay, now it's a new context where right is better. So a human doesn't necessarily unlearn the original counts; it instead infers, now I'm in a new state and I need to accrue different counts for that state. And that requires more complex models, which can also be built.
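The count updating, learning rate, and softmax confidence point discussed above can be sketched in a few lines. This is a Python illustration of the idea only; the tutorial's actual code is MATLAB, and the function and variable names here are ours:

```python
import math

def softmax(v):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def update_counts(counts, posterior, eta):
    # Dirichlet-style learning: add the posterior over states, scaled by
    # the learning rate eta, to the current concentration counts.
    return [c + eta * q for c, q in zip(counts, posterior)]

# Flat prior counts over the two contexts (left-better, right-better):
d = [0.25, 0.25]
for _ in range(2):  # two trials of certain "left-better" evidence
    d = update_counts(d, [1.0, 0.0], eta=1.0)
# d is now [2.25, 0.25]

# A lower learning rate adds smaller counts per trial:
d_slow = update_counts([0.25, 0.25], [1.0, 0.0], eta=0.5)
# d_slow is [0.75, 0.25]

# The softmax only sees differences between elements, so counts of
# (51, 50) and (2, 1) yield identical likelihoods despite very
# different underlying confidence:
assert softmax([51.0, 50.0]) == softmax([2.0, 1.0])
```

The final assertion is the (51, 50) versus (2, 1) point from the discussion: the normalized likelihood is the same, and only the raw concentration counts carry the confidence that drives parameter exploration.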
Just a couple of little notes. That's kind of like humans using narrative to help reset local parameter estimates: oh, well, now it's a new day, so things that were unlikely yesterday can be likely today. And it also relates to this question about the optimal learning rate: in a mathematical or analytical framework, maybe one or some other number is quote-unquote optimal, but in a realized ecological setting, it might actually not be the best solution. Yeah. Typically people talk about this in other literatures as latent cause inference, inferring the latent causes behind things. There's a different latent cause, and that's kind of the same thing as saying there's a different context and therefore a different likelihood. But yeah, so I'll just go through this here step by step. So I'm going to set up: I have my big D and my little d here. And then the next thing I'm going to do, and I'm doing this a little bit condensed, is to set the number of states for each state factor. That's going to be the length of D{1} and the length of D{2}. The first one is the two contexts, so there are going to be two states, and the second is the length of D{2}, which is going to be four states. So at the end of the day, this Ns thing, if you look on the right, is just going to look like that: two and four. Just one note there for those who are unfamiliar with MATLAB. At line 148, where Ns is, you put a little red dot, like a stop sign, and then you hit run, which ran everything that we had just discussed, from a clean slate up to that line, and then it halted. That's what allowed us to actually instantiate all these variables we were just talking about. And then you stepped one line at a time, so that you could run this line 148 with all of its prerequisites fulfilled.
So it's kind of like a breakpoint in the program. We took a breath and then we ran one line, and now we're going to go step by step. Yeah, it just allows you to see what it's doing step by step, which is helpful. Sorry, I should have explained what I was clicking. So now that I've specified that, I can write this little for loop, which just says: for i equals one to the number of states for factor two, make everything the no-hint observation. This is just saying that, to start with, all the matrices in A{1} are going to generate the no-hint observation. So it will end up looking like this. And I should say that the columns correspond to the first state factor, so left context, right context, and this third dimension here, one, two, three, four, corresponds to the second state factor, which is the behavior. So this is saying: when I'm in the start state, each context is going to generate the no-hint observation; when I'm in the hint state, it's going to generate the no-hint observation; and so forth. I'm just starting it that way for ease, to say that they're all like that. But then I'm going to define this thing pHA, which is the probability of the hint being accurate. And again, this is just to make things a little more concise and convenient. Then I'm going to replace that second slice of A{1}, because remember the slice for behavior state two means I'm in the hint state. So I'm going to say: when I'm in the hint state, in column one here, I'm going to observe the machine-left hint with some probability if I'm in the machine-left state, which is the left column, and one minus that for the no-hint observation. And if I'm in the right-better context, the right column here, it's going to be the reverse: I'm going to observe the machine-right hint with some probability. Just one note on that, Ryan.
So people might look at this and wonder, why didn't you just write in ones and zeros, instead of this pHA and one minus pHA? And that's because, the way you've set it up, it generalizes to cues that are less than 100% accurate. You could make the probability of the hint being accurate 90%, or less, or some other value. So if you have a probabilistic learning task, you can model it with this exact framework just by changing pHA to point-something, rather than reimagining the entire matrix, which is really the hard part. Yeah, exactly. So if I step through that, in this case I've set pHA to one, which just means the hint is completely accurate. So now if I look at A{1} again, it's going to look like this. In all the other behavioral states, so the start state, choosing the left machine, and choosing the right machine, both contexts still generate row one here, the no hint. But if I'm in state two, the hint behavioral state, then if I'm in the left-better state, I will observe the left-is-better hint with probability one, and if I'm in the right column here, the right-context-is-better state, then I'm going to observe the right-hint observation, the bottom row here, with 100% probability. In other words, if I observe this thing, I'm going to know for sure that I'm now in the left-better context. But if I made this 0.8, 0.2 and 0.2, 0.8 instead, that would just mean there's some probability that the hint's going to give me the wrong answer. So that's all that is. So remember, the first state factor is the columns and the second state factor is the third dimension. That's key. So now what I'm going to do is define the second outcome modality. And this is another thing that's important: the number for A that's in the braces corresponds to the outcome modality.
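The hint likelihood just walked through could be sketched like this. The tutorial builds it as a 3 x 2 x 4 MATLAB array A{1}; the nested-list layout and names below are our own Python stand-in:

```python
pHA = 1.0  # probability the hint is accurate (try 0.8 for a noisy hint)

# A1[behavior][observation][context]: four 3x2 slices, mirroring the
# tutorial's 3 x 2 x 4 array for the hint modality.
# Rows: no hint / machine-left hint / machine-right hint.
# Columns: left-better context / right-better context.
A1 = [[[1.0, 1.0],
       [0.0, 0.0],
       [0.0, 0.0]] for _ in range(4)]  # start: every behavior state -> "no hint"

# Overwrite the slice for the hint behavior state (state 2, index 1):
A1[1] = [[0.0, 0.0],
         [pHA, 1 - pHA],      # left-better context -> left hint with prob pHA
         [1 - pHA, pHA]]      # right-better context -> right hint with prob pHA

# Every column of every slice still sums to 1, as a likelihood must:
for slice_ in A1:
    for col in range(2):
        assert abs(sum(row[col] for row in slice_) - 1.0) < 1e-12
```

Changing `pHA` to 0.8 gives exactly the probabilistic-hint variant mentioned above, without touching any other structure.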
So the first set of observations I can get is no hint, machine-left hint, and machine-right hint. That's one set of observations that we call an observation or outcome modality. Now for A{2}, the second outcome modality, the possible observations are null, loss, and win. What this is saying is that the first two behavioral states, for i equals one and two, so being in the start state or the hint state, are always going to generate the null observation; they're never going to generate a loss or a win. Now, after that, if I step through here, I can set what I just called pWin, the probability of winning, and I set that to 0.8. And this is the part where the actual reinforcement or reward learning comes in. Because what I can say is: okay, if I'm in the choose-the-left-machine state, so behavior state three, then the probability of winning is pWin, the probability of losing is one minus pWin, and the probability of losing in the right one, if I choose the right machine, is high. Which makes sense, because if I'm in the left-better context and I choose the right machine, then I'm most likely going to lose. In contrast, if I'm in the right-better context and I choose the right machine, then the right column here should generate a win with high probability. So this is just saying: if I choose the right machine, that's the four, and I'm in the right-better context, the right column, then I have a high probability of winning. And vice versa up here if I choose the left one. Again, just to note how this is the minimal example: we're talking about two options with the two slot machines, and there could be more options.
We're talking about how much it costs to get the hint, and how much you receive when you win; those are parameters you can change. How accurate the hint is: you can have a perfectly accurate hint that sends you to the slot machine that wins 100% of the time. The probability of winning can be changed. So you could say, okay, I'm imagining a situation where there's a perfect messenger with a hint, it's free to visit that person, and it's 100% certain which slot machine is the winning one. Or you can imagine more gray-zone scenarios where different things are associated with each other in a less direct way, or there are more than two options. So for those who are seeing this one example for the first time, this is the tip of the iceberg of a big family of models that have a lot more options and a lot more nuance. Yeah, I mean, you could have 20 outcome modalities and 10 state factors if you wanted to. It would just mean that there'd be a fourth dimension for the third state factor and a fifth dimension for the fourth state factor, and it'll get really big, really fast. But now that I've done this, because I set the probability of winning to 0.8 given that you're in the correct context, A{2} will look like this. This just says: if I'm in the start state, I never observe a win or a loss; if I'm in the hint state, I never observe a win or a loss. If I choose the left machine and I'm in the left-better context, I will win with 0.8 and lose with 0.2, and the opposite if I'm in the right context: I'll lose with 0.8 if I choose the left machine and I'm in the right context. And if I choose the right machine, state four here, and I'm in the right-better context, then the probability of winning is 0.8, the third row here being winning, and losing is 0.2, and vice versa.
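The win/loss contingencies just described can be written out compactly. Again a Python sketch of the tutorial's MATLAB A{2}, with our own names; rows, columns, and slices follow the conventions in the walkthrough:

```python
pWin = 0.8  # probability of winning when choosing the better machine

# A2[behavior] is a 3x2 slice: rows = (null, loss, win),
# columns = (left-better, right-better context).
A2 = [
    [[1, 1], [0, 0], [0, 0]],                      # start state: always null
    [[1, 1], [0, 0], [0, 0]],                      # hint state: always null
    [[0, 0], [1 - pWin, pWin], [pWin, 1 - pWin]],  # chose left machine
    [[0, 0], [pWin, 1 - pWin], [1 - pWin, pWin]],  # chose right machine
]

# Choosing left in the left-better context wins with probability 0.8:
assert A2[2][2][0] == pWin
```

Raising or lowering `pWin` is the single knob that turns this into an easier or harder probabilistic reward task, exactly as discussed above.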
So this is just defining the probability of winning given your choice and given whether it's the left-better or right-better context. I know that's a mouthful. So then finally, and this isn't really all that practically interesting, but it's important to include: outcome modality three is just the agent's own behavior. What this means is that every single time it makes a choice, it observes itself making that choice, and all that does is make the agent completely confident about what it did. But you could imagine that if somebody weren't aware of their own decision-making behavior, that might lead to decision-making that's not in line with what you would do if you could observe your behavior. Yeah, so if you get rid of this, then in some cases the agent won't be all that confident about what it did, and therefore it'll be hard for it to learn which actions were the right ones, because it won't know what it chose, basically. It'd be like choosing a slot machine blind. I just want to jump in with a question about my motor systems context. So for example, if I'm studying sensorimotor integration, and I'm measuring signals from neurons in motor cortex, and I'm measuring outputs such as the kinematics, the kinetic forces that are generated, and the EMG signals in the periphery, could I then use that context and specify it as either one-one or as zero-one, for example, if I didn't think the agent was capable of observing its past history or its decision states, in the form of some kind of sensory blockade related to my experiments? Would that be a practical application? Does that have face validity in this context, given that I'm measuring those signals? One quick thought before that: generally speaking, in motor control domains you're working with continuous quantities, right? Right.
And that's why you use things like Kalman filters, etc., as your models. I'm not sure to what extent a partially observable Markov decision process would actually be the appropriate generative model for that situation. I think it would be if you want to model the decision-making processes for motor control, so am I going to move my arm to the left or the right? That is a discrete decision. But for actually measuring motor control and a lot of those things you were talking about, those continuous signals, I'm not sure a POMDP is the appropriate generative model. But for example, if I thought I had a state space that consisted of a fixed point, or however I'm abstracting those continuous processes into the realm of discrete states, just glossing over that, could this kind of model maybe account for that, or be used in that way? In any case where a signal is technically continuous, to use these sorts of models you're going to have to pick some quasi-arbitrary way to bin time, right? You have to discretize time somehow: the signal was X for these three milliseconds and now it changed to Y for the next three milliseconds. That would be some kind of funny average or something like that. I guess the potential concern here is also that there has to be some way to link the states, as you're binning them, to some sort of policy selection process, presumably. So I'm not completely sure; at least I can't imagine how it would work. It's almost like this one is about the agent deciding which way to walk, and then maybe there's some appropriate set of feedback. But let's think about what this decision-making example is, and then wonder about where the motor and other modalities and the continuous signal processing fit in, which is also interesting.
Let's just walk through this discrete-time, discrete-opportunity-space model, and then, like we talked about in an earlier active inference livestream, ask what it looks like to move active inference from a discrete into a continuous space. Those are the kinds of things that are actively being developed, and the motor example sounds exciting, Max. Yeah. I mean, Thomas Parr has done a bunch of this, where you use mixed models in which the top layer is a discrete state space Markov decision process model like this one, deciding left arm, right arm, but it feeds into a continuous state space below that actually controls the dynamics of the movement itself, based on the decision. My guess is that something like that would be better. That being said, these sorts of models are much more about behavior and choice. When you're talking about neural responses, I would think it would be more useful to use these models to predict the neural responses you would get when a person does X versus Y, and see if that matches the firing rates that you see. There's a paper on the mixed-model thing; I think it's called something like the discrete and continuous brain, from decisions to movement and back again. It's in Neural Computation, and I think it deals with all the formal issues that arise with this. Yeah. Thanks. Okay. So just for the sake of time here: again, what I did here, and this is for the agent observing its own behavior, is I made it like this. This is just saying that when the agent chose state one, it observes the observation associated with state one, row one, with 100% probability. When it was in the hint state, it observes that it was in the hint state, row two, with 100% probability, and so forth. So it's just providing maximally precise evidence so that it knows what it did. So now notice that this was big A. So that's the generative process.
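The self-observation modality just described is an identity-style mapping. A small Python sketch (our `A3` stands in for the tutorial's MATLAB A{3}):

```python
# Self-observation modality: in behavioral state k, the agent observes
# "I took action k" with probability 1 in either context, mirroring a
# 4 x 2 x 4 identity-style likelihood array.
# A3[state][row] is the pair of probabilities for (left, right) contexts.
A3 = [[[1.0, 1.0] if row == state else [0.0, 0.0] for row in range(4)]
      for state in range(4)]

# e.g. in the hint state (index 1) the agent is certain it took the hint:
assert A3[1][1] == [1.0, 1.0]
```

Replacing the ones with softer values would model the "sensory blockade" idea raised in the discussion: an agent that is less than certain about its own past actions.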
If we wanted to do reward learning, if we wanted the agent to learn pWin, what the reward probabilities are, then we'd have to set little a here. You might start out by making little a equivalent to big A and multiplying each of the values in little a by a big number like 200. What that does is make the agent really, really confident about the beliefs in little a that are associated with big A, and that prevents learning for anything you don't want it to learn. If you multiply this thing by a really big number, then it won't learn anything; technically it's still learning, but it's already too confident to move. Then, for anything you do want it to learn, you can redefine those entries with really small numbers. So you can say that in the relevant slice of little a, the probability of winning if you choose the left machine is just 0.25 all around, or 0.5, or whatever. This is just saying that the agent would start with completely imprecise beliefs about whether it would win or lose depending on what it chose, depending on context. So this is what you would do: you'd turn on little a if you wanted it to learn the reward probabilities, but we're not doing that here; we're just showing this as an example. And you could do the same thing for outcome modality one, to have it learn the accuracy of the hint. Again, these are just common examples, if you wanted to do that kind of thing. So now we've defined the likelihood matrix, the A matrix, what states generate what outcomes with what probabilities. Now we're moving on to B, the transition probabilities. And remember that there's one of these per state factor per action.
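Before moving on to B, the little-a trick just described, freezing most beliefs with big counts while leaving the learnable part flat, can be sketched as follows. Python stand-in for the MATLAB cell array, with our own names:

```python
pWin, big = 0.8, 200.0

# True win/loss contingencies for the "chose left machine" state
# (rows: null, loss, win; columns: left-better, right-better context):
true_slice = [[0.0, 0.0],
              [1 - pWin, pWin],
              [pWin, 1 - pWin]]

# Frozen prior: copy the true values into concentration parameters and
# scale by a big count, so Dirichlet updates barely move them:
a_frozen = [[p * big for p in row] for row in true_slice]

# Learnable prior: flat, tiny counts, so the agent starts maximally
# uncertain about the reward contingencies and updates quickly:
a_learnable = [[0.0, 0.0],
               [0.25, 0.25],
               [0.25, 0.25]]
```

The same new-observation count (say, adding 1) shifts `a_learnable` dramatically but is a drop in the bucket against the 160s in `a_frozen`, which is exactly why the scaling works as an "off switch" for learning.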
So this is what I showed before: for B{1}, state factor one, the context, there's only one action, quote unquote, and it's just an identity matrix. This just says the left-better context never changes within a trial. And again, as I showed before for the second state factor, there are four different possible transitions: transitioning from state one to any of the other states. Just to be clear, in these transition matrices the columns are the states you're in now, and the rows are the states you would move to. So this is saying: from any state you could move to state one; from any state you could move to state two; etc. So you end up having one, two, three, four different actions for that state factor, four different possible transitions. So now the next thing we do is move to the preference distribution, C. This is what the agent wants, what the agent finds rewarding and how rewarding the agent finds it. To do that, we initially use No to say what the number of outcomes is for each outcome modality, and that just looks at the size of the row dimension in each A. If I do that, and again this is just using the size function to make it more convenient and generalizable, then No is going to look like that. That's just saying I have three different outcome modalities. One is no hint, machine-left hint, and machine-right hint; one is null, lose, and win; and one is the agent observing its own behavior, observing action one, two, three, or four. Can I ask something? Many people have asked in past weeks about dimensionality, not to go into the Markov blanket discussion, but the dimensionality of observations, and how we talk about different kinds of outcomes, different kinds of senses, for example. And it's looking like it's just about the way it's specified.
So the hint could be in your ear, and the win observation could be visual or something. You could get philosophical and ask whether there's really one or two different sensory modalities. But in the context of this inferential model that we're writing, it's almost like those things don't really matter; they're just variables that are observed. Yeah, so again, we call these outcome modalities. In this case there are three. Like you said, the hint could be the auditory modality, and there are three things it could hear. The win-loss one could be the visual modality: whether it wins, loses, or hasn't seen anything yet. And the third one could be proprioception: observing or feeling what it did. In this case it's probably also somewhat visual, right, it observes what it does, but you get the point. So yeah, there's just a factorized dimensionality for each modality. Okay, so now what I've done is I've started out by putting zeros in the C matrix, the preference distribution, for each outcome modality. This is just saying that for outcome modality one, the hint, the agent has no preferences. And I should say the rows are the observations, null, machine-left hint, machine-right hint, and the columns are time. So this is saying that at time one the agent doesn't care whether it gets the hint or not in terms of reward; same thing at time two, same thing at time three. And I've done that for all three outcome modalities to start with. The only thing we want to add preferences for is the win-lose observations. So that's C for outcome modality two, the win-loss thing here. Again for generalizability, I've set this parameter la, which is called loss aversion, and this parameter rs, which is that reward-seeking thing, which I just defined above.
And again, that's just convenient, so I don't have to go down here and reset it every time; I define these just for convenience. Then what I say here is, the observations again are null, lose, win, so at time one the agent doesn't care whether it observes a null, a loss, or a win, and that's just because it can't observe anything but a null. At time two, it disprefers losing with a value of negative la, so negative one here, for both time two and time three. But here's a little bit of a trick: I define the preference for winning as rs at time two, and remember that rs is four right now, because that's what we set it to above. Then at time three, I set the preference as rs divided by two. What that ultimately means is that this preference distribution will look like this: the agent dislikes losing equally at each time, but it gets more reward for winning at time two than at time three. And so that's our preference distribution. And that's basically it. This is where the details of the sketched-out scenario live: okay, four dollars, but then you've got to wait twenty minutes. However your scenario works, this is the matrix-form detail of how the payoffs work. So it could be minus one, or you might ask whether another number is appropriate there. That's the level of detail where the applications live, and it's where we hope to see a lot of people exploring different scenarios so that we can understand better. Yeah. And you'll see that when we actually do the parameter fitting to behavior, one thing we'll fit is this, right? For some people, rs might be eight, and for other people it might be two. You find the value for each person that best explains or reproduces their behavior. I mean, you gave that example last week, right?
In the substance abuse example, where that parameter has a really meaningful interpretation. Yeah, exactly. Because in the case of an explore-exploit task, the lower rs is, the more someone is driven to explore before they jump to conclusions about which option is best and just keep choosing that one. It makes me think that maybe in the substance example, or another one, if we're going to have a society that rewards good behavior and punishes negative behavior on some issue, do you go with, here's how much positive you get, and here's how large or small the punishment should be? These kinds of questions are really important at the individual and the group level. Not that this is even close to the answer, but it's a way for us to start talking about how much we value things, and how individuals may need different blends of these different attributes. So it's really a cool way to talk about this. Yeah, absolutely. Oh, one thing that I did want to clarify here: you'll notice that these are supposed to be probability distributions, right? In C, each column is supposed to be something that adds up to one, and technically we work with log probabilities. So negative one and four, and negative one and two, obviously aren't probabilities or log probabilities. In practice, each column here is softmaxed and then each element is logged, which is how you end up with the actual log probability distributions that define the preferences when they're actually used in the code. And we show an example of the numbers they turn into in the paper. So the last really central thing is to define the policies.
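The softmax-then-log step just clarified can be made concrete. A Python sketch of the C matrix for the win/loss modality, with `la` and `rs` as in the walkthrough (names and layout are ours):

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

la, rs = 1.0, 4.0  # loss aversion and reward seeking, per the walkthrough

# Rows: null / loss / win.  Columns: time points 1-3.
C2 = [[0.0,  0.0,  0.0   ],   # null: never preferred or dispreferred
      [0.0, -la,  -la    ],   # losing equally dispreferred at t2 and t3
      [0.0,  rs,   rs / 2]]   # winning worth more at t2 than at t3

# What actually enters the scheme: softmax each column, then log it,
# turning the raw preference values into log probability distributions.
logC = [[math.log(p) for p in softmax(col)] for col in zip(*C2)]

# Winning at time 2 carries more log preference than winning at time 3:
assert logC[1][2] > logC[2][2]
```

Fitting `rs` per participant, as described above, just means re-running this construction with a different value and seeing which one best reproduces behavior.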
So here I'm just going to use Np to define the number of policies I want to allow, and the Nf thing is just the number of state factors. That allows me, in a generalizable way, to say that V has T minus one rows: there's always one less action than there are time points in a trial, because you have to move from time one to time two, and from time two to time three. The number of policies is going to be the columns, and the state factors are that third dimension. So I can start by just defining that as ones, and for the first factor I can keep it that way; I just wrote this explicitly so you could see it. This just means the context stays the same every time; that's not something the agent knows or can control. Whereas for state factor two, I can define each of these possible policies: stay in the start state; take the hint and then choose the left machine; take the hint and choose the right machine; choose the left machine and then go back to start; choose the right machine and go back to start. And I describe what each of those means in this one, two, three, four, five thing down here. Then if we want to, and I'm not going to do this here, we could specify E, which is kind of like habits; essentially it's just a prior over policies that biases you toward choosing one thing versus another. And just one point there: in the previous active inference streams, we talked about that as the field of affordances, for those who are connecting it to ecological psychology or to enactivism. The prior on what can be done among policies is the affordances of the organism in its niche. So if you don't have the object, it's not going to be a policy under consideration.
And so E is this mathematical device that's going to be weighting policies by habit, basically. Excellence is a habit, that kind of thing. Yeah. I mean, technically, when this makes sense functionally is when you're learning it. So little e might start out flat, with counts of one for the Dirichlet distribution, but if you choose the same policy twenty times, let's say you choose policy three twenty times, then this third one here would grow to around twenty. And that just means you'd now start out with a strong bias for choosing policy three, because you've chosen it a bunch of times before. And that's an optimal thing to do under the assumption that option three was the one that continued to have the lowest expected free energy every time, right? If, in this model-based way, based on minimizing expected free energy, you succeeded over and over again when you chose option three, then it makes sense to build up this bias, so that you don't have to work that hard cognitively to keep doing the thing that's right, once you've built up a prior that it's going to keep being right. So I can see how, if you do the same sorts of things in a given niche over and over again, you develop priors that bias you toward doing the things that are successful in that niche. And if something's never been observed, then it's unlikely; it would take a different mode of operation, outside this script, for something that's never been observed to be considered. That's why you initiate with a non-zero value, because it's something that could happen. Yeah, that's true. If you set a zero anywhere in E, then it's as though the agent doesn't have that option. One other thing to say is that agents can get stuck if you have non-stationary environments.
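The policy array and habit prior just discussed can be sketched together. A Python illustration of the tutorial's MATLAB V and E (layout and names are ours; the policy labels follow the five policies listed above):

```python
# Allowable policies: one column per policy, one row per action
# (T - 1 = 2 actions in a 3-time-point trial).
# Factor 1 (context) is uncontrollable, so every entry is "action" 1:
V1 = [[1, 1, 1, 1, 1],
      [1, 1, 1, 1, 1]]
# Factor 2 (behavior): stay / hint-then-left / hint-then-right /
# left-then-start / right-then-start:
V2 = [[1, 2, 2, 3, 4],
      [1, 3, 4, 1, 1]]

# E: a Dirichlet habit prior over those five policies.  Start flat,
# then suppose policy 3 (index 2) gets chosen twenty times:
E = [1.0] * 5
for _ in range(20):
    E[2] += 1.0
habit_prior = [c / sum(E) for c in E]
# Policy 3 now dominates the prior before any free energy is computed,
# while a zero count anywhere would remove that policy entirely.
```

Note the counts start at one, matching the point above that initializing with non-zero values keeps every policy at least possible.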
So say you have a hundred trials of one thing, and then you switch contexts; the agent will probably be stuck. And so I know it's useful with all of these parameters, we can have our favorite interpretations related to ecological psychology or whatever, but I think it's always useful to consider them functionally and use them appropriately for whatever situation you're modeling. Yeah, that's the thing: the places this is most useful, say in computational psychiatry, are where, if you have the right set of experiences, these models get stuck doing what's technically Bayes optimal given your past experience, but really, really maladaptive in the new context that you're in. So you can get stuck in really maladaptive places in parameter space, and that could lead to symptoms that look like psychiatric symptoms, for example. And this is this kind of idea, like Philipp Schwartenbeck has a paper on this, optimal inference with suboptimal models, I think, but the whole point is that you're doing the optimal thing, but your model is wrong. Right, doing the optimal thing under the assumption that your model is right. Just outside of this model stream, which is really amazing and I hope will continue, we would also like to feature these kinds of direct discussions about computational psychiatry. So this walkthrough in the code is going to be helpful for a lot of people who are learning, and then let's table this and find the right time to talk about those issues, because it's really awesome what you're describing. Yeah, I mean, obviously computational psychiatry is what I do in practice, right? So yeah, I'd love to. Okay, so finally, there's a bunch of additional parameters, right? So we talked about the learning rate, which here we're just setting to 0.5, you know, arbitrarily. Beta, this is the expected free energy precision.
So it controls essentially how confident the agent is in its expected free energy estimates. If this value is high, that means the agent is really unconfident in its expected free energy estimates, and when that's the case, behavior will be a lot more random, unless the agent has strong habits. If the agent has strong habits, then a high beta value means the habits will have a much stronger influence on ultimate policy selection. And we actually show, this is the thing I was talking about, it's kind of like a random exploration sort of thing, where it's actually updated over time: with each observation, the agent essentially updates its confidence in its expected free energy estimates. Then alpha is kind of like a standard inverse temperature parameter, where it just controls randomness in action selection. So once you've chosen a policy, that policy might say this action is better than that action, but occasionally the agent might still choose a different action kind of randomly. And again, this is technically a suboptimal thing, right? You ought to just always choose the best thing. But it's actually quite important to fit in practice, because in actual human behavior there tends to be a notable amount of randomness, and so you need to fit something like alpha to get a model that fits behavior well. Now, these other two things I'll probably skip over, largely because they primarily have to do with the neural process theory. This ERP thing just controls how much beliefs reset at each time point with respect to the modeled firing rate changes, so it basically controls how much priors at one time point carry over to the next time point in the neural process theory. And then tau is basically an evidence accumulation rate.
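The role of alpha as an inverse temperature can be shown with a toy softmax over action values; the numbers here are hypothetical, just to show how a higher alpha makes selection more deterministic:

```python
import numpy as np

def softmax(x):
    x = x - x.max()                 # subtract the max for numerical stability
    return np.exp(x) / np.exp(x).sum()

action_values = np.array([2.0, 1.0, 0.5])   # hypothetical action log-values

p_low  = softmax(1.0 * action_values)    # low alpha: noticeably random choices
p_high = softmax(32.0 * action_values)   # high alpha: nearly always the best action
```

With alpha at 1, the best action only gets around 63% of the probability mass; at 32, it gets essentially all of it, which is why a fitted alpha is needed to capture the residual randomness in real human behavior.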
So it controls how quickly you update your beliefs based on new observations, as attached to the neural process theory. Yeah, if anyone digs into the code, it's just a time constant on the gradient descent. Yeah. And I go through a couple of other ones here that I explain, but I'm going to skip over those for now. So then that's basically it. If we wanted to, we could hard-code what the initial states and observations are by setting s and o. But typically we won't do that; we'll just let the generative process generate the observations itself, which is probabilistic. So it'll generate, with whatever probability is specified in D, which states are actually the true states in that context, and therefore which observations get generated. And the last thing, and again I already covered this, is you just stick all these things you defined into this mdp structure. We include little d, which means we're going to allow it to learn whether the left or the right context is more likely. We set all these things, the learning rate and the action precision or inverse temperature, so alpha, et cetera. We're going to leave out other things we could have defined, like E, or learning A, learning B, learning C, learning E, et cetera. We're not going to define the states ahead of time, and we're not going to define some of these additional parameters that we could include. And then we can just do some nice labeling here, so we label what the contexts mean, right, the semantics we're laying on it for the figures. And that's it. Once we do that, we can use this little check script just to make sure we didn't mess anything up in building the model, and it'll tell you, or give you a hint anyway, about what you might have messed up. And then the script I mentioned before that we provide is the one that actually runs the simulation.
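As a rough picture of what "stick everything into the mdp structure" means, here's a Python dict standing in for the MATLAB struct; the field names mirror the tutorial's conventions, but the values are placeholders, not the real matrices:

```python
import numpy as np

# A stand-in for the MATLAB mdp struct; placeholder values only.
mdp = {
    "T": 3,                          # time points per trial
    "D": [np.array([0.5, 0.5])],     # prior over contexts (generative process)
    "d": [np.array([0.25, 0.25])],   # Dirichlet counts: including little d is
                                     # what enables learning over D
    "eta": 0.5,                      # learning rate
    "alpha": 32,                     # action precision (inverse temperature)
    "beta": 1.0,                     # expected free energy precision prior
}
# Fields we choose not to use (E, learning a/b/c/e, hard-coded s and o)
# are simply left out of the structure, as described above.
```

The key point is that inclusion is opt-in: adding little d turns on learning over D, and omitting E, s, or o means the defaults apply.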
And then finally, once we have the simulations run, which will be stored in this big MDP, we can use these plotting scripts to actually show the results. And if I set sim equals one, then it's just going to do a single-trial simulation, which I'm going to do now. So I'll just let the thing run the rest of the way, and I'll now get rid of these stoppers. Well, while you're going through the stop signs, let's give a quick coding recap of where we are at the end of the second session. In the first session, we talked a lot about the basis of this paper and the rationale, and a little more about who might be interested or what kinds of experiments it was adjacent to. And then today, that was awesome with the code, because we went through a full definition. We moved from the analytical definitions we went through in the first session into seeing how they're realized in the code. For me, at least, it was really helpful to see how a lot of these things were put down. And then we got all the way up to around line 631, though that could differ a little between versions: we got to the definition of the full model, and then the call out from that object to another script. And then we're going to be storing all the outputs in this big container called MDP. So today we went from zero to sixty with the code, getting all these things defined and operated on. Now we're going to look at the output. And then it sounds like in future sessions we're going to be fitting parameters and maybe doing a few other things. Yeah. And for future stuff, I mean, it sounds like maybe we're going to wait to show learning next time, depending on the amount of time. But yeah, next time we'd be in a position to do actual code for learning. We're planning on covering the neural process theory and building hierarchical models.
And then the last thing is actually fitting models to data. So yeah, we'll see how much of that we can get through. But okay, just to finish here: I got rid of all my little stoppers, so now I'm just going to click continue, and it's going to run sim equals one. What I'm going to get out of this is a little plot like this, and there'll also be a neural process theory one under here, but I'm going to ignore that for now because we're not talking about the neural process theory right now. So what this top one here is, is the hidden states for contexts. Black equals a high probability, white equals a low probability, and the gray is kind of in between, right? These are all posteriors over states at the end of a trial, so that's important. What this is saying is: at the end of the trial, at time three, these are the beliefs the agent has about what state it was in across all these time steps. So at the end of the trial, the agent was completely confident that it was in the left-better context. And these little cyan dots here mean that that's what the true context was; in other words, the agent was right. The other state factor is the choice states down here, and this is just saying that it observed itself move from the start state to the hint state to the choose-left state. The actions correspond to that as well: it first chose the hint action and then the choose-left action; these are the actions. This thing in the middle on the left is not really important; it's just depicting what the different policy options are, kind of arbitrarily based on the numbers, so I just ignore that.
Down here at the bottom, we have the outcomes and the outcome preference distributions. So this is just saying it observed null, then the left hint, and then null again. The third one just says it observed itself go to the start state, then the hint state, and then choose left; so it just observed its own behavior. The second one here is the win-loss one, which is the most important. You can see this one actually has a nonzero distribution over it, which means it prefers some observations over others. In this case, what it's saying is it started in the null state, and at time two it stayed in the null state, right, because it took the hint. And then it observed a win. In other words, it got the hint, it knew what context it was in, it chose left, and so it observed a win. That's what that means. And then this distribution over policies here is just saying that at the beginning it wasn't that confident, right, so there's a lot of gray across all the policy options; but after the hint, at time two, it became really confident about what the right policy was. This bottom thing is part of the neural process theory: it's simulating dopamine. So dopamine responses, quote unquote; it's the updates in the expected precision, that beta thing. But that's how you read these plots. And again, I won't go over the neural process theory one, because we're going to do that next time, hopefully. Just the last thing to show: if you noticed, in that case the agent went for the hint first, right? I'll just do that one more time. As you can see, it's very reliable that it will choose the hint with a lot of confidence. But now look what happens if I make this rs value higher. If I make it eight, that means the thing really, really wants to win the four dollars, basically. So in this case, it should be risk taking: it should just forgo the hint and take a guess immediately.
You can see that's what it does. It has a flat distribution; it has no idea whether it should choose the left one or the right one, but it just takes a chance because it really wants to win the four dollars. The value of information in the expected free energy is low because the reward value is really high, and so it just takes a guess. And it guesses the right machine, but it turns out that it was in the left context, so it observes that it lost at time two. And because it observed that it lost, it infers: oh, I must have been in the left context as opposed to the right one. So at the end, it still knows that it was in the left-better context. That's what that means. You can see right away that changing how much the thing values reward in the preference distribution changes how information seeking it is versus how risk seeking it is. So that's the way that works. So, do we want to end here, or is there time to show learning? We can absolutely go through learning if you would like. So the topics I've written down: I have learning, which it sounds like would be awesome to run through now; then we have actual data, hierarchical models, neural process theory, and then computational psychiatry. So that's the stack of topics. Let's get to learning today, and then another time, another model stream. Okay, sounds good. So, learning. If you can see, I sort of said how to do this here, where I said how to reproduce figure seven. So for figure nine here, which is sim two, the initial learning, I'm going to first set rs to three. It's just a value that turns out to be nice for showing the dynamics of learning in a way that's clear, because the agent should start out information seeking, and what we want to show is how it learns to be more confident and therefore starts to forgo the hint as it builds up a stronger prior that the left context keeps being the case. So now what I'll do is set sim equals two.
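The rs value being changed here lives in the preference distribution. Here's a sketch of how rs might enter the win/loss preference matrix; the exact row and column layout in the tutorial may differ, so treat this as illustrative:

```python
import numpy as np

def win_loss_preferences(rs, la=1.0):
    """Sketch of a win/loss preference matrix.

    Rows are outcomes (null, loss, win); columns are time points 1..3.
    rs is the reward-seeking value discussed here, la a loss-aversion value.
    """
    C = np.zeros((3, 3))
    C[1, 1:] = -la   # losses dis-preferred at times 2 and 3
    C[2, 1:] = rs    # wins preferred; rs scales how strongly
    return C

C_cautious = win_loss_preferences(rs=4)
C_risky = win_loss_preferences(rs=8)   # stronger pull toward guessing right away
```

All the agent's "wanting the four dollars" amounts to is the size of this one number in C, which is why turning rs up to eight flips the hint-taking behavior.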
But I'll show you what this does down where I go to sim equals two. Okay, so this is actually incredibly simple. When sim equals two is set, all I do is say: okay, I want 30 trials, so N equals 30. To keep the mdp thing in a single structure, I redefine it as big MDP, and again, this is just for convenience. Then I can use this deal function, which is a nice convenient function in MATLAB that more or less says: repeat the mdp structure 30 times. So now there's MDP one, MDP two, MDP three, and all of those are just identical to the initial mdp that we built. So if I do that, okay, I'll do two things. One is, before the deal, big MDP is just the single mdp, so it looks like this, right? There's just a single entry with T, V, A, B, and everything that we set. Now, once I run this deal thing and step through, all of a sudden MDP will look like this, where you can see that same structure repeated in 30 rows, all starting out identical. So what will happen now is, if I run it through the tutorial simulation script, and this will take a second, it will run through the whole thing 30 times. So wait, let me ask something, and apologies if there's a little noise in someone's background. It seems like this little mdp conveys all the details for a single trial, and then we're making a meta-matrix: we're stacking the total model setup, in this case into 30 trials. But you could do 100 trials in a row, or you could have 20 sets of 10 trials. So it's the idea of concatenating total model setups, so that you can do operations on big, potentially hierarchical sets of models.
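The MATLAB deal trick has a simple equivalent in Python: replicate the single-trial structure with independent deep copies. The dict below is just a stand-in for the full mdp structure:

```python
import copy

mdp = {"T": 3, "eta": 0.5}   # stand-in for the single-trial structure

# MATLAB: N = 30; [MDP(1:N)] = deal(mdp);  -- repeat the structure N times
N = 30
MDP = [copy.deepcopy(mdp) for _ in range(N)]

MDP[0]["eta"] = 0.9   # each trial's copy is independent of the others
```

Deep copies matter here: each trial needs its own structure, so that the simulation can later write trial-specific outputs (and, in the reversal-learning case, trial-specific true contexts) without touching the other trials.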
In this case, each mdp is a row, really. It's just a giant model. Yeah, it's just a giant structure with mdp over and over again. Yeah, I meant rows, sorry, good call; my R-centrism, I'm always thinking in columns. All right, but okay, so now I ran the MDP, with all those repeated rows, repeated models, through this simulation script here. And now what you'll see is that the MDP structure looks a little different. In addition to all the stuff it had before, it's now going to have all this other stuff: the free energy F, the expected free energy G, the total free energy H, the actions u, the free energies for the learned parameters like d, and a bunch of other stuff. We have a whole table in the tutorial that says what each of these fields means, so you can interpret them; they don't all matter enough for me to go through right now. So then, once I have that, I can generate a plot with this little script for plotting multiple trials that we put together, and it will generate the multi-trial behavior. What this is showing is the first action on each trial. Here you can see the colors are the probabilities, so dark equals higher probability, and the blue circles are the actual chosen actions. You can see that for the first several trials, the agent kept taking the hint. But after a while, because the left context kept being the one it thought it was in at the end, it starts to be confident, around trial nine or ten here, that it's always just going to be the left one, so it just immediately chooses the left machine. But now what you can see is that it gets one wrong, right? Because it's only 80% likely that the left one is going to be the winning one.
And once... wait, is that a dog I hear? Or is that yours? Yes, it's the dog wanting to be in my lap. If the dog's a contributor, then it's absolutely welcome to speak. Yes. But anyway, so it starts to get one wrong here around trial 12 or 13, because again, even when it's in the left-better context, it will still lose 20% of the time, right? And so once it loses, it says: okay, actually, I'm not confident anymore. So it starts taking the hint again, and that happens a couple of times, so it starts bouncing around. Also note that this is under an alpha value, the action precision, that's not that high, so its behavior is a little random, right? There's a little bit of randomness in it, like what a real human would do. And this next plot here below is just green as a win, black as a loss, and you can see that when it loses, the negative free energies are larger. Again, I won't go through these last two here; they're just part of the neural process theory, the expected ERPs and expected dopamine responses that that theory would predict. And then this bottom one is just the evolution of the agent's priors about whether it's going to be the left or the right context. So that's how to read this. It learns, but look what happens if I instead make the thing a little more risk seeking. I make the subjective value of winning $4 greater, and I do that just by setting rs to four instead of three in this case. So now, by the way, it's awesome that basically by hitting play on this script, it seems like even with minimal tweaking, people can reproduce this in their own setup and start to play with some of the variables. It's just really an awesome toolkit. This is pretty cool. Yeah, no, thank you.
Yeah, we definitely tried to make it as easy and user-friendly as possible for people who have minimal background, so I'm glad you guys think so. But okay. So in this case, right, I set the risk-seeking parameter in the preference distribution to a higher number. So now, if you look, the agent takes the hint twice and then just sticks with the left every single time. Even if it loses a couple of times, it just continues to be risk seeking. And again, everything else is basically the same, except the predicted dopamine responses and ERPs and things like that are different. So that is what it looks like when you do normal learning, where the context stays the same across trials. The last thing I'll show you guys is, if I set it to three, we engineered this reversal learning version, right? In reversal learning, for the first several trials the left context is better, so the agent builds up a prior that the left context is better; but then it switches to the right context being better, and we see how the agent does. Now I'll show you how we do that, which is also very simple. Here we're going to set N to be 32 trials and do the same kind of deal thing. But now, for the first block of trials, I make big D equal to one-zero, so this is saying the true generative process is the left context. Then I say, in the MDP from the trial after that block to N, so basically one after that number to the end, the generative process is now switched to the right context. So in this case, it just means the first eight trials are left-better and the rest of the trials, up to 32, are the right-better context. So that's literally it.
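The reversal-learning setup just described can be sketched like this, again with a dict standing in for the full mdp structure; the block length of eight follows the example above:

```python
import copy
import numpy as np

N = 32
mdp = {"D": [np.array([1.0, 0.0])]}   # true generative process: left context
MDP = [copy.deepcopy(mdp) for _ in range(N)]

# After the first block, flip the true context to right-better
switch = 8   # first eight trials are left-better, as in the example above
for i in range(switch, N):
    MDP[i]["D"][0] = np.array([0.0, 1.0])
```

Only the generative process's D is changed here; the agent's own learned little d is untouched, which is exactly why it can get caught out when the context reverses.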
Then you just run the same thing through the same simulation function and plot it with the same plotting script. So it's really very, very simple. If I do that with risk seeking of three, then what we'll see happen is, you know, give it a second to cook here. Just to give one note on that software design: instead of having five scripts called simulation one, simulation two, simulation three, there's a common core, and the simulation scenarios are defined basically as variables down here in the 600s lines. And then, by changing which simulation is in play at the very top, Ryan is able to change which simulation we're in very fluidly: we can change the simulation, look at a variable, change a variable, go back up, change the simulation, go back in. Yeah, I mean, it's basically just an if-then statement. It just says if sim equals three, do this, or else if sim equals four, do this. It's a very simple if-then statement. But okay, so in this case, I set rs equals three, and what you can see is that the agent just continued to take the hint the entire time, because it was never confident enough. And it happened to get the first one wrong in this case, which is part of what drove that. So if the actual observations it gets are a little different, let's actually try that again and see what I would get here. But this characterizes all these interesting things, like a perhaps over- or under-eagerness to ask for hints or acquire more information. It's showing how the model, and this one is not trying to fit human behavior, we're really playing with the bare bones, but already we're seeing what's kind of pathological hint-taking behavior: it never just goes for the machine, even when it knows. And with something else, well, it's just parameter play, but it's going to play out differently. Yeah.
So in this case, where it does take the hint every time, you can see on trial five here it starts to have a little bit of: maybe I should just choose the left one immediately. But then it gets a couple wrong. And then the context changes. You can see in the context learning here, it starts out thinking the left one's better, and then it starts switching, and around trials 11 and 12 it becomes more and more confident that it's in the right-win context. So now, the last thing I'll show you is what happens under that same thing, but when I set rs equal to four, so it's a little more risk seeking, because this actually gets a little more fun. So is the risk seeking like a multiplication on the money? Like, $1 to $2 versus $1 million to $2 million, how high the stakes are? Because it sounds like you're tuning up whether the agent wants the big wins; what is it really conveying in this model? It's just essentially the precision of the preference distribution. So it's just how big the number is over win at time step two. That's really all it is: how high the probability is for a win in the agent's model, which just encodes how strongly it prefers winning. You could think about it as the $4 for them subjectively being like the value of $10, or something like that. But all it ends up doing in practice is saying that the reward value component of expected free energy has a higher weight than the information value component. So here you can see that the behavior is actually, again, more interesting. It takes the hint twice and all of a sudden it's like: all right, I'm confident enough, I'm going to go for choosing left immediately. It does that for a few trials, and then the context switches; at trial eight it switches to the right context. And then it's like: oh, nope, I'm going to start taking the hint again.
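The "reward weight versus information weight" point can be made with a toy decomposition of expected free energy; the specific numbers below are made up purely to illustrate how increasing rs flips which option wins:

```python
import numpy as np

# Toy expected free energy for three options:
# index 0 = take the hint, 1 = guess left, 2 = guess right.
# G = -(information value + rs * reward value); lower G wins.
info_value   = np.array([1.0, 0.2, 0.2])   # the hint is highly informative
reward_value = np.array([0.0, 0.5, 0.5])   # guessing can pay off immediately

def expected_free_energy(rs):
    return -(info_value + rs * reward_value)

best_cautious = int(np.argmin(expected_free_energy(0.5)))  # small rs: take the hint
best_risky    = int(np.argmin(expected_free_energy(4.0)))  # large rs: guess now
```

With rs at 0.5 the epistemic term dominates and the hint has the lowest G; with rs at 4 the reward term dominates and guessing does, which is the whole effect of turning rs up in the preference distribution.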
And then it observes the right one a bunch of times, and then it's like: okay, now I'm confident it's the right one. And then it starts choosing the right one without taking a hint a bunch of times. The behavior is quite a bit more interesting, because it becomes confident, loses confidence, and then switches to being confident in the other thing. It gets one wrong around trial 23, does one hint, and then it's like: nope, I'm on the right track, I just lost one time, it's all good. Yep, exactly. So that's the way the learning works. And if I show you, I'm plotting this here, but literally in the MDP it will just be, if I go to the thing that's learning D, right? So if I go to little d at trial one, it's 0.75, 0.25, because it observed left the first time. And then at, say, trial eight, it's 2.2, 2.2. It's flat still; that's because it got some wrong, that's when it was most confused. Yeah. And then if I go down to, say, trial 31 or something near the end, then all of a sudden it's two versus 13, right? So it's way more confident: it's added a lot more counts to the right context being the better one. And that's really it; that's literally all the learning is. And Chris showed you the equation for learning A, but in the tutorial paper we also show it for learning D, which is even simpler. Let me see if I can get to it real quick here, in the neural process theory and learning sections. Yeah, it's literally just this: d for the next trial is d for the previous trial plus eta times whatever the posterior over states was for the previous trial. That's literally it. Although I'm realizing that should be a minus, not an equals, I think; little typos here and there. But anyway, that's really it. It's not complicated at all.
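That one-line learning rule for little d can be written out directly; the starting counts and posterior below are chosen to reproduce the 0.75, 0.25 values just mentioned:

```python
import numpy as np

eta = 0.5                          # learning rate
d = np.array([0.25, 0.25])         # Dirichlet counts over the two contexts

# d_{next trial} = d_{previous trial} + eta * posterior over initial states
posterior = np.array([1.0, 0.0])   # fully confident it was the left context
d = d + eta * posterior            # -> [0.75, 0.25], matching the trial-one values
```

Each trial just stacks another eta-weighted count onto whichever context the posterior favored, which is why confident, consistent trials drive the counts up so quickly.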
It just adds a count based on what your posterior over states was at the end of the last trial. So, for stopping at the end of basic learning, that's really it. Like I said, sims four and five have to do with estimating parameters based on data, recovering parameter estimates, fitting behavior to data, things like that. So we'll cover that another time. Awesome. Well, just to catch our breath at the end of this really fascinating and very appreciated stream, and if anyone wants to put any last comments in the live chat, they'll have a couple of minutes to do so, let's each take a final recap or a final thought. What do we want to remember from this time? What are we carrying forward into next time? What are we looking forward to learning more about? So, whoever wants to go first; maybe Christopher, any remarks on Ryan's presentation and on the code, and also on your role, which parts had you done? I was just curious about that while listening. So, no, I think I agreed with everything that was said there. One thing to say is that once you really get a handle on building these models, the most time I've ever spent building one of these models, once we'd actually figured out the model structure, was a couple of days. You might spend a very, very long time figuring out the model structure, but once you figure it out, everything falls into place and it's super easy. So it's kind of interesting, in the sense that scientist friends who do neural network modeling, with spiking neural nets for example, can spend weeks building those models. The hard work here is really conceptual. And then, did you ask what part of the tutorial I wrote versus Ryan? Was that the question? Not a partitioning, just which parts your background was drawing from, or just how you see it similarly or differently.
Generally speaking, I wrote the mathematical appendices and did first drafts of a lot of the more technical sections, but I actually asked Ryan for that, just because I wanted to get a better understanding of them for myself. And then Ryan wrote a lot of the broader sections. That was how that stuff worked out. Yeah, I mean, I took the first pass at the code we're going through now. Chris wrote the hierarchical modeling code, which you'll see another day, and a lot of the neural process simulation code, the custom stuff; the ones I was showing you now are just Karl's standard ones, the ones that are in SPM. But Chris made some way cooler ones for the hierarchical neural process simulation stuff. But yeah, a lot of this was fully collaborative; I wouldn't necessarily say one person did anything more than the other. Yeah, everyone who's written a scientific paper knows that after the first draft has been written, it's kind of hard to tell who produced what, because you have heavy back-and-forth editing. Yeah, I totally agree, it's a great perspective, so thanks for sharing that. So maybe now let's have our last little round; maybe Max, we'll each take a last word. Yeah, just as a lead-in to next week, I'm excited to hear about the neural process theory, and really about looking at those dopamine traces. It's cool to see the spiking that you're able to reproduce with this kind of model structure, which does reflect how these actual neurobiological processes might look.
One thing I'm really keen to hear about is the link function between your gradient on free energy and your simulated dopamine expression levels, and the postulated link between dopamine and precision. I mean, these things are a little out of my realm, but I think it's really interesting, because fundamentally that's such an elegant mechanism if it really is just that gradient; from a mathematical standpoint, it's really cool to tie that from the math and theory into empirical observation, and to use that to validate the model. Cool. Chris? Yeah, definitely. Well, we're excited to show you. Nice. Do either of you want to make any last comments? No, this has been really fun. Thanks, Daniel. Yeah, thanks everyone. I've been talking this whole time, so people probably don't want to hear me rambling anymore; not till next time, at least. But yeah, thanks everyone for watching live and replaying everything, and just for participating, because of the conversations we're having, the real-time errors we're finding, and the learning we're all contributing to. So thanks everyone for participating, and we'll see you later for now.