Okay, I think we're live. Yes, the screen was blocked out for a second, but hello everyone, and welcome to the Active Inference Lab. This is February 5th, 2021, and we are here in Model Stream 4.0 with Ryan Smith and Christopher Whyte. I'm Daniel Friedman, a postdoctoral researcher in California and a participant in the Active Inference Lab. Maybe the two of you can briefly reintroduce yourselves for anybody who hasn't been watching the previous sessions.

Yes. I'm Ryan Smith, an investigator at the Laureate Institute for Brain Research in Tulsa, Oklahoma.

Hi, I'm Christopher Whyte. I'm a PhD student at the MRC Cognition and Brain Sciences Unit in Cambridge, England.

Awesome. Well, we are here today in part four, which will be the last session on this paper, though hopefully not the last time we get to speak together on a stream. It's part four of four on "A step-by-step tutorial on active inference and its application to empirical data." You can find the link to the most updated version of that resource in the video description. As far as points of process: if you have any questions, comments, or thoughts during the livestream, type them into the live chat and we'll address them when the time is right. If you have questions arising after the livestream, or you're watching it after it's occurred, feel free to leave a comment and we'll try to address it in future sessions, because the model stream never ends. To learn about how to participate, go to activeinference.org; that's where you'll find more information on the Active Inference Lab. So today, in our fourth session, it's going to be pretty exciting, because we've covered a lot of the groundwork and...

Daniel, we're not hearing anything.

Oh. Give me one second. Okay, do you hear me now?

No, you froze.

All right, give me a second. It's all good.
This is just how it happens sometimes when livestreaming. Let me go back to the event and rejoin. Welcome to livestreaming, everyone.

Look at that, a ghost Daniel Friedman. He can be kicked out. You're really quiet again. Okay, device settings... okay, yes.

Thanks, Microsoft Teams, for facilitating these conversations. It's still live and happening; it was just a little hiccup with Teams, but now we've gotten that out of the way, so it's fixed. And I don't know if this other person can be kicked out, but take it away. We're all good, though. Thank you.

Which one are you, the ghost of Daniel's past? It's okay. It will auto-drop them after you share and unshare, but it's all chill. Okay, perfect. I think that worked. So we are still live and going.

All right. So should I just start, or is there anything else? You were kind of in the middle of saying something. If we could start with some recap for those who have seen the first three sessions: where did we get to as of today, and where does that take us into the future?
Okay, so, just as a quick recap: as Daniel said, the whole purpose of this tutorial we put out was to make the methods for actually building active inference models, and applying them to experimental data, more accessible to a broader audience, because up until now it's been fairly difficult to find clear resources for beginners to start learning these sorts of methods. The whole point of what we've covered so far has been building to this section, since the focus, again, is on being able to actually use these models for experiments. Obviously the earlier sections can be useful on their own if you want to do simulation work, and there are lots of papers reporting simulation work. But in terms of applying the models, and the simulation work you do, to actual experiments, fitting them to task behavior from participants, predicting things about neural responses in fMRI studies, and all that kind of stuff, it all boils down to what we'll talk about today.

In earlier sections we covered the basics of active inference: what variational free energy is, what expected free energy is, what the motivation is for doing active inference, and what the potential benefits are over traditional reinforcement learning models; the nuts and bolts, basically. Then we covered how to build active inference partially observable Markov decision process (POMDP) models, and as a concrete example we built a simple explore/exploit task model and showed some simulations for both perception and decision-making. And then we also took a break from the task model, which we'll return to now,
to cover hierarchical models and the neural process theory. But now we're going to come back to using the explore/exploit task model, because it's simpler and more straightforward in terms of what you do with participant behavior and how you fit it. So all of that, like I said, has been building up to using the task model we built as an example for fitting to data. So, Christopher...

Yeah, great summary. Christopher, what would be your perspective, or what is something that stood out to you from the previous sessions? Because you're both so deep in the game, actually producing this artifact, and it's been fun for us and for the audience to be talking through it. What does it leave upon somebody who's been involved in the generation of this artifact, now to be talking about it?

Interesting question. And this is kind of the context I have: I started my PhD four months ago.
So this is all in the context of figuring out not just how to teach other people to build these models, but actually how to build these models for myself and apply them to my own research questions. I've actually never fitted them to data; all of the stuff I've done so far has been theory, essentially. I've done a lot of simulation work, but no data fitting. I think that's actually the frontier in active inference at the moment: figuring out how to get these models to hit the road. And one really tricky thing is that most of the people who work on active inference in a serious way are neuroimagers. I, for example, am in a neuroimaging department, and Ryan, you work in neuroimaging as well. But it's actually challenging to test these models in the domain of neuroimaging; neuroimaging is really hard. Actually, I think many of the clearest predictions, in terms of the neural process theory, are things you could test quite easily with optogenetics. Not that optogenetics or anything like that is easy, but predictions about microcircuitry in PFC, which active inference certainly makes, can be tested much more directly in, say, a rodent model, whereas it's really difficult to test them in a human. So the point I'm trying, in a rather rambling way, to get to is that when we build models we need to think very deeply about what it actually is that we're trying to solve. Do we just want a directional hypothesis? In that case you don't necessarily need to fit these things to data; you're just saying A is predicted to be bigger than B, which is still better than most psychology.
So let's be honest.

Yeah, it's really interesting. The neuroimaging case has a few specific difficulties: massive amounts of data from a system with massively complex internal and external dynamics, like the brain, and you're making a ton of data with a very complex error profile. In the SPM textbook there are chapters and chapters on normalization, warping, and all the different ways of dealing with these very complex data sets. And now we've almost built out that framework, and, just like you said, where it's hitting the road is in being applied to empirical data. It's a little bit of a surprise, at least to me and probably others, that although neuroimaging is the home of SPM, we're seeing this applied to systems beyond neuroimaging, and there are other systems that help inform our neuroimaging quest. So, pretty interesting stuff.

Okay, so should we just get started then?

Sounds good. And anyone in the live chat, just drop a question and we'll get to it. Thanks a lot.

Okay. So one brief thing that I did want to mention: a couple of sessions ago
we covered learning, and since then we actually caught a small error in the learning section of the previous version, which we've now corrected. It has to do with the way you calculate the novelty term that gets added to expected free energy when you're doing learning. It's what drives the parameter exploration term in expected free energy: it basically drives an agent to seek out policies that will tell it more about, in this case, what the A matrix looks like, that is, what the relationship is between states and observations. And there are equivalent terms that can be added if you're trying to learn transition beliefs or other matrices and vectors in the model. So, just to make that clear, we've actually added two additional worked examples of how to calculate the novelty term. If anybody doesn't happen to have the most updated version and you're interested in the rigorous details of how learning works in the formalism, it might be helpful to download the updated version so you have those additional worked examples and can get a better sense of how it works. Like I said, it's not a big error, but it's a small error that could be confusing if you were to read it, so I just wanted to highlight that for people.

But today, like I mentioned, the focus is going to be on actually building task models and applying them to data. This isn't necessarily the only way you could break it up, but I've broken it up into six steps that are probably a good heuristic roadmap for doing this kind of work. So the first step is
that you have to have a task, and you have to have participants perform that task. The second step is that you have to build one or more models of that task, in this case one or more partially observable Markov decision process models in the active inference formulation, that can reproduce, or generate, simulated behavior for that task. Once you've done that, the next thing you need to do is somehow find parameter values, in each of the models you construct, that can best reproduce each participant's behavior. For instance, under one model you might find that it can reproduce the first participant's behavior best if it has a particular value for how much they want the reward; think of it as the precision of the preference distribution. For another participant you might find that you need a lower value for that to reproduce their behavior well, but something like a higher learning rate, or a lower action precision value.
There are all these different parameters in the models we've been covering, and different parameter-value combinations in these models are going to be better at reproducing one person's behavior versus another's on the task. So what that requires is having some kind of estimation algorithm: something that searches through different combinations of parameter values and checks how well each reproduces a given participant's behavior. Those are called parameter estimation algorithms, and there are several different types you can use. I'll briefly mention a few, and then we'll go into detail on the one that is probably most consistent with the general theme of active inference models, a scheme called variational Bayes.

So, once you've found, for each model, the parameter values that best describe each participant, then you want to know which of those models is the best model. In that case you need some way of comparing how well each model, on average, can reproduce the behavior of your whole group of participants. Each model is going to have its best-fit parameters for each participant, but the best-fit parameters for one model might still do a better job than the best-fit parameters for another. And again, we'll go into this.

Then, once you've identified the best model, there's something that might not be obvious but is really important: you have to somehow confirm that the parameters in the best model are recoverable. You can talk about this either as whether the model is identifiable or as whether the parameters in the model are recoverable. One simple way to think about the basic idea: what if two different combinations of parameter values are equally good at reproducing, or
explaining, a given participant's behavior? If that were the case, then there isn't any unique combination of parameter values that is the best set. And what that means is that you could take one set of parameter values, generate behavior with it, and, when you ran estimation on that behavior, get a very different set of parameter estimates. If that's the case, then the parameter estimates you're actually getting for that model, for a participant, aren't a reliable or uniquely best description. That means the parameters aren't recoverable: there's no uniquely best set of parameters under that model. So you need to check that this holds for the winning model, and if it doesn't, you might move down to, say, the second-best model and check whether it holds there. You want to find the best-fitting model; ideally, you want to find the best-fitting model that also has clearly recoverable parameters.

Finally, once you have identified the winning, recoverable model, you have this set of parameter estimates that describe each person, so these are individual difference variables. One person has higher action precision than another; one person has a more precise preference distribution than another. At this point you can take these up to the group level, and you can ask, for instance: does a healthy group have different parameter values, on average, than a group with depression or a group with anxiety? Or: do parameter values predict something in a continuous way?
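As an aside for readers: the recoverability check described here is usually done by simulating behavior from known parameter values and then re-estimating them. Below is a minimal Python sketch using a toy one-parameter softmax choice model; the toy model, its option values, and the grid are illustrative assumptions, not the tutorial's POMDP or its MATLAB routines.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(beta, values, n_trials, rng):
    """Simulate softmax choices between options with fixed values."""
    p = np.exp(beta * values) / np.exp(beta * values).sum()
    return rng.choice(len(values), size=n_trials, p=p)

def neg_log_lik(beta, values, choices):
    """Negative log-likelihood of the observed choices under beta."""
    p = np.exp(beta * values) / np.exp(beta * values).sum()
    return -np.log(p[choices]).sum()

def fit(values, choices, grid):
    """Grid-search maximum-likelihood estimate of beta."""
    nll = [neg_log_lik(b, values, choices) for b in grid]
    return grid[int(np.argmin(nll))]

values = np.array([1.0, 0.5])          # hypothetical option values
grid = np.linspace(0.1, 5.0, 50)

# Recovery loop: generate behavior at known parameter values, then re-estimate
true_betas = [0.5, 1.5, 3.0]
recovered = [fit(values, simulate(b, values, 1000, rng), grid) for b in true_betas]
print(list(zip(true_betas, recovered)))
```

If the recovered estimates did not track the generating values (for example, because two parameters traded off against each other), the model would not be identifiable in this sense.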
Maybe higher anxiety levels are associated with lower action precision, or something like that. And I'll show you guys examples of this kind of thing as well.

So first, and we've kind of already covered examples of this in previous sessions: there are lots of different kinds of tasks that you can model, which is a nice perk of active inference models. They're not just about decision-making as a main component; you can also use them to model simple perceptual tasks. Christopher, in a previous session, showed how you could build a model of a simple perceptual oddball, or local-global, task, which is almost entirely about perception. You won't necessarily get interesting individual differences in behavior in these tasks, but you can show differences in, say, simulated event-related potentials in EEG studies, and in principle you could fit the ERPs to find the best parameters for a person. But you can also do inferential, or prospective, decision-making tasks, where you plan for the future and make a decision based on what you expect future outcomes to be, which is the kind of thing
I'll be showing you with the explore/exploit task. The explore/exploit task we'll be showing is very similar to a standard reinforcement learning task; it just requires that you seek out information before you can really know, or even learn, what the right reward values are. And then, lastly, you can combine this with neuroimaging in the context of predictions from the neural process theory, which is what we talked about and covered in relation to the hierarchical model that Christopher presented.

I should say that this is a fairly new, emerging field, so there are not that many empirical papers that do this kind of thing. It's something my lab has been trying to make more common practice, which is again part of the motivation for this tutorial: so that a larger number of researchers can do this kind of work. I mainly work in computational psychiatry, and this has been used to show, for instance, less precise, lower action precision in substance use disorder, and also slower learning rates for negative outcomes in substance use disorder. We've shown things like greater decision uncertainty being associated with depression, anxiety, and substance use in the context of approach-avoidance conflict tasks. Most recently, we also used a simpler Bayesian-perception version of an active inference POMDP to show differences in sensory precision in an interoceptive context.
So, for instance, how precisely or accurately people perceive their own heartbeats, with interesting differences there between multiple clinical groups and healthy participants. And then, lastly, one of the first papers using this kind of approach was actually Philipp Schwartenbeck's paper, which looked at predictions about midbrain dopaminergic responses and changes in the expected-free-energy precision term that we talked about in previous sessions, and was able to show nice relationships between these expected precision updates and neural responses, using fMRI, in a region of the midbrain that is known to be rich in dopamine neurons. So these are just recent examples that we're hoping other researchers can build on.

On that neuro slide we're seeing a couple of examples of different technologies, like fMRI, but we also heard that the scope of these methods goes beyond neuroimaging, and we're seeing a couple of biological questions, or conditions, related to your clinical experience. Let me complement those two axes of variation, the methods and the biological question, with the algorithmic dimension. There's a question in the chat that asks: what kinds of tasks reflect the relationship between dynamic programming and active inference, for example Bellman-optimal state-action policies? Thanks for the elaboration. So how do we connect some of these biological and methodological uses and axes of variation to some of the algorithmic questions about optimization and dynamic programming?

Yes, I mean, it's a great question.
We actually have a recent preprint, with Lance Da Costa as first author, where he was able to come up with some proofs showing when active inference is and is not Bellman optimal. In terms of the conclusion: for one-step policies, active inference is Bellman optimal in the context where the precision of the preference distribution is maximal, that is, maximally precise over whatever the reward outcome is. Active inference models in the simple POMDP scheme we've been talking about here are not Bellman optimal, or are not guaranteed to be, in the context of multi-step, deep temporal policies; whereas the more recent sophisticated active inference algorithms are Bellman optimal for deep policies, because they more or less correspond to backward induction. So that's probably the shortest answer I could give to that. Christopher, did you have something to add there?

No, I was just going to point to the Lance Da Costa paper.

Great. I posted the Da Costa papers that I think are relevant in the chat. So thanks for that question, and continue.

Okay, so just to be clear, this is a very specific paper that Lance, Noor, I, and a few other people put out as a preprint just a couple of months ago. So I'm talking about one particular paper of Lance's, and I can certainly point that specific one out so people know which one I mean when I'm talking about the relationship between dynamic programming and active inference: the discrete finite-horizon case.

Yep, that's the one: Da Costa, Sajid, Parr, Friston, and Smith.

Okay, cool. All right, so hopefully that helps. So now I'll kind of be going through each of the steps that I was talking about, one by one, with our explore/exploit task example. So first, step one is that you have to have participants perform a task.
In this case, and I should say, for some of this it's really going to be assumed that people have been following along with previous sessions. The time constraints just don't allow us to cover all the background, so that somebody could watch this one without having seen the other ones and fully follow everything I'll be talking about. So, just to make that clear: if anything is confusing, the previous sessions are kind of necessary.

We walk through the matrices and some of the degrees of freedom in this setup, so I think it should be good. Check out the other videos if you haven't already.

But as a brief refresher about the task we're talking about: on each trial the agent starts in a start state, and then it can either directly choose one of two slot machines. If it's right, it'll win four dollars, but if it's wrong, it'll win zero. And it starts out not knowing, with just a flat prior over, whether it's the context in which the left or the right machine is more likely to win. So it's just 50/50.
So it's pretty risky to just pick one right away, at least before you've learned whether one of them is actually more likely to win, that is, whether the context in which one machine is more likely to win is the more common one. Instead, the other thing the agent can do is choose to get a hint, and the hint will tell it whether it's a context in which the left bandit is the one more likely to win or one in which the right bandit is. In the context where the left one is better, the left machine will pay out 80% of the time, whereas in the context where the right one is better, the right machine will pay out 80% of the time. But the catch is: if you choose to take the hint, then you'll only win two dollars if you pick the right machine afterwards. So this trades off reward seeking and information seeking. If you seek information first, you're more likely to win; but if you choose right away and you're right, you'll get a higher reward. So it specifically trades off reward and uncertainty.

And this is where a lot of what follows is not going to make sense unless you've seen the previous sessions: the second step is that you have to build one or more models of the task.
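For readers who want the hint-versus-guess trade-off in numbers, here is a small arithmetic sketch in Python, assuming the hint is perfectly reliable and using the dollar amounts described above:

```python
# Expected dollar value of the two single-trial strategies described above:
# guess a machine immediately ($4 if it pays out), or take the hint first
# ($2 if the indicated machine pays out). The favoured machine pays out 80%
# of the time; the prior over which context is active is flat (50/50).
p_context = 0.5
p_win_favoured = 0.8

# Immediate guess: half the time you picked the favoured machine (80% payout),
# half the time the unfavoured one (20% payout).
p_win_guess = p_context * p_win_favoured + (1 - p_context) * (1 - p_win_favoured)
ev_guess = p_win_guess * 4.0    # 0.5 * $4 = $2.00

# Hint first, then choose the indicated machine: 80% chance of the $2 payout.
ev_hint = p_win_favoured * 2.0  # 0.8 * $2 = $1.60

print(ev_guess, ev_hint)
```

Purely in expected dollars, the immediate guess does slightly better under these assumptions; what makes the hint attractive is that it resolves uncertainty, which is exactly the information-seeking drive that expected free energy scores in addition to reward.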
We just talked about the task. In our example, we'll use two different variants of the model: one where we're just going to fit the risk-seeking parameter and an action precision parameter, and a second model where we also assume that the agent has a learning rate, or that there are differences in learning rate.

For those who don't remember, the risk-seeking parameter just corresponds to the magnitude of this RS value. What this says is: at time 2, if the agent observes a win (this row is a win), then the value of that win will be RS. If the agent instead waits and chooses the hint, it will continue to observe the start observation at time 2, but at time 3, if it wins, the value it wins will be RS divided by two. So whatever RS ends up being, the win at the third time point, if it takes the hint, is half of that. We're going to fit, for each person, what that RS value is. It stands for risk seeking because, as would be clear to anyone who's watched the previous sessions, the higher this value is, the less exploratory drive the agent will have, and the more likely it is to take the risk and just choose one of the slot machines right away, as opposed to taking the hint first to be more confident about which one is right.

The second parameter is this action precision parameter, alpha. All this says is that the probability of selecting some action, given an alpha value, corresponds to the probability of that action given the chosen policy, scaled by alpha and then softmaxed to become a proper probability distribution again. So it controls how precise action selection is given the choice of policy.

And then, finally, in the second model, where we also assume the agent has a learning rate other than just the optimal learning rate of one, we're also going to be fitting this eta parameter, which is the
learning rate. This just says that your beliefs about the probability of the context, that the left slot machine is better versus that the right slot machine is better, update based on your beliefs about whatever the winning state was on the previous trial, again for time step 1. So basically, each trial, whatever the posterior beliefs over states were, the count for that gets added to whatever the prior over initial states is at the next trial, but scaled by this eta parameter. So if eta is 1, it'll tack on a count of 1, assuming the agent knows with certainty that it was in, say, state one. But if eta is 0.7, it'll only add a count of 0.7 on each trial, given that the posterior distribution was something like [1, 0]. So that's the basic setup: we're going to fit either just RS and alpha, or we're going to fit all three of RS, alpha, and eta.

One note on that, Ryan, on the step "build one or more models." We're thinking about active inference as this process theory, but here it's instrumental, the whole instrumentalism-versus-realism question. We're using it as an instrument, and that means we're iterating over models, relating our models to each other in specific ways, like: make the simpler one, and then make one that just adds one more variable. So you're not asking, "What is the active inference model here?" It's, "Here's the active inference approach," and we're going to combine our perspectives, make multiple models, and compare models. So the only limiting factors are how many models we can think of at the beginning and how wise our model selection process can be.
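Before moving on: the two fitted mechanisms Ryan described, precision-scaled action selection and the eta-scaled count update, can be sketched in a few lines of Python. This is an illustrative reading of the verbal description, not the tutorial's MATLAB implementation; the epsilon and the example numbers are assumptions.

```python
import numpy as np

def action_probabilities(p_action, alpha):
    """Precision-weighted action selection: scale the log-probabilities of
    actions (from the policy posterior) by alpha, then softmax. High alpha
    makes selection nearly deterministic; low alpha makes it more random."""
    logits = alpha * np.log(p_action + np.exp(-16))  # small epsilon for log(0)
    return np.exp(logits) / np.exp(logits).sum()

def update_context_counts(d, posterior, eta):
    """Learning-rate-scaled count update for beliefs about the initial state
    (context): add the posterior over states at time step 1, scaled by eta."""
    return d + eta * posterior

p = np.array([0.7, 0.2, 0.1])        # action probabilities under the policy posterior
print(action_probabilities(p, 16.0))  # sharply peaked on the first action
print(action_probabilities(p, 1.0))   # approximately p itself

d = np.array([1.0, 1.0])              # flat counts over the two contexts
print(update_context_counts(d, np.array([1.0, 0.0]), 1.0))   # adds a count of 1
print(update_context_counts(d, np.array([1.0, 0.0]), 0.7))   # adds a count of 0.7
```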
So it's just a really pluralistic but pragmatic way to go about it. By saying "construct one or more models," you're never going to fall into the trap of thinking you're making the active inference model, because there isn't the model of anything; statistically, they're just instruments. So it's a really helpful perspective. This isn't just a little convenience, to make one or more models; it's something that reminds the scientist, or the modeler, that they're actually making one of many, or infinitely many, possible ways to think about a situation.

Yeah, I mean, hopefully you've really thought deeply about what the most plausible models are given current theory and previous empirical research, right? They should be informed. But yes, you're trying to find the model that has the highest evidence, the model for which the observations, the behavior, provide the highest evidence. And yes, it is always possible that there's some other model you haven't thought of that would be better. But if you have a model that explains behavior really well, and does so better than any of the other ones you can think of, then it's at least a good bet for giving you interesting individual differences to bring up to a group-level analysis.

So, okay. To get a little nitty-gritty in terms of the code and the MATLAB structure and everything: the first thing is that you have a participant's behavior.
Say they chose the hint and then slot machine one; then the hint and then slot machine two; then they just chose slot machine one right away; then slot machine two right away; and so forth. Those have to be coded in, for each trial, to the MDP structure, which, again, previous sessions explain. In the MDP structure in MATLAB, each number inside the parentheses, MDP(n), is the trial you're talking about, and the .u field is the matrix that encodes the actions a participant took.

Now, if we remember, row one here corresponds to state factor one, which is the context: whether left or right is better. There is no action for that; it's not something the agent controls, it's just stable within a trial. So those entries just get ones, and there is no other possible action. The second row, though, is the state factor that corresponds to action selection: action 2 corresponded to taking the hint, and action 3 corresponded to choosing left. So if that was what a participant did on trial one, then you would feed in a 2 followed by a 3: they chose the hint and then chose left on that trial. And you just iterate that over all the trials in the task, 1 to n_trials, for whatever they did. So that's MDP.u, and in this case, like I said, there are two actions, so two columns, and there are two state factors, of which only the second row is controllable.

Now, MDP.o is the observations. Again, if you're familiar with the way we set up the model for this task before, there are three different outcome modalities, three different types of observations you can get. The first observation modality is the hint, or not: a 1 just means you're still observing the start observation, a 2 means you got the hint that the left one is better, and a 3 is the hint that the right one is
better. So in this case the agent got the observation that the left one is better: the modality reads 1, 2, 1, because after it got the hint it just returned to observing the initial observation for the hint modality. For the second modality, the wins and losses, for the first two time steps the agent just observed the starting observation, because it chose the hint at the second time point, so there was no win or loss; and then at time point three it observed a win. A win is encoded as a 3 in this model, and a loss is encoded by a 2. And then the final outcome modality is just the agent observing what it does. In this case, it says: started in the start state, chose to take the hint, and then observed itself choosing the left slot machine.

So that's it. When, instead of simulating behavior, you're actually feeding in participant behavior to fit a model to it, you just have to translate the behavior in the raw data into a form like this, one that matches the action and outcome representations in the model, and feed those in.

And once you've done that, you have to use some kind of estimation algorithm, which basically means the thing is going to somehow repeatedly simulate behavior in that model until it finds parameters that maximize the overlap between the model's probability of choosing actions and what the participant actually chose. So, for instance, if the participant chose the hint and then the left slot machine on trial one, it's going to find a set of parameters, applying across all trials, that maximizes the probability that the simulated agent would also choose the hint followed by the left slot machine.

There are a number of different estimation algorithms that can be used. The simplest possible one is
just something like a grid search So in this case say you just have two parameters, which this is just arbitrary called alpha and beta You can kind of divide this up in a little grid different combinations of parameter values so for instance like you know parameter point two for alpha and parameter value three for beta And then you can just encode for example what the probability of the model is given the participants actions or That would be the posterior which I'll talk about or just the probability of participant behavior given the model and those parameter values I mean so in this case This is kind of clear a nice result where this value right here something like point three and four that is maximally and uniquely the best set of parameters for reproducing that participants behavior and And in this case You're using something called in this case. You're using something called maximum likelihood. So Maximum likelihood estimation, which just means again you're trying to find the model and set of parameters for which the Probability of the participants behavior given that model is highest Which is a type of likelihood? Right, so you're trying to find the maximum value for the likelihood Which you just in this case encode with these you know redder colors equals higher likelihood Of behavior given the model You can also do something a little bit So to be clear what we actually would want right is there is the inverse of this We would want the probability of a model given participant behavior, which is the posterior But when you're doing maximum likelihood estimation, you're just assuming here that you have essentially a flat prior for What the probability over models is? 
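The grid-search version of maximum likelihood estimation just described can be sketched in a few lines. This is a toy stand-in, not the tutorial's MATLAB code: the two-parameter choice model (a logistic rule with slope alpha and offset beta) is an illustrative assumption, but the logic of evaluating the likelihood at every grid point and keeping the maximum is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(alpha, beta, value_diff, choices):
    """Summed log probability of the observed choices under the model."""
    p1 = 1.0 / (1.0 + np.exp(-(alpha * value_diff + beta)))
    p = np.where(choices == 1, p1, 1.0 - p1)
    return np.sum(np.log(p))

# Simulate a participant's choices from known "true" parameter values.
true_alpha, true_beta = 2.0, -0.5
value_diff = rng.normal(size=500)
p1 = 1.0 / (1.0 + np.exp(-(true_alpha * value_diff + true_beta)))
choices = (rng.random(500) < p1).astype(int)

# Grid search: evaluate the log likelihood at every parameter combination
# and keep the combination where it is highest (maximum likelihood).
alphas = np.linspace(0.0, 4.0, 41)
betas = np.linspace(-2.0, 2.0, 41)
ll = np.array([[log_likelihood(a, b, value_diff, choices) for b in betas]
               for a in alphas])
i, j = np.unravel_index(np.argmax(ll), ll.shape)
alpha_hat, beta_hat = alphas[i], betas[j]
print(alpha_hat, beta_hat)  # should land near the generative values (2.0, -0.5)
```

With more parameters the grid becomes intractable, which is exactly the motivation for the gradient-based methods discussed next.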
So if that's the case, then the probability of the model given participant behavior is just proportional to the likelihood. But another thing you can do is called maximum a posteriori, or MAP, estimation, which is the same thing except it doesn't assume a flat prior over models. If you start out with a prior expectation that one model is better than another, you can incorporate that information into parameter estimation. This is an example of that: you might find this distribution in terms of just the likelihood, the probability of the data given the model, where a parameter combination right around here would be best, but you might also have some reason for a prior expectation that models down here are actually more likely. If you combine those together, you can get a posterior that looks a little different. A very similar model wins in this case, but that's just because the behavior driving the likelihood is already a really good fit, so it dominates over the influence of the priors; that's not always going to be the case.

So those are two approaches, and you can do either with a simple grid search. But I should say that the grid-search kind of thing only really works when you have a few parameters, not very many. The reason is that the more parameters you have, the higher-dimensional this parameter space becomes, and it takes intractably long to search through every possible combination. So what you do instead is use some kind of gradient descent process, exactly like the gradient descent on free energy that we've been talking about for how active inference models arrive at posteriors over states and posteriors over policies. So there's a really interesting overlap between the way that you estimate models via gradient descent and the way that active inference models update their beliefs via gradient descent.

That's a very good point. I just want to point out one similarity and one difference, because people may have heard about these gradient descent algorithms for fitting descriptive models: going from empirical data to descriptive statistics like regression coefficients. For a multivariable regression you may not be able to get by with a simple maximum likelihood approach like least squares; you may need this kind of complex multidimensional optimization to reach that descriptive statistic. And then there's this little twist: in active inference we might be using those same computational techniques, but we're estimating the parameters of an underlying generative model, and there's this nice return where the generative model itself implements something like that as well.

Yeah, so in this case you absolutely need some kind of prior value, because with gradient descent you're not exploring every combination. What you're doing is starting at some point, which is just labeled "a" here, and then, just like when you're minimizing free energy in an active inference model, you're searching neighboring values and trying to find one with a higher likelihood or lower free energy. You keep doing that iteratively until you find a value where the likelihood stops getting bigger or the free energy stops getting smaller, and that then corresponds to your best estimate, your posterior estimate, of the parameters for a given participant. And when you do this as a gradient descent on free energy, it corresponds well to the actual approach we'll be using in our example, which is variational Bayes. With variational Bayes you do exactly this: you set a prior mean and a prior variance for each parameter, and then you do gradient descent until you find a set of parameters that minimizes free energy, which is again an approximation to the model evidence.

But it's important to keep in mind some potential limitations of this approach. This parameter space here does look like it has a nice single minimum, the place where the free energy is lowest (it would be a maximum, the top of a mountain, if you were talking about the highest likelihood; typically you use log likelihoods rather than likelihoods). One of the issues, though, is that the parameter space need not have a landscape this clean. Moving to a two-dimensional case now, you might end up with a landscape like this, where if you start out with prior values at this red circle up here, gradient descent can get you stuck in a little local minimum. The algorithm would say: hey, this seems like the best one, because if I move in any direction the free energy goes up. Whereas the global minimum, the one you actually want to get to, is this different one on the other side of this little free-energy hill. So this is an example of why choosing the right priors is important for getting the best parameter estimates. It also speaks to the point about parameter recovery from earlier: if the parameter combination over here is the one that actually generated the data, but you set your priors right on top of this hill, you could get stuck over in this one, in which case the estimation algorithm would not give you the right parameter estimates. This is why assessing parameter recoverability, which I mentioned earlier, is really important.

Beyond saying it's important to choose good priors, or to figure out what good priors are, there are interesting ways to try to optimize that. There are hierarchical Bayesian techniques, which we won't talk about explicitly, and which won't necessarily solve this problem, but you can more or less think of them as estimating the parameters for each person and then noticing: hey, the distribution of these estimates looks like it's shifted over, say to the right. So it takes the mean of that distribution as the new prior and redoes the estimates, iterating until it settles; it's essentially helping you find whatever the optimal priors would be. But again, that won't always solve this sort of problem with a lumpy landscape.

Just one other little thought: it's actually almost three layers with that red dot. There's us as agents on a landscape; we need to come up with policies as agents in our niche to get around local minima, like: my door is closed, so I need to step away and open the door before I can move through it. Then, how are we going to fit policies for ourselves under pervasive uncertainty?
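As an aside on the local-minimum picture above, the problem can be sketched with a toy one-dimensional landscape. The double-well function below is an arbitrary stand-in for the free-energy landscape, not anything from the model; the point is just that plain gradient descent converges to whichever minimum is downhill of its starting (prior) value.

```python
# Toy "free-energy landscape" with two minima. This is an arbitrary function
# chosen for illustration; it is not the model's actual free energy.
def F(x):
    return (x**2 - 1)**2 + 0.3 * x   # global minimum near x = -1, shallower one near x = +1

def dF(x):
    return 4 * x * (x**2 - 1) + 0.3  # gradient of F

def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent from starting point x0 (the 'prior' value)."""
    x = x0
    for _ in range(steps):
        x -= lr * dF(x)
    return x

x_right = gradient_descent(1.5)    # starts right of the hill: stuck in the local minimum
x_left = gradient_descent(-1.5)    # starts left of the hill: finds the global minimum
print(x_right, x_left)
print(F(x_right) > F(x_left))      # True: different starting points end at different minima
```

Run from two different starting points, the same algorithm reports two different "best" parameter values, which is exactly why prior choice and recoverability checks matter.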
Well, we're doing this state estimation with the policy so that we can actually act, and then how are we going to converge on those parameters, using something a little like a gradient descent algorithm, not internally exactly, but within the agent, computationally? It's very interesting, because I'd imagine there's one kind of person who sees this and says: oh, we talked about optimization for a year in my course, so I've heard all about convex and non-convex optimization techniques and rugged fitness landscapes. And there's another group of people for whom this optimization theory might be quite novel, because of how they've approached modeling before. So, very interesting, and putting it into the fourth section like that really spoke to a lot. Interesting stuff, Ryan. I just wanted to ask Christopher: do you want to add anything before we continue?

No, this really is very much Ryan's part of the paper. I'm just here, though more than welcome to add thoughts if I have any other insights about technical aspects. Nothing really; I might ask some questions if things come up, because I'm going to be doing this myself at some point in the next couple of months.

Great, thanks.

Okay, so, like I said, the detailed example we're going to use is variational Bayes. Note that variational message passing is, again, what agents are using within active inference models, which we talked about before, so we're using something very similar to this. Technically I should say that the latest versions use marginal message passing, but we covered this; it's very similar to variational message passing, just a slight improvement. So we're using the same sort of approach to estimate the parameters that people are using, the parameters that are, in some sense, potentially stored in their brains.

So we're doing gradient descent on variational free energy, as I mentioned. You specify prior means and prior variances for each parameter, and then you move the values in the direction of increasing action probabilities. But because it's a gradient descent on variational free energy, you're not technically just trying to find the maximum likelihood value, as you are with a grid search; the variational free energy part means there's a complexity penalty. If people remember from when we talked about variational free energy before, the simplest, intuitive way to think about it is that it's just complexity minus accuracy. Complexity means how much you have to change your beliefs: if you start out with particular prior values, the farther you have to move your beliefs from those priors, the more the variational free energy is pushed up. At the same time, variational free energy goes down as the model predicts behavior better. So, for instance, if I start out with a prior value way down here, around one and three, it's going to have to move a long way from those priors before it gets to a set of parameter values that fit well, whereas if I started over here, it wouldn't have to move as far. But if I did that, then instead of the posterior parameter estimates settling on the value with the maximum likelihood, they might stop, say, around this one, a little lower, because that yields high accuracy while not having to move the parameter values really far, not changing beliefs too much from the informed prior.

It's important to recognize that this assumes the priors you have are there for a reason, that they're actually based on something informative, so that it genuinely makes sense not to move too far from them. In practice, the reason this is helpful is that the maximum likelihood value is with respect to your particular data set, your particular set of participants. If you just choose the maximum likelihood value, it's often overfit to those participants: apply that exact same model to a new set of participants and it might not do as well, because it was fitting specific, non-generalizable features of your data set. By putting this complexity cost on the estimation, you prevent overfitting, which means the predictions of the model are more likely to generalize to a new set of participants later; overfitting is a very common issue in standard frequentist statistics as well. So that's what variational Bayes is doing, this complexity-minus-accuracy thing, preventing overfitting while also maximizing the accuracy of model predictions.

Okay, so then, as I mentioned, you need to do model comparison to identify what the best model is. Now, let's see.
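The complexity-minus-accuracy trade-off just described can be sketched numerically. This is a toy stand-in, not the SPM routine: a Bernoulli "task", a grid instead of gradient descent, and a quadratic distance from the prior standing in for the KL-divergence complexity term. The point is just that penalizing movement away from the prior pulls the estimate back from the pure maximum likelihood value.

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)        # candidate parameter values
wins, trials = 9, 10                       # observed behaviour favouring a high value

# Accuracy: log likelihood of the behaviour at each candidate value.
accuracy = wins * np.log(theta) + (trials - wins) * np.log(1 - theta)

# Complexity: penalty for moving away from the prior (a quadratic stand-in
# for the KL divergence between a Gaussian posterior and prior).
prior_mean, prior_sd = 0.5, 0.1
complexity = 0.5 * ((theta - prior_mean) / prior_sd) ** 2

# Minimising free energy = maximising accuracy minus complexity.
ml = theta[np.argmax(accuracy)]                  # pure maximum likelihood: 0.9
vb = theta[np.argmax(accuracy - complexity)]     # pulled back toward the prior

print(ml, vb)  # the penalised estimate lies between the prior (0.5) and the ML value
```

With more data the accuracy term grows while the complexity term does not, so the likelihood eventually dominates, matching the intuition above about a really good fit overriding the priors.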
So I just realized I should probably set something in motion here, because when we go into the code in a minute, one of these things takes a long time. Let's see. I set it to store the results; I just want to triple-check the settings. I should have double-checked this beforehand.

It's really interesting. Christopher, do you want to add anything while Ryan's setting this up? Maybe just a quick note: what is it like to be learning these models, or what kinds of skills do you wish you'd had earlier when you were learning them, other than, of course, your own tutorial?

I think everyone who did undergrad psychology, or in my case cognitive science with a lot of psychology classes, has probably at some point read the Andy Field statistics textbook. He has a chapter on some esoteric type of regression whose introductory sentence is something like: I've never done this, I don't see myself ever doing it, but I wrote this chapter about it, and if I ever need to do it, I'll be very impressed by how much I seem to know. That probably describes my experience with this tutorial to a certain extent.

Okay, so really quickly, I'll just jump to the code and explain why I should have started this thing. If you set this variable, Sim, equal to five, then it's going to generate simulated behavior for six hypothetical participants, where each participant has a different combination of parameter values generating that simulated behavior. Then it applies the estimation algorithm to the resulting simulated behavior, gives you a set of results, and then does Bayesian model comparison on those to identify the best model. For the purposes of you actually doing this yourselves at home: I've set it to 32 trials for each set of simulated behavior, but maybe if I set this to 16, it will go faster.

Just to think about what's happening: as people look through these equations, every time two matrices or two numbers get multiplied, the computer has to do something, and we're nesting matrix multiplications inside even bigger ones. So with more time steps, more models, more participants, it really starts to balloon rapidly, especially when there are computationally intensive steps.

Yeah. So here, just to show you what's going on: I just started this thing, and if you look in the MATLAB window, it's calculating the log likelihood (LL stands for log likelihood) over and over again under a particular set of parameter values. In this case it's negative 31, and it's trying to minimize that; or rather, in this case, it's moving it closer to zero, so maximizing in a sense, but bringing it closer to zero. Now it has moved, via gradient descent, to a second set of parameter values and found that the log likelihood is negative 22, so we're even closer to zero. Same thing again: now we're at negative 19, and it will keep going until it settles on a stable value, shown in this graph, which updates each time.

Something to explain here: these routines were originally designed for DCM, dynamic causal modeling, with fMRI, so a bunch of these graphs and the labels they carry don't really apply in this case; they only apply to DCM. So ignore a lot of the labels. The point is that each iteration, going from left to right, is a new estimate of the log likelihood for the set of parameter values it's trying during gradient descent. You'll see that after a while the curve will plane out and converge onto a value; right now it's at 17.35, and now 17.16, so it's starting to converge on a stable minimum. Now it's found a set that's still at 16, so it's still going for a bit before it converges.

Down here, these are the posterior deviations. Again, ignore a lot of the units, but the way to read this is that zero corresponds to the prior values you set, and if the bar goes down, that means the posterior parameter estimate is lower than the prior value at this point. And this red-pink band around it?
That's the posterior variance. So in this case, parameter one here is the action precision and parameter two is the risk-seeking parameter; I can double-check whether those are backwards. This is just saying that, for the first parameter, the posterior mean estimate, which does correspond to the real units of the parameter, is whatever it is, but the model is not very confident in that estimate; whereas the second parameter is also negative at the moment, but the model is very, very confident in that posterior estimate. These will update with each iteration. And you can see the thing kept exploring and has now found a set of parameter values that explains the simulated behavior quite a bit better: now it's at about negative 13. So it almost converged for a bit, but now it's climbing again; eventually it will converge, but it's doing really well at finding values that explain the behavior.

One thing I should point out, though, is that the absolute values of these log likelihoods are not really informative, because they're basically the sums of the log action probabilities across trials, so the numbers will be bigger if the task has more trials. In and of themselves, only the relative values are meaningful.

This reminds me a lot of Bayesian methods in phylogenetics, where you ask: what is the likelihood of this phylogenetic tree? It's given some number by a program, which sounds weird when you think about the state space of all possible trees, but it turns out that by doing this kind of gradient search through the possible trees, you do converge on a tree that is generative of the kinds of data that are observed. So it's really interesting to see how this works, and it's fun to watch the number drop, too.

So now it has converged for the first person. These were the posterior deviations, and here's the simulated behavior of the participant under those parameters. You can see that the parameters fit the behavior really well: the probabilities are pretty high under those parameters for every action. There are some small exceptions, but it does pretty well. So now it's just moving on to the next person. Sorry, what? Oh, it might just be helpful to describe what the blue dots are. Yeah, sorry, I'm assuming people have watched the first three sessions. Blue corresponds to the actual chosen action at each time point for each participant; black means probability one, and as the shade goes toward white, that means probability zero. So this is saying there's essentially a 100 percent probability under the model that the agent would choose the hint, and the blue dot says that's what they actually chose, and so on. Anytime there's light gray where the blue dot sits, the model didn't do well at predicting that behavior. But you can see in this case it's pretty dark around most of the blue dots, so it does pretty well; that's the point.

So now, like I said, this will iterate over the six agents I mentioned, and then it will compare the three that had a learning rate in the model with the three that didn't.

Just to summarize what you've said so far: the steps would be something like, you take your empirical task and think about how the actions the participant actually makes relate to the actions available to the model; you come up with some mapping, translate them, and plug them into MDP.u, the actions the participant chose, and MDP.o, the observations they actually saw. And then you plug that into the algorithm. Yeah, you plug it into the algorithm, and the algorithm repeatedly computes the sum of the log likelihoods across trials and tries to maximize that sum, equivalently minimizing its negative.

Okay. And here's something I've wondered about: how do you choose what the best priors are? Is it okay just to simulate a bunch of things in silico and say, hey, this seems reasonable? In a lot of cases, yes.
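As an aside, the fitting objective summarized a moment ago, the summed log probability of the participant's actual actions, can be sketched as follows. The softmax choice rule and the parameter name alpha are illustrative assumptions, not the paper's full MDP model; the structure of the computation is the point.

```python
import numpy as np

def action_probabilities(alpha, values):
    """Softmax over action values, with action precision alpha."""
    e = np.exp(alpha * (values - values.max()))
    return e / e.sum()

def summed_log_likelihood(alpha, trials):
    """Sum, over trials, of the log probability of the participant's actual action."""
    ll = 0.0
    for values, chosen in trials:
        ll += np.log(action_probabilities(alpha, values)[chosen])
    return ll

# Three toy trials: (action values under the model, index of the chosen action).
trials = [
    (np.array([0.2, 0.8, 0.1]), 1),
    (np.array([0.5, 0.4, 0.6]), 2),
    (np.array([0.9, 0.1, 0.3]), 0),
]

# This participant always picks the highest-valued action, so a higher action
# precision assigns those choices more probability, giving a higher
# (closer to zero) summed log likelihood.
better_fit = summed_log_likelihood(4.0, trials) > summed_log_likelihood(0.5, trials)
print(better_fit)  # True
```

This also shows why the absolute log-likelihood values scale with the number of trials: the sum simply accumulates one (negative) term per trial.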
I mean, there are a couple of things you can do. One is the recoverability work: you can do that beforehand and try to find a set of prior values for which the parameter estimates are recoverable. So if you set one set of priors, maybe the thing gets stuck in local minima, but if you set another set of priors, the generative parameters do end up matching the estimated parameters really well. So one thing is to do some of that simulation and recoverability work ahead of time to find good priors. Another is that if you estimate participant behavior and start to see that the posterior estimates tend to land really far from the priors you're setting, that's an indication you're probably not setting very good priors, so you might try new priors closer to where everybody's estimates are moving. From memory, we do have a footnote about this in the paper, but it's good to make it explicit.

Another way I've seen this handled, in evolutionary biology: a common fallacy is that the uniform or flat prior is unbiased (there's so much to say about that, as I'm sure Ryan knows well). But if you take one prior stacked up against zero and another stacked up against one, and the estimates converge from both sides, that's a simple sanity check. With different families of priors, and models ranging from very pared-down ones to ones with very complex error structures, with densities starting stacked at one end versus another, that's difficult to manage; but if they converge, it at least means that, given the empirical data you have and the task you're modeling, you have a genuinely predictive model. So again, we're not realists; we're not necessarily getting at the truth with these recursive, iterative, multi-perspectival models, we're just explaining more and more of the variance in our empirical data.

Yeah. So just to give you another example: you can see that for this other set of parameter estimates, convergence took much less time, just seven iterations. One reason is that it didn't have to move the values very far from the priors, because this agent's behavior was generated by parameters very close to the priors.

One other point: even if it looks flat for ten time steps, it could still be trapped, which is why, in sampling approaches, things have to be seeded with really good randomness, because there's no hard-and-fast rule for when to terminate. For example, in a lot of the evolutionary simulations we would run, you discard the first ten thousand or more time steps as burn-in, because you think they reflect your prior estimates too much. And even when it looks like it's converging, or, as one of my professors called it, looks like a fuzzy caterpillar, because the sampler is ranging over the fullest possible span of values and still coming back home, it can look like a fuzzy caterpillar for a long run of time steps and then suddenly hit a new combination and break into a new regime. So it's a real challenge.

Just so people know, that's actually not true in this case: variational Bayes is deterministic, in the sense that there is no burn-in and no run-to-run variability. If I ran this over and over, it would give me the exact same parameter estimates each time. The kind of thing you're describing is more like Monte Carlo sorts of approaches. Yeah, sorry, I didn't mean to imply there was burn-in here; that was just an analogy, but thanks for clarifying.

One advantage of Monte Carlo sampling methods is that, given infinite time, you're guaranteed to approach the true posterior; but they're also extremely computationally expensive, whereas variational Bayes is really quick: you can run this stuff on your laptop. The Monte Carlo sampling I was referencing comes from a different domain; you have to sample when some of the distributions involved aren't defined so cleanly. When you have access to this level of specification, a whole different range of techniques opens up, which is why this looks more like matrix multiplication converging to a variational free energy estimate, rather than endless sampling from an unknown landscape. Anything else? But yeah, thanks for that.
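To make the contrast concrete, here is a minimal Metropolis-Hastings sampler, the kind of Monte Carlo approach being contrasted with variational Bayes. The Bernoulli target posterior is an illustrative assumption. Unlike the deterministic variational scheme, rerunning this with a different seed gives slightly different estimates, which is why sampling methods need seeds, burn-in, and convergence checks.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_posterior(theta, wins=7, trials=10):
    """Unnormalised log posterior for a coin bias under a flat prior."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    return wins * np.log(theta) + (trials - wins) * np.log(1 - theta)

# Metropolis-Hastings: propose a random step, accept it with probability
# min(1, posterior ratio); otherwise stay put.
samples = []
theta = 0.5
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.1)
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

burned = np.array(samples[2000:])   # discard burn-in, as discussed above
posterior_mean = burned.mean()
print(posterior_mean)  # close to the analytic posterior mean (7+1)/(10+2) = 2/3
```

Twenty thousand evaluations for one scalar parameter gives a feel for why sampling is expensive relative to a deterministic scheme that converges in a handful of gradient iterations.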
Sorry. So, again, just to give you a sense: for this participant you can see that the distributions are a lot less precise. This is a person with a much lower action precision value; the behavior was generated by a lower-action-precision agent, and the model does a pretty good job of finding a low action precision value that spreads the distributions flat enough to provide decent evidence for each action. Anyway, that's just to give you an intuition for what's going on.

Once this is done, we'll have the free energies for each person under each of the two models, the one with and the one without a learning rate. What we want to do then is Bayesian model comparison, where you compare the free energies for each model, for each participant, to find the winning model: the one with the lowest free energy across participants. The little SPM function that we include here will spit out several things, but the one to focus on, to keep things simple, is what's called the protected exceedance probability. This is the probability that each model is the most likely model across all subjects, while taking into account the null possibility that the differences in model evidence are due to chance. So, like I said, it's just which model is best when the null model is taken into account as a possibility, and I'll show you that in a second.
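The tutorial uses SPM's random-effects routine (spm_BMS), which is what returns protected exceedance probabilities; that computation is involved, so here is only the much simpler fixed-effects version of comparing models from per-subject free energies: sum each model's approximate log evidence (the negative free energy) over subjects and normalize. The numbers are made up for illustration.

```python
import numpy as np

# Negative free energies (approximate log model evidence), one row per subject,
# one column per model (e.g. with vs. without a learning rate). Made-up values.
neg_F = np.array([
    [-31.0, -35.5],
    [-28.2, -30.1],
    [-40.3, -44.0],
    [-25.7, -25.9],
])

# Fixed-effects comparison: sum log evidence over subjects for each model.
group_log_evidence = neg_F.sum(axis=0)
log_bayes_factor = group_log_evidence[0] - group_log_evidence[1]

# Posterior probability of each model, assuming a flat prior over models.
p = np.exp(group_log_evidence - group_log_evidence.max())
p = p / p.sum()

print(log_bayes_factor)  # a group log Bayes factor above 3 is usually called strong evidence
print(p)                 # here model 1 gets nearly all the posterior probability
```

The random-effects approach goes further by allowing different subjects to be best described by different models, which is what makes the protected exceedance probability the more appropriate group statistic.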
Let's see, is this thing still going? Yeah, okay, still going. Hopefully it'll be done soon; it's actually running pretty fast since I made it so few trials. I'll keep moving through here, and we can return to this once it's done.

So that's what we're doing here. We've already touched on this a bit, but now is the point where we would confirm that the best model is identifiable and that the parameters are recoverable. Sorry, I mentioned this a little already, but it's clear that not all models necessarily have unique parameter solutions. If you have a landscape like this, you might start your priors at very similar places and gradient descent would lead you to different minima: different combinations of parameter values that are equally good at reproducing a participant's behavior. So you always have to show that whatever model you're using, with the priors you're using, will give you back the same parameters you fed in to generate the data in the first place.

Again, I've touched on this, but step one is to specify multiple sets of generative parameter values, which is what we're doing. An important point is that you should select values the same as, or similar to, the actual parameter combinations you got in your true participants, because it can be the case that parameter estimates are totally recoverable for certain parameter combinations but not recoverable for others.
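The recovery check being described, simulate from known values, re-estimate, and correlate, can be sketched with a toy model. The Bernoulli "task" and grid-based maximum likelihood estimator are illustrative stand-ins for the full MDP model and variational scheme; the structure (generative values in, estimated values out, correlate the two) is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: choose generative parameter values for six simulated subjects
# (ideally values similar to those seen in the real participants).
generative = np.array([0.2, 0.35, 0.5, 0.65, 0.8, 0.9])

grid = np.linspace(0.01, 0.99, 99)
n_trials = 32

estimated = []
for theta in generative:
    # Step 2: simulate behaviour from the known generative value.
    wins = rng.binomial(n_trials, theta)
    # Step 3: run the simulated behaviour through the estimation routine
    # (here, maximum likelihood on a grid).
    ll = wins * np.log(grid) + (n_trials - wins) * np.log(1 - grid)
    estimated.append(grid[np.argmax(ll)])
estimated = np.array(estimated)

# Step 4: check that generative and estimated parameters are highly correlated.
r = np.corrcoef(generative, estimated)[0, 1]
print(round(r, 2))  # recovery is considered good when this correlation is high (e.g. > 0.7)
```

Rerunning the sketch with more trials per subject tightens the estimates and pushes the recovery correlation toward one, mirroring the trade-off between task length and recoverability.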
So you care about the ones you're actually getting for your participants. Then you want to simulate behavior: generate simulated behavior using each of those parameter value combinations, run that simulated behavior through the estimation routine just like you would for real data, and then check whether the generative parameters and the estimated parameters are highly correlated.
When I run this in advance, so without having to run it in real time like I'm doing right now, and bear in mind that in this particular case I'm using 32 trials, not the 16 or whatever I put in just now to make things go faster, this is what I get for alpha, the action precision, for the two-parameter model. You can see that even with only six people the correlation between the generative action precision and the estimated action precision is pretty good: it's 0.81, and even with six people it's significant. Anyway, I have a bunch of these figures; I could probably pull them up... let's see, that's for a totally different thing... Sorry, I'm just trying to find the figures. I'm not positive where I put them, actually, but if I can't find them here, the script will spit them out. The point being, in this case, even with just six values, all of the parameters tend to be really good in terms of recoverability: the correlations between generative and estimated parameters tend to be between 0.7 and 0.9-something. So they're good.
So then, finally, the last thing we want to do is, once we have parameter estimates for each person in a winning model that is recoverable, we can take those values.
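The four recovery steps just described can be sketched in Python. To be clear, this is not the tutorial's MATLAB routine: the toy softmax agent, the grid-search maximum-likelihood fit, and all the generative values below are stand-ins for the real task and the variational estimation scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(alpha, n_trials=64):
    """Simulate choices from a softmax agent with action precision alpha.
    Two actions with fixed expected values (a stand-in for a real task)."""
    values = np.array([1.0, 0.0])          # hypothetical action values
    p = np.exp(alpha * values)
    p /= p.sum()
    return rng.random(n_trials) < p[0]     # True = chose action 0

def log_likelihood(alpha, choices):
    values = np.array([1.0, 0.0])
    p = np.exp(alpha * values)
    p /= p.sum()
    n0 = choices.sum()
    n1 = len(choices) - n0
    return n0 * np.log(p[0]) + n1 * np.log(p[1])

def fit(choices, grid=np.linspace(0.01, 8.0, 400)):
    """Grid-search maximum likelihood, standing in for variational fitting."""
    ll = [log_likelihood(a, choices) for a in grid]
    return grid[int(np.argmax(ll))]

# Step 1: generative values like those seen in real participants
true_alpha = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0])
# Steps 2-3: simulate behavior, then re-estimate parameters from it
est_alpha = np.array([fit(simulate(a)) for a in true_alpha])
# Step 4: recoverability = correlation(generative, estimated)
r = np.corrcoef(true_alpha, est_alpha)[0, 1]
print(round(r, 2))
```

If the correlation comes out low for the parameter combinations your real participants occupy, the parameter should not be interpreted as an individual difference measure, which is exactly the point being made here.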
We can use those parameters as individual difference measures between subjects. At that point there are a number of things you can do. The simplest, if you want to fall back on frequentist statistics, is to do standard t-tests, ANOVAs, correlations, regressions, and so on. We've done that before, and it's fine in some cases. If you want to stick with the more general Bayesian theme of active inference, you can also do things like Bayes factor analyses. Both of these are often used, or even used together.
And just to give you a couple of examples... sorry, I'm just going to triple-check that this thing is not done.
One note there: the SPM textbook has many points of contact between standard, very classical statistical approaches, very Bayesian approaches, and mixed methods, and it will often switch out or show things both ways. So it really is pretty interesting how it comes together here.
Hmm. So, to show you a couple of examples of how you would use these parameters at the group level, I'll give you a couple of examples from our papers.
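As a minimal example of the frequentist route mentioned above, here is a Welch t statistic comparing hypothetical fitted action-precision values between two groups; all the numbers are invented for illustration:

```python
import numpy as np

def welch_t(x, y):
    """Welch's two-sample t statistic, for comparing fitted parameter
    estimates between two groups (e.g. patients vs. healthy controls)
    without assuming equal variances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vx = x.var(ddof=1) / len(x)            # squared standard error, group x
    vy = y.var(ddof=1) / len(y)            # squared standard error, group y
    return (x.mean() - y.mean()) / np.sqrt(vx + vy)

# Hypothetical fitted action-precision estimates per subject
controls = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0]
patients = [1.2, 1.5, 1.1, 1.6, 1.3, 1.4]
print(round(welch_t(controls, patients), 2))
```

In practice you would use a standard statistics package (which also gives you degrees of freedom and a p-value), but the point is simply that fitted model parameters can be treated like any other per-subject measure.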
So in this one, in the Journal of Psychiatry and Neuroscience, we had this simple approach-avoidance conflict task where, more or less, the participant had to choose to move a little avatar closer to or farther from one of the two ends of a little runway. They knew that the closer they were to one side, the higher the probability that they would get the outcome associated with the symbols on the left side, and same goes for the right: the closer they are to the right, the higher the probability of getting the outcome associated with the right side. Here a rain cloud means they heard a really aversive sound and saw a really aversive picture, something like hearing a girl screaming and seeing a picture of her being pulled into a van, really negative stuff. Whereas with a sun, they saw something neutral, maybe slightly happy. And there was this red bar on the side: the more filled the red bar was, the more points they would win.
So you have clear cases, where if it's just rain cloud versus sun, you should just go to the right, and if it's just sun versus sun plus some reward, you should want to go to the right. And then you have these conflict cases: negative stuff plus some reward, negative stuff plus even more reward, and negative stuff plus even more reward than that. And so that's how the task is structured.
And so you can make a pretty simple model of this where one state factor is beliefs about the trial type, whether it's this kind of trial, or this kind, or that kind, and then you can have a state factor corresponding to beliefs about the runway position, right?
So beliefs about whether the avatar is in position one, two, three, four, five, and so on. That's it. And then you just set up likelihoods that generate the probability of each position generating particular outcomes, and particular runway-position observations, given beliefs about trial type.
A question here, Ryan: you have it phrased as there being five trial types, but would it be possible to have a model where there was a state estimate for the left and right stimulus types, and a state estimate for the left and right reward values? Because this frames it in a very behavioral, trial-centric way, doing state estimation on which of the five scenarios you're in, and I'm just trying to think about cases where you might not know which scenario you're in, or even what the total set of scenarios is, so you're just doing state estimation on reward and on stimuli.
Yeah, I mean, well, it's going to depend, right? If you were to give somebody uncertain cues about trial type, then, if they know what the different trial-type possibilities are, you could just have uncertain cues: they just wouldn't have a precise prior belief over this state factor. But if there weren't any set beliefs about trial types, and there's just any possible combination of sun, cloud, zero points, two, four, and six, then I suppose you could do something where you just have a kind of non-factorized state factor that contains every possible combination, or something like that. You could do that.
There are reasons to actually build... so there are obviously time-saving and model-building considerations that mean you should use factorized distributions, but there are also actual empirical considerations, which means that if you want to build a model that's realistically how the brain works, in a lot of cases you should tend toward factorized distributions. For example, there's pretty good evidence that there's a factorized representation of task phase in PFC and MTL. The what-versus-where streams in vision, that's a factorized representation.
Yeah, so you're kind of using it in a coarse-graining sense, to say sometimes you don't even want to allow the all-by-all, because how the task is being modeled is more categorical. Like with fight-or-flight: you wouldn't want all combinations of elbow and knee movements; you'd be fitting something at the wrong dimensionality. So for a lot of reasons, not least that it seems like that's what organisms would do, you'd want the factorized representation rather than the high-dimensional manifold.
Yeah, and in this case the experimental design is such that they do know ahead of time what the different possible combinations are. They just already know there are these combinations, so it's consistent with the beliefs about the generative model that we've given them via the instructions.
So in this case we had a pretty typical sort of graphical model with two parameters. We had our beta here, which corresponds to the expected free energy precision; to be intuitive for the clinical audience we were going for, we just called this decision uncertainty. And then, again:
This is the rate parameter for a gamma distribution over this gamma term, which, again, is a thing that modulates the expected free energy estimate over policies; it modulates this G term. The other parameter we had was this emotion conflict parameter, which just said, basically, how much they dislike the negative stimulus, how aversive they expect the negative stimulus to be.
And you can show in simulation, where each of these vertical bars corresponds to beliefs about which state you are in on the runway, that if beta equals one and emotion conflict equals one, then, and this is a conflict-plus-two-points trial, the agent will approach the reward even though it's going to see something negative, and it will do so deterministically. Whereas if emotion conflict is three, it will fairly deterministically choose to move away from the negative stimulus, even though it would get the reward. And with higher beta values, so more decision uncertainty, the agent becomes a lot less confident in this distribution and ends up choosing values that are more in the middle.
And the likelihood is pretty clear: for each trial type, each column here is a different runway position, and it just generates the negative stimulus with increasing probability as you go left, and the positive stimulus with increasing probability as you go right, and so on.
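The role of gamma here can be sketched as a precision-weighted softmax over (negative) expected free energies. This is a generic illustration, assuming a Gamma(1, beta) prior so that the expected gamma is 1/beta; the G values below are invented:

```python
import numpy as np

def policy_posterior(G, gamma):
    """Softmax over negative expected free energy G, with precision gamma.
    Higher gamma -> more deterministic policy selection; lower gamma
    (i.e. higher beta, since E[gamma] = 1/beta here) -> flatter choices."""
    z = -gamma * G
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

G = np.array([2.0, 1.0, 1.5])         # hypothetical EFE for three policies

for beta in (1.0, 4.0):               # beta = rate parameter of the gamma prior
    gamma = 1.0 / beta                # expected precision under Gamma(1, beta)
    print(beta, np.round(policy_posterior(G, gamma), 3))
```

Running this shows the pattern described in the talk: a low beta gives confident, near-deterministic selection of the best policy, while a high beta ("decision uncertainty") spreads belief across policies and produces the middling choices seen in the simulations.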
The only confusing thing here is that white means a higher probability in these panels, whereas black means a higher probability here. But anyway, that's it, and so on for the five trial types.
So in this case, what we found was that if you look at healthy controls versus people with depression and anxiety versus people with substance use disorders, the emotion conflict parameter is, interestingly, highest in healthy controls and lowest in substance use disorder, whereas decision uncertainty is highest in substance use disorder, intermediate in depression and anxiety, and lowest in healthy controls, and these are significantly different. And that's true both in a propensity-matched sample and in a larger sample. So there's this interesting result where what might be more clinically relevant is this kind of uncertainty over options, as opposed to just being more sensitive to negative stimuli.
Right, and how would that second thing, just kind of curious, and it's fine if it's not known, how would that shape treatment, or conversation, or approach in a given situation, from whichever role makes sense? Like to say, oh, it's not actually this psychological construct, it's actually related to something like this.
I mean, in terms of informing treatment or something like that?
Sure, treatment via any modality.
Yeah, so typically the main sort of thing you might want to do is, say, look at somebody's beta value at baseline, before they start treatment. It might be the case, and you'd have to do a study to show this, that given different beta values at baseline, people with high beta values might respond well to CBT, cognitive behavioral therapy, whereas people with low beta values might respond better to ACT, or to an SSRI, or something like that. That's kind of the ideal: you want to say, can I get this information and have it tell me how to treat a person? There are lots of other things you might do besides that, but that's a primary, kind of ultimate-goal example.
Yeah, it's interesting: there's probably a lot to be said and learned about the actual clinical application of active inference, but even here, at the beginning of applying it, we can use it as a potential biomarker, just like a summary statistic derived from a questionnaire or some other measure, except estimated from empirical clinical data by fitting a different kind of underlying generative model. So instead of doing a regression, people who score higher on this end up doing better or worse in a given program, we can now do that same kind of parameter testing in the context of a different type of model, an active inference generative model. I think those are some interesting points of contact. Really interesting stuff.
Yeah, definitely. I mean, you can do lots of other interesting things with these, right?
So this beta parameter, if you remember from previous sessions, is proposed to be associated with dopamine dynamics in the brain. And I'll briefly show you an example study later, by Philipp Schwartenbeck, which I mentioned before, that actually showed that the trial-by-trial updates in beta predicted by the model were correlated with the BOLD signal in fMRI in a midbrain region associated with dopamine. So whereas the way we're doing it here just gives a stable beta estimate for each person, you might also look at, say, individual differences in contrast values in a group-level fMRI analysis: for example, whether people with higher beta values have higher, say, basal ganglia responses to reward versus no reward, or something like that. So you could do both individual-level and group-level fMRI sorts of approaches, as well as much fancier things, but those are just two simple examples.
As another example of doing this, in the domain of perception rather than decision making, we used this in the context of a task where a person is just told to push a button every time they feel their heartbeat. In this model we don't even have an explicit policy selection part. All we do is: they start with a precise belief that they're in a start state at time one, and then they have prior beliefs about whether or not they're going to transition into a state of feeling their heartbeat versus not, so a probability of no heartbeat and a probability of a heartbeat. Those priors are encoded in the B matrix here, the transition matrix, where the higher pHB is, the more often they expect to feel a heartbeat on each trial. And then there's also a precision value here, which corresponds to beliefs about how precise the actual afferent signal coming up from the heart is.
So we can estimate this IP parameter, interoceptive precision, and this prior over heartbeats. I should say we also compared this to a model that included learning in this task, and in this case the model including learning didn't win.
I should also say that they do this task three times: once when they're told they're allowed to guess; once when they're told they're not allowed to guess and should only press when they're sure they felt something; and once where, in addition to no guessing, they're told to hold their breath, which on average makes it easier to feel your heartbeat. It amplifies the afferent signal, makes it more precise.
What we found was that in healthy people the breath-hold actually amplified interoceptive precision a lot, whereas in all the clinical groups, anxiety, depression, comorbid depression and anxiety, eating disorders, and substance use disorders, they just stayed flat at low interoceptive precision values. So actually changing the precision of the afferent signal didn't have any effect on their beliefs about the precision of the signal. It's something like a rigidity in the way that the brain treats afferent interoceptive signals in psychiatric disorders, transdiagnostically. And again, this is just a standard mixed-model sort of analysis. Whereas if you compare, say, the estimates for prior expectations, everybody showed higher prior values in the guessing condition than in the other two conditions, which you would expect, but there were no differences between healthy and clinical groups.
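The two-parameter structure just described, a pHB prior in the transition matrix and an interoceptive precision on the likelihood, can be sketched as below. The matrix layout here is a simplified illustration I've constructed, not the paper's actual code:

```python
import numpy as np

def softmax_cols(M):
    """Column-wise softmax, turning a precision-scaled likelihood
    back into normalized probability distributions."""
    Z = np.exp(M - M.max(axis=0))
    return Z / Z.sum(axis=0)

def heartbeat_model(p_hb, ip):
    """Sketch of the two-parameter heartbeat tapping model described
    above (parameter names pHB and IP follow the talk).

    States: [no heartbeat, heartbeat]."""
    # B: transition prior into feeling vs. not feeling a heartbeat
    B = np.array([[1 - p_hb, 1 - p_hb],
                  [p_hb,     p_hb]])
    # A: likelihood mapping states to afferent observations; precision
    # IP controls how reliably the signal reports the true state
    A = softmax_cols(ip * np.eye(2))
    return A, B

A, B = heartbeat_model(p_hb=0.3, ip=2.0)
prior = B[:, 0]                    # prior over the next state
obs = 1                            # an afferent "heartbeat" signal arrives
posterior = A[obs] * prior         # Bayes: likelihood row times prior
posterior /= posterior.sum()
print(np.round(posterior, 3))
```

With a higher IP the same afferent signal pushes the posterior further toward "heartbeat", which is the sense in which the breath-hold manipulation should, in healthy subjects, show up as increased estimated precision.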
So it's kind of interesting. It says, hey, maybe this is more a precision issue than a prior-expectation issue in terms of clinical significance, or at least that's something another task would have to explore.
If I can put in a plug for how cool that paper was: there's been a lot of conceptual work on interoception and active inference, and a lot of discussion about whether it's priors or precision or so on, and to my knowledge, and I'm happy to be corrected on this by Ryan, that was the first time it's really been tested empirically, right?
Yes, there are no other studies that have actually formally fitted models to data to try that before.
Nice, epic work.
There are papers that have tested more, quote unquote, qualitative predictions, just go-up-versus-go-down sorts of predictions that fall out of computational models, but without actually fitting a model. Yes, this is the first model-based analysis of this kind of thing. And, which is cool, we've actually replicated the results in healthy controls in a second sample now, so it seems like the effect, at least in healthy people, is pretty robust. We haven't tried to replicate the lack of an effect in clinical populations yet, but that's kind of in the works.
Cool. So anyway, this continues to run here, but I think it should be almost done; it only does six people. I'm surprised it's not done. One interoception question, and then one question from the chat. So what other interoceptive modalities might exist? You did a heartbeat estimation task; what other interoceptive modalities might be amenable to this kind of quantitative analysis?
So cardiac interoception tasks are definitely the most common, just because the methodology is actually quite difficult otherwise.
It's pretty difficult because, for instance, in vision or in audition you can very tightly control the timing and magnitude of the input signal, whereas it's hard to precisely control changes in the signals you're getting from the inside of your body. At least the heart is a signal that has a rate, and you know fairly precisely when it sent the signal upward, things like that. So cardiac tasks are definitely the most common, and there are lots of different cardiac perception tasks.
A lot of them have come under a lot of criticism recently, for various reasons. There have been some studies showing that, for instance, standard heartbeat counting tasks, where people are just asked, over a period of a couple of minutes, to count every time they feel their heartbeat, and you look at how close their counts are to the true number of heartbeats; there are a number of papers, including our paper actually, showing that this looks like it's primarily just tracking prior expectations. It doesn't tell you a lot about the way they're actually treating individual signals.
In ours, we actually showed this: if you use the standard heartbeat counting measure instead of the heartbeat tapping, which was our primary measure that we fed into the model, the prior expectations in the model predicted the heartbeat counting accuracy values at something like 0.9. So it was a nice confirmation that, yes, this is primarily about priors.
And it actually points toward having a generative model that can then be deployed, modified, and corroborated across settings, because then you could say, well, what would be the right task, or trial and set structure? How many participants would we need to resolve a parametric difference of such and such? So statistical power, down to a lot of other design features, would be informed by it. Okay, let me ask this question from the chat...
Well, hold on, I need to finish that previous question first. So aside from cardiac measures, there are a number of other methods. One that I currently use in my lab is breathing resistances: basically, you have people wear this little Darth Vader mask kind of thing, and you can change, in really precise, subtle ways, how much resistance there is when they try to breathe in. And you can get individual differences in sensitivity: some people feel that change in how hard their lungs need to work, basically, at different loads, different resistances, than others. You can also use it as a kind of anxiety induction, which is how we use it, to precisely induce uncomfortable interoceptive states at different intensities and see how that affects things.
What about muscular, like how heavy is this, or what angle is your arm at?
I mean, we don't consider that interoception.
That's more like proprioception, or some other kind of sensation. But certainly there are lots of tasks like that out there. The kind of thing that we mean by interoception is things like feeling what's going on in your stomach, in your heart or your lungs, or various effects of hormone levels; stuff going on inside.
Cool.
So I should say, and I can't talk too much about it, that we have actually developed a method for inducing sensations in your stomach with precise timing, and we're currently finishing up a paper using that method with EEG to test some predictions of the neural process theory for gastrointestinal interoception. So that's another thing that's kind of coming.
Looking forward to that. Let me ask the second question, if that's okay.
Yes.
What are the similarities and differences between the active inference model of emotional inference, as it was phrased, and Lisa Feldman Barrett's approach to emotion construction? And if you don't know,
I'm just reading it off, so it may be a question for another time. But for people who are familiar with my actual simulation work in this area: my group and I have published multiple active inference simulation papers specifically about emotion inference. And I should say it's not a straightforward question, because you could build active inference models that would do something more or less constructionist, or you could build an active inference model that does something less constructionist. So it's actually not as though active inference has something really unique to say about whether or not constructionist views of emotion are right.
But I should say that the most straightforward models that have to do with constructionist views, where what you're trying to do is learn and infer emotion concepts, work via a sort of multimodal inference process. When you treat lower-level things, like observing your arousal level, or observing how negative or positive you feel, as observations, and the likelihood mapping between those observations and higher-level emotion-concept states is probabilistic, then you end up having to learn probabilistic mappings that look a lot like constructionist types of inference stories.
But what's kind of less clear is that constructionist views don't say very much about how the states in your body are generated in the first place. You know, say I see a predator and I have this big change in my heart rate, muscle tension, things like that, and
then I feel that and infer, or construct, some kind of belief about what emotion that corresponds to in this context. In an active inference model, you also need a story for the mechanisms that generate those responses before you make sense of them. We've talked about doing things like that with active inference models, and it's not so clear whether something like that would be consistent or inconsistent with constructionist views. But anyway, hopefully that helps: active inference doesn't really take sides, I guess, is my point.
Interesting, I didn't know about that theory. So it sounds like something where we can use active inference to construct various kinds of models, all these orthogonal models, and, like we're always doing on the stream, look at the two-by-two: is it going to work on this axis and that axis, which debates are relevant for active inference and the free energy principle and which ones aren't. Do you want to carry on a little bit?
Yeah, and just to say, because this actually comes back very well to the topic of today's session: to really resolve these questions, what you need to do is construct an active inference model that looks very constructionist and an active inference model that doesn't look very constructionist, and see which one fits empirical data better, right?
So it's more a way of precisely testing different hypotheses, different models, as opposed to saying this model must be correct theoretically.
Yeah, I think that's kind of important when you're thinking about this kind of stuff. People often talk about active inference as a theory, but as we've said multiple times, there are multiple process theories that could fall out of it. I think it's a very general framework under which you can build multiple, hopefully competing, theories.
Yeah, and in addition, with the computational psychiatry stuff, you could think active inference is literally false as a theory, but still simultaneously think it's an awesome individual differences tool. That would be an odd set of beliefs to hold, but there's no contradiction there.
Yeah, no, for sure. This comes back to whether you think of this kind of model as getting at the ground truth versus just using it instrumentally. But yes.
Okay, so the last thing here is that, in addition to being able to do standard ANOVAs, regressions, frequentist approaches, or Bayes factor analyses, in place of or in addition to them, one thing that's really nice about getting parameter estimates the way we've been talking about is that you don't just get point estimates, you don't just get posterior means, which is what we've been talking about with analyses like that, right? Those are just means of posterior means. The parameter estimates in variational Bayes also have posterior variances, right?
So it's not just the mean, but also the confidence that the estimation had around that mean. That's actually additional information that's useful to incorporate when you're doing group-level analyses. And parametric empirical Bayes (PEB), which again was initially developed and is primarily used for dynamic causal modeling, happens to work really well with the setup we use for getting parameter estimates, since, as I said before, it already uses DCM scripts. So you can actually run these parametric empirical Bayes approaches, which are more or less general linear models but which also use the posterior variances, to get probably the most principled, fully Bayesian way of doing group-level analyses with these individual-level parameter estimates, incorporating both the means and the variances.
And so, is this thing done? Okay, this has got to be the last one. I do have saved versions of this if it takes too long; I was just hoping to show you the actual model comparison part.
So, as an example of using parametric empirical Bayes, in this paper on substance use we did do this. In this task, it's pretty simple: all people do is repeatedly choose option one, two, or three, and on each choice either a green ball falls or a red ball falls, where a green ball means a win. In advance they don't know which of the options has the highest payout probability, so again it's explore-exploit, right? They have to try out different ones until they become confident which one is best, and then they keep picking it. And in this case we were able to estimate a number of parameters.
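The intuition behind carrying posterior variances into the group level can be illustrated with a precision-weighted average. To be clear, the actual PEB scheme in SPM is a hierarchical Bayesian general linear model; this sketch only shows the weighting idea, with invented numbers:

```python
import numpy as np

def precision_weighted_group_mean(means, variances):
    """Simplified illustration of the idea behind PEB: subjects whose
    parameter estimates had tighter posteriors (smaller variance)
    contribute more to the group-level estimate. The full PEB scheme
    is a hierarchical Bayesian GLM; this is only the weighting
    intuition, not that implementation."""
    w = 1.0 / np.asarray(variances, float)     # posterior precisions
    m = np.asarray(means, float)
    group_mean = (w * m).sum() / w.sum()       # precision-weighted average
    group_var = 1.0 / w.sum()                  # variance of combined estimate
    return group_mean, group_var

# Hypothetical per-subject posterior means and variances for one parameter
means = [0.8, 1.2, 1.0, 2.5]
variances = [0.1, 0.1, 0.1, 2.0]               # last subject is poorly estimated
print(precision_weighted_group_mean(means, variances))
```

Notice that the poorly estimated outlier is down-weighted, so the group estimate sits closer to the well-estimated subjects than a plain average would; that is the extra information that point estimates alone throw away.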
We estimated action precision; we estimated that risk-seeking parameter I mentioned; we estimated separate learning rates for when they had a win versus when they had a loss; and we were also able to estimate an information-sensitivity parameter. It's probably too much to explain at the moment, but it's kind of like a belief rigidity: the higher it is, the less you should think you need to seek information, basically.
And what we were able to do with that is, over here, standard group differences, and you can see that action precision was a lot lower in substance users, and learning rates for losses were lower. But we also did the PEB version, the parametric empirical Bayes version, over here, where we could get group-level effects that include both the means and the variances. You can see that these effects were there, and you also get the posterior probabilities and things like that. Again, it's just a nicer thing to do: you'll be able to incorporate additional information and keep things fully within a Bayesian framework.
And so, if this thing is... still, are you done?
I have another question we can ask if you want to just let it run a couple more minutes. All right. So, just curious: Christopher said that he'll be using this in his PhD. So what is your PhD project? Or, I know you've only just started the PhD and everything, but what are you thinking about using it for?
That's the first question, and then there'll be a follow-up part. So, I work in Alex Woolgar's lab; she's a PI at the MRC Cognition and Brain Sciences Unit, and her lab studies cognitive control. So I'm planning on doing a lot of work on cognitive control. At the moment I don't really want to say too much about what projects I have. Yeah, mostly just: what questions are you curious about, or what do you think is exciting? So mostly I'm really curious about using active inference to help us think about and formalize some otherwise less formal hypotheses in the realm of cognitive control, and then using those to generate testable predictions. Although one thing I should say is that I don't think active inference is always the best tool to answer every question you might have; it depends on what level of analysis you're working at. I happen to be really interested in algorithmic-level questions, where active inference is really good. If you're interested in implementational-level stuff, this might not be the best tool for you. Great, and the second part of the question was: are there any FEP- or active-inference-oriented supervisors at Cambridge? But beyond a yes or no to that, more generally I'd like to hear both of your perspectives: say somebody were starting research as a graduate student in a lab that wasn't already in this line of research. What would you convey to a starting grad student or researcher who was looking for a PhD mentor, or who wanted to join a lab studying something cool and interesting but wanted to take an active inference perspective on it? Okay, just so people know, it is now done. Yeah, but go ahead if you want to answer that question quickly. I was just going to say I have a couple of things to say.
I think it just depends on your personality, really. Are you the type of person who doesn't mind doing a lot of things alone and without much help? In that case you can do something that you or other members of your group aren't doing; if that doesn't bother you, go ahead, and this tutorial is a great resource. But if you're working in a lab, I would genuinely just try to find a lab that studies what you're interested in; that's probably the better advice. I mean, one thing to say is that Ryan, and I haven't signed the paperwork yet, is going to be my associate supervisor, so I do have someone in my supervision group who can actually help me with this stuff. And I would always say: whatever you're doing, make sure you actually have people who can help you. Great point. And I wonder if, in the remote world, it will be easier. I remember it was kind of a big thing to have a committee member from another university; I didn't even have that. So now it might be a little more possible to connect remotely anyway. Ryan, I'd be curious to hear your thoughts on that, and then we can go back to the model. Sorry, I don't think I fully followed; just a starting researcher, somebody who asked to be in your lab but there's no room for them, or they're in a different lab already, but they want to do research in this area. Oh, I see. Yeah.
I mean, that's a very tricky question. When you're deciding where to go to grad school, I think it's really, really important that you find somebody where there's a good match between you and your supervisor. I've seen cases where people start in labs and there's just not a good match between the supervisor's style and the student's style, and a lot of times that's a recipe for either a really tough time getting through grad school, or even deciding that maybe you don't want to be in science anymore. I've seen that happen with people personally. Whereas you want to find somebody who matches your style, in terms of how much direct micromanagement and supervision you want versus how much independence you want: a supervisor who really allows you to be creative and come up with your own projects, versus someone who would prefer that you work on stuff that's more narrowly in line with what the lab's already doing. All sorts of supervisors will have types of grad students that work well with them, but finding the right match is really important, and part of that does also correspond to common interests. Now, if the question is "I absolutely cannot find somebody who does what I want to do," that is really tough. But in that case, as you were mentioning, there are remote or co-supervisor sorts of arrangements: as long as your primary supervisor, who's a decent match to you in general, is okay with you having some remote co-supervisor or mentor who does the sort of thing you want to be doing methodologically, and there's enough overlap in interest between primary mentor and co-mentor, then I think that can work well. But if at all possible, identifying a primary supervisor who does the methods and shares your interests is definitely the best option. Great advice. But okay, so this is now done, so you can actually see what the script spits out once it finishes. I should say this is also a new feature; I just added this to the main tutorial script, so if you go to the GitHub and download the newest version it will generate these scatter plots for you. It will show you, for instance, that when we only did 16 trials, recoverability for the learning rate in the three-parameter model wasn't awesome: the correlation between true and estimated parameters was only 0.58. Whereas for risk seeking the correlation was 0.95, so it's super recoverable, and for alpha it was, let me see.
It's not showing up here, but it was 0.42, so not great. Whereas for the two-parameter model, risk seeking was 0.95 and alpha was 0.94. I should say that, depending on the values, the r and p values might not show up on those graphs the way I have it set up, but it will print them out for you here: say, alpha recoverability 0.94, p equals 0.046, et cetera. It will just spit all of that out for you. It will also tell you the average log-likelihood under the two-parameter model and the average action probabilities for the two-parameter model, and the same for the three-parameter model. So the fits are actually very similar in terms of the log-likelihoods and the action probabilities. And when you calculate the protected exceedance probability (PXP), which it also does for you, it will show that in this case the second model, the model with learning, has a higher exceedance probability than the first model. But this is not necessarily a very clear winner; what you're hoping for is something like one and zero, or 0.8 and 0.2, or 0.9 and 0.1, something like that. And again, this is just an example.
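The recoverability logic being described, simulate agents with known parameter values, re-estimate them, and correlate true against estimated, can be sketched in Python. This is a toy softmax-choice model standing in for the tutorial's MATLAB/SPM setup; all names and numbers here are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_trials = 12, 200
ev = np.array([0.2, 0.5, 0.8])          # fixed option values (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# 1) Simulate subjects with known ("true") action precisions
true_alpha = np.linspace(0.5, 8.0, n_subjects)
data = []
for a in true_alpha:
    p = softmax(a * ev)
    data.append(rng.choice(3, size=n_trials, p=p))

# 2) Re-fit each simulated subject by maximum likelihood (grid search)
grid = np.linspace(0.1, 12.0, 240)
est_alpha = []
for choices in data:
    ll = [np.sum(np.log(softmax(a * ev)[choices])) for a in grid]
    est_alpha.append(grid[int(np.argmax(ll))])
est_alpha = np.array(est_alpha)

# 3) Recoverability = correlation between true and estimated parameters
r = np.corrcoef(true_alpha, est_alpha)[0, 1]
```

With few trials this correlation drops, which is exactly the 16-trial effect described above; with 100 or more trials per subject it is typically high.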
This is not going to be reliable at all because I only did 16 trials. If you did a real task, even if you just used the 32 trials that are actually in the real code on the GitHub, it would be better than this. But most actual behavioral tasks for computational modeling studies are going to have 100 or more trials, to really get precise estimates and know how well these models are actually doing. So that's the kind of thing you get here at the end. I should say this trajectory plot is not that meaningful: it just shows, for the first and second parameters, the trajectory of the change in the parameters as estimation converges toward the posterior values, and when you have more than two parameters it's not that informative. In the paper we have a figure that more or less depicts an example of what this looks like. As long as this convergence-over-iterations plot has a nice increasing slope to it, you're probably fine. But what if you're doing this and you find that across iterations it bounces up, then bounces way back down, then bounces up again, and it looks really inconsistent instead of showing that nice slope to convergence?
That's usually a telltale sign that you're running into some local minimum, or that something's going wrong. In that case you might want to tweak your priors and see if it works better, or it might tell you there's just something wrong with your model. So it's a good diagnostic. I'd just love to hear, because sometimes it's hard to grasp: how can it be converging downhill like that? We were talking earlier about the difference between sampling-based methods and the factorized methods, so there's obviously a world of difference in the setup and the computation. But how is it that just minimizing free energy takes us to acceptable parameter ranges, if you have any sense of that from working so closely with it? What is it about that accumulating-bars plot that leads us, seemingly inevitably, or at least reasonably, to an acceptable set of parameters in a vast space? What is actually happening there, such that it doesn't converge into a local minimum, or onto some local solution set? Oh, there's no guarantee that it doesn't do that. This could easily converge into a local minimum that's not the global minimum. All this is saying is that there's a kind of smooth, locally convex gradient descent toward some stable value. But no, that could totally be a local minimum; there's nothing that stops it from being one. It just means that there's a kind of clear slope.
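The "nice increasing slope" diagnostic just described can be made concrete: record the objective at every iteration and check that the trace rises roughly monotonically toward a plateau. A minimal sketch, using a toy concave objective in place of a real free-energy bound (everything here is hypothetical):

```python
import numpy as np

def elbo(theta):
    # Toy stand-in for a (negative) free energy: concave with its maximum at theta = 2
    return -(theta - 2.0) ** 2

def grad(theta):
    return -2.0 * (theta - 2.0)

# Simple gradient ascent, recording the objective at every iteration
theta, lr, trace = -3.0, 0.1, []
for _ in range(50):
    trace.append(elbo(theta))
    theta += lr * grad(theta)
trace = np.array(trace)

# Diagnostic: a healthy run increases (roughly) monotonically toward a plateau
diffs = np.diff(trace)
monotone = bool(np.all(diffs >= -1e-9))
```

A trace that repeatedly drops and recovers would set `monotone` to False, which is the bouncy pattern flagged above as a warning sign.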
It's not like a super bumpy, weird parameter landscape. But then do we need to inject some layer of sampling first, with many starting positions, in order to get good meta-convergence? Because it's not enough to look at one model trace and say, yep, this one converged; it converged somewhere and settled, given a very complex free energy function. Don't we need to pull back another level and get traces from many points in the landscape? Well, that's just another potential approach: testing out a bunch of different prior values and seeing whether everything converges from those different starting points to the same spot. Yes, there's a limitation, because the result could represent a local minimum. But if it stably converges on different, reliable parameter estimates for each person, which correspond to a range of places in the landscape, then those will be stable individual differences. You'll probably get slightly different values if you start your priors in a different place, almost inevitably, because the complexity cost will be less if you don't have to move as far to find accurate parameters. But so long as you're getting interesting variability, and convergence for each person has a nice slope like this, it's not as though everybody is getting stuck in some particular single well or attractor of some kind. And again, the question isn't necessarily what the true value is per se; it's whether you can find estimates that provide interesting individual differences given a set of priors.
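The multi-start idea under discussion, re-running estimation from several starting points and seeing whether all runs land in the same basin, can be sketched with a toy one-parameter objective that deliberately has two minima. This is an illustrative stand-in for a bumpy free-energy landscape, not the tutorial's estimation scheme; `scipy` is assumed available:

```python
import numpy as np
from scipy.optimize import minimize

# Toy objective with two minima: the global one near x = -1.04, a local one near x = +0.96.
def objective(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]

starts = [-2.0, -0.5, 0.5, 2.0]
solutions = [minimize(objective, x0=[s]).x[0] for s in starts]

# Distinct basins found across restarts (rounded to merge near-duplicates)
basins = sorted(set(round(x, 2) for x in solutions))
best = min(solutions, key=lambda x: objective([x]))
```

Here the restarts disagree (two basins), which is precisely the signal that a single run's convergence can be a local minimum; taking the lowest-objective solution across restarts recovers the global one.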
It's also important to realize, and we mention this in the paper, that typically the model that wins, the model with the best parameter estimates, will be simpler than the true underlying generative process in the person that generated the data, because there are typically simpler explanations than the ones that actually generate data in a complicated system like the brain. So you're not necessarily treating the winning model as though it's the true one. But if it gives you nice convergence for each person and you're getting nice individual differences, it's still a reasonable measure of interesting individual differences. And you can simultaneously do things like show that parameter estimates correlate with other things. So, I happen to have it up: this is an EEG study that I'm in the process of writing up, and we have parameter estimates here for interoceptive precision in that gastrointestinal task I mentioned. Precision here correlates with reaction times at negative 0.74, despite the fact that the model is not fit to reaction times. And it's a direct prediction of the neural process theory: higher interoceptive precision should mean faster evidence accumulation, which should mean faster convergence time, which should mean shorter reaction times. So you can separately validate that your parameter estimates are tracking what you want them to track by seeing what they correlate with, in ways they ought to correlate with for construct validity, using things that didn't get fed into the model. So there are lots of ways of doing this.
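That predicted chain, higher precision, faster evidence accumulation, shorter reaction times, can be illustrated with a toy accumulator in which drift rate scales with precision. This is purely a sketch of the qualitative prediction, not the neural process theory model itself, and all the numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_reaction_time(precision, n_trials=300, bound=10.0):
    """Step a noisy evidence accumulator to a bound; drift scales with precision."""
    rts = []
    for _ in range(n_trials):
        evidence, t = 0.0, 0
        while evidence < bound and t < 10_000:
            evidence += 0.2 * precision + rng.normal(0.0, 0.5)
            t += 1
        rts.append(t)
    return np.mean(rts)

precisions = np.linspace(0.5, 4.0, 8)    # hypothetical subject-level precisions
mean_rts = np.array([mean_reaction_time(p) for p in precisions])

# Higher precision -> faster accumulation -> shorter mean reaction times
r = np.corrcoef(precisions, mean_rts)[0, 1]
```

The correlation comes out strongly negative, mirroring the negative precision-RT correlation reported above even though reaction times were never fit by the model.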
But you're right: using other approaches with multiple seeds, like the Monte Carlo stuff you were talking about before, is another option, and with those you might be more likely to find a global minimum. But again, they're just different approaches. Just a quick plug: I think the model that's most closely related to active inference and can actually be fitted to data is probably the hierarchical Gaussian filter. As I understand it, at the moment its parameters are estimated using variational Bayes, but they're also developing, or may have already developed, a Markov chain Monte Carlo (MCMC) version. Very interesting. Yeah. So, okay, last thing here, just to wrap everything up so people at least know what's available for them to expand on in our example code. Down here, this right here is the function for doing model comparison: spm_BMS, Bayesian model selection. You just feed in the free energies for each person.
So for instance, it will just store these: these are the negative free energies for my six simulated people, for their best-fit parameters under the two-parameter model, and those are the same for the three-parameter model. Bayesian model comparison is just feeding both of those column vectors into that function, and it will spit out the exceedance probabilities, including the protected version, as well as a number of other diagnostic outputs. There are good papers describing this in detail; we reference and cite them in the paper. And if you go into the actual function itself, it also tells you exactly what the inputs are and what the outputs mean: alpha is a vector of the model probabilities, xp is the exceedance probabilities before the protected correction, and so forth. Once that's all done, and I should say I'm not going through every line of this, but we commented it as well as we could and we hope it's clear as you go through what we're doing, it also spits the outputs out cleanly into the main terminal here. So if you want, you can hopefully adapt this code for your own purposes, to do recoverability and model estimation and things like that.
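For readers outside MATLAB, the random-effects model comparison that spm_BMS performs can be sketched as follows, using the variational scheme of Stephan and colleagues plus Dirichlet sampling for exceedance probabilities. This is a simplified stand-in, not SPM's implementation, and it omits the "protected" correction that spm_BMS also reports; the free-energy values below are invented for illustration:

```python
import numpy as np
from scipy.special import digamma

def group_bms(F, n_samples=100_000, seed=0):
    """Random-effects Bayesian model selection from per-subject free energies.

    F: (n_subjects, n_models) array of (negative) variational free energies,
    i.e., log-evidence approximations. Returns expected model frequencies and
    exceedance probabilities.
    """
    n, k = F.shape
    alpha = np.ones(k)                     # Dirichlet prior over model frequencies
    for _ in range(100):                   # variational updates
        logu = F + digamma(alpha) - digamma(alpha.sum())
        g = np.exp(logu - logu.max(axis=1, keepdims=True))
        g /= g.sum(axis=1, keepdims=True)  # posterior model assignment per subject
        alpha = 1.0 + g.sum(axis=0)
    freq = alpha / alpha.sum()
    # Exceedance probability: chance each model is the most frequent in the group
    samples = np.random.default_rng(seed).dirichlet(alpha, size=n_samples)
    xp = np.bincount(samples.argmax(axis=1), minlength=k) / n_samples
    return freq, xp

# Six hypothetical subjects; model 1 has the higher free energy for most of them
F = np.array([[-12.0, -15.0], [-10.0, -14.0], [-11.0, -12.5],
              [-13.0, -12.0], [-9.0, -13.0], [-11.0, -15.0]])
freq, xp = group_bms(F)
```

As with spm_BMS, the inputs are just one column of free energies per model, one row per subject, and the exceedance probabilities say how likely each model is to be the most prevalent one in the population.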
Um um But once you've done that um, then You can do I was just going to show you briefly the um the hierarchical base the peb Stuff and for that we put things into these gcm Structures so gcm for model 2 gcm for model 3 And you can just pick Which one um here that you want to that you want to use Right, whether you want to do peb on the two parameter model outputs or the three parameter model outputs um and This just sets a bunch of defaults in peb. Um, you shouldn't have to mess with any of those um But if you but if you run it so I've so what I've done is in the glm and the general one of your model peb I've just set the mean value across subjects. I've put a Uh regressor for group. So this is saying compare groups Um And then I've put in this sort of fake age variable that just with random numbers basically Um, so if I run this section on the three parameter model Then I'll get you know, it'll do this kind of estimation thing And it will spit out a bunch of things um And um, basically the way to read this is um Uh ignore that so this one just ignore Uh when it spits it out this one will tell you Um, for instance, this is saying like for uh, so this one is probably the best one Um, so these are the reduced models. So the best model Removing the parameters parameter differences that Didn't matter didn't win in Bayesian model comparison. Um, so this is saying parameter two and parameter three Were different between groups Um, and were and remain different between groups in the best fit model Um, and again, we we explain this all in more detail in the paper. You can really ignore These ones at the bottom In the in the figure in the paper, we show you which ones are Which ones matter to look at? 
It's more or less those top three: the estimated differences, and the estimated differences that survive model comparison, so after pruning away the parameters that don't stick around in the best-fit model. In this one, could you just say what the pink bars are? Because to me it wasn't immediately obvious. They're just the variances, the variances of the posterior estimates. So, for instance, you would expect that this one, which has a really high variance, wouldn't survive as a winner in the reduced model. But even this isn't the most useful view; the most useful is the actual PEB review-parameters GUI. This is the actual GLM: this first column is the mean, this is the group difference, light gray versus dark gray, because the first three subjects are one group and the second three are the other, and this third column is the randomly generated age values. So you can ask, for instance, which of the means are different. Well, the means for parameters one and two are different, so there are main effects, essentially, of those two. You could go to the group column and say, okay, these are the two that are different. But you can put a threshold on it, right?
So for weak evidence, the posterior probability just has to be above some value; for positive evidence, a higher one. So if I require decent evidence, then only the second parameter here actually has good evidence for a group difference. And if I require very strong evidence, say a posterior probability greater than 0.99, then the whole thing gets removed. So this is nice, and you can do it in terms of free energy, or just in terms of the probability that a parameter is greater than zero; there are different ways to do it. But this is probably the nicest tool for navigating the results of PEB, which will matter if you're doing group analyses with it. So that is PEB. And I'm trying to think if there's anything else, really, because that's primarily where things end; at that point you should be able to do everything. I was going to mention briefly, I did skip over this: this was the thing I mentioned that Philipp did, where he was able to make within-subject, trial-by-trial model predictions, as opposed to just group differences, and show that the trial-by-trial beta updates correlated with this midbrain dopaminergic area. But yeah, that's more or less the set of steps you need to know. So what we talked about: you have participants perform the task, and you build one or more models.
We did two. You find the parameter values in each model that best reproduce the data, the behavior; we did that. We did model comparison. We used the correlations between generative (true) and estimated parameters to make sure the winning model was identifiable. And then you can do either normal group-level frequentist statistics, or something like PEB, to test for between-group or other between-subjects effects on the parameters at the end of the day. So I think that really covers most of the meat of it. Oh, okay, sorry, one other thing I do need to show you: to actually do the parameter estimation, the code calls, where is it, this estimate-parameters script, which is one we also included. So open that, just right-click it and say open. This is where you set your prior means and variances; it says here that we specify our prior expectations for parameter means and variances. In this case I just set the same prior variance, one over four, for all the parameters; you don't have to do that. You can think of it this way: the smaller the value you put here, the greater the complexity penalty you're adding, so the smaller it is, the more you're going to prevent overfitting. But if you make it too tight, the posterior estimates probably won't be that accurate. So, in this case, what you can see we've done: if you're estimating alpha, action precision, this is where you'd put the prior, and we put a prior of 16. We log it here.
That's just so it's in log space, which makes it so that during estimation the number can never become negative; it has to stay positive, and if estimation ever tried out a negative value for that parameter, the thing would break. Same thing for beta, which we didn't estimate here but include if you want it; that would be the policy (expected free energy) precision. Its prior is one, and again we're keeping the prior variance equal to one fourth up there for everything. This is the loss-aversion parameter, which we didn't include but is in there if you want it, and that's the risk-seeking parameter, which we gave a prior of five. And then eta, the learning rate: this can't be just any positive value, it has to be between zero and one, which means you have to put it into logit space as opposed to log space. That just means that to set the prior you make it whatever your prior is, and then also put that same number here. In this case we've given the learning rate a prior of 0.5, which is just halfway between zero and one, so middle of the road, which would make sense in a lot of cases. And then, more or less, this function down here turns those back into their normal values, so it exponentiates the log values and so forth; it just retransforms them. And then there's this part, for technical reasons:
this is where we set the values used for risk seeking. And then this is the actual log-likelihood: for each trial it adds up the log-likelihood. The MDP.P here is the probability of each action; that's just in the MDP structure, which we showed in other sessions, and there's a table in the tutorial, Table 3 I think, that says what each MDP field is. So it just takes those action probabilities, logs them, and adds them up across trials to get the total log probability, and the less negative that is at the end, the better the fit. It's important to know about this script because it's where you would change your prior means and prior variances. And then this DCM field thing is set up so that you just enter the names of whichever parameters you want to estimate for that model. So if you put in alpha and rs, it will estimate alpha and rs, but you could also add eta here if you wanted, and then it would also estimate eta. Anyway, I'm sure there are other little things in the code that could be explained in more detail, but hopefully this will be enough to get people going. Well, thanks so much, Ryan and Christopher; this was really an awesome series. It was our first model stream and it was a great learning experience for all of us. So I'm going to read some final comments from the chat. Someone wrote: thanks for the authors' continuous updating of the paper; it would be great if these slides, in their current version, were somehow made available at the end of the episode. So maybe, like you mentioned, a supplemental file or something for the paper. That was one thing, about the slides.
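The prior transforms and trial-summed log-likelihood just walked through can be sketched compactly. The tutorial's actual estimation script is MATLAB; this Python sketch is illustrative only, and the per-trial probabilities are invented:

```python
import numpy as np

# Priors specified on the native scale, stored in transformed (unbounded) space
prior_alpha = np.log(16.0)          # action precision: positive -> log space
prior_eta   = np.log(0.5 / 0.5)     # learning rate in (0, 1) -> logit space

def untransform(log_alpha, logit_eta):
    """Map estimation-space values back to their native ranges."""
    alpha = np.exp(log_alpha)                  # always > 0
    eta = 1.0 / (1.0 + np.exp(-logit_eta))     # always in (0, 1)
    return alpha, eta

# Summed log-likelihood across trials, as in the estimation objective:
# hypothetical per-trial probabilities of the actions actually chosen
p_chosen = np.array([0.55, 0.62, 0.48, 0.71, 0.66])
total_log_likelihood = np.sum(np.log(p_chosen))  # less negative = better fit
```

The point of the transforms is exactly what was said above: estimation can propose any real number without ever producing an invalid (negative, or out-of-range) parameter value.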
I'd like to have them available so someone could say, oh, that was slide 87 at hour three of this stream, and we could keep that version so people know where each figure came from, because some are probably copied over from other places. Is that something you could imagine? I could imagine putting together a PowerPoint that just concatenates all of the slides we've gone through over these sessions. The vast majority of them are the same as the figures in the tutorial itself, but the best I could probably do is put that together as a supplementary file with the PsyArXiv version. That's probably the only thing we could do. We could have a first page on the PDF, or something, that just links to these streams as of this version. So that'll be one thing, and then a related question: people wanted future tutorials and model streams. So, to both of you, it's an open invitation to come on whenever the time is right and talk about any kind of modeling, or all these other cool ideas we've been bringing up. I mean, I'm happy to continue this in a more on-the-fly way as people request things. But at this point I would, or we would, probably need requests: what is it that people want us to cover? Now we've gone through the tutorial start to finish, at least in broad strokes, so from here forward I'd need to know what else people would want us to cover, because we've gone through most of it. Cool. Well, this tutorial definitely took us to the brink of all of these papers.
Maybe it would be interesting to walk through a paper, like a tutorial on a paper, especially this interoceptive one: how would we adapt it for other uses? That would be one thought. Or we could go into the formalism side; we could invite a colleague or collaborator who comes from a different perspective, more on the analytical side, or from some other dimension that we're all also learning about. Yeah, I should say, there are certainly a lot of areas of the broader free energy principle literature that we're intentionally not covering here. Our focus is: this is what you need to know to build models and use them experimentally. For instance, there's a lot of other material in the physics formulation that also gets called part of active inference, or the free energy principle more broadly, which includes things like Karl Friston's way of talking about Markov blankets, or the work on non-equilibrium steady states and how that can be described in terms of minimizing variational free energy; a lot of the physics, thermodynamic free energy and things like that, also comes in there. That stuff is quite a bit more theoretical, and it's certainly interesting, but it's not something you need to know, or all that directly relevant, for actually building models and doing this stuff in practice. So there are other places to look for that kind of material. I guess what I'm pointing out is that ours is not completely comprehensive of what gets called active inference, or at least the free energy principle. Sure. So, any other final thoughts from either of you? No, it's been really fun. Thank you. Yay. Great.
Well, it was really fun. Yeah, and like I said, if people want walkthroughs of specific papers, either some of our simulation papers or empirical papers in more detail, so they can get a sense of how they would do this in practice, I'm happy to do that. But I guess that's something we can determine at a later date. Nice. So we will wait for somebody to stimulate us to do a walkthrough on a certain topic, or with a certain group of people; we'll figure it out then. So really, thanks again to you both. I'm going to finish the live stream. Peace out, everyone. Thanks for watching. activeinference.org. Thanks so much, Ryan Smith and Christopher Whyte.