Hello, everyone. Welcome to the Active Inference Lab. This is the first Active Inference Model Stream, Model Stream 1.0, and I'm really excited for today's conversation. I'm Daniel Friedman, and to introduce the other participants today: Ryan, go for it.

Hi, I'm Ryan. I'm from the Laureate Institute for Brain Research.

Hi, I'm Christopher. I'm a PhD student at the MRC Cognition and Brain Sciences Unit, which is based at the University of Cambridge.

Hi, I'm Max Murphy. I just completed my PhD at the University of Kansas in bioengineering, with a focus on neural engineering.

Awesome. Thank you, everyone, for participating, and thanks to Ryan and Christopher, two of the authors of the work we're going to be exploring. This is the first in a several-part series highlighting different perspectives on, and addressing questions about, the active inference tutorial paper of Smith et al., "A Step-by-Step Tutorial on Active Inference and its Application to Empirical Data." The idea is for those who work with empirical data to learn about active inference methods, and for those in the active inference community to learn about methods for applying active inference to empirical data. If you're listening, you're participating: if you have questions during the live stream, post them in the YouTube live chat and we'll try to address them during or after the presentation. If you have questions after the live stream, leave them in a comment and we'll try to address them and integrate your input into future sessions. To learn more and to participate, check out activeinference.org or the information in the video's description. So that's the metadata for this video.

The way this will work today: we'll start with some introductory questions (what in general is this work about, and what motivated the authors to write the paper the way they did), and then Ryan and Christopher will each share their screens and show us a few different things about the work they've done. We have a couple of questions prepared on our side, and we'll also be watching the live chat, so post questions whenever you like and we'll try to address them. So, first intro question to the authors: what is this work? What is exciting about it? What motivated you to work on it?

Okay, just to reintroduce myself a little more: I'm Ryan Smith, an investigator at the Laureate Institute for Brain Research in Tulsa, Oklahoma. The focus of our institute is primarily on neuroimaging and neuroscience approaches to understanding psychology and psychiatry, with a focus on treating psychiatric disorders. For a while now, simpler computational models (primarily reinforcement learning models, drift diffusion models, things like that) have been out there for researchers working with empirical data, and there are good resources for people to learn those methods and apply them to data in their own research. Active inference, however, is a much newer field, especially in its formulation in terms of partially observable Markov decision processes.
And there isn't really, to date, a clear, combined place to learn the practical methods for building these sorts of models and then applying them to task behavior in empirical studies. So the motivation for this tutorial was to give somebody, a new student or an early junior faculty member who wants to go in this direction, the resources to do that as easily and as thoroughly as possible. Even if you start out without any background in this stuff, the hope is that the first section of the paper gives you enough theory that you can then learn how to build these models, specifically for empirical tasks, and eventually how to fit those models to data so they can be used in your own research, across many different fields. So what's exciting or motivating is primarily that if you're a new researcher who is interested in this but doesn't know how to do it, the hope is that this is a one-stop shop: if you can get from beginning to end, you'll know how to do what you need to do to start using it in your own research. That's the major driving motivation.

Awesome. I love that: a one-stop shop for a researcher who wants to apply the methods but doesn't really know where to start. And you have versioned the document many times to address people's input, so it's cool that it's an evolving document as well. So, Christopher, what would you say in response to those intro questions?

I think there are two parts to my motivation for writing this. The first was the selfish motivation: I think if you want to understand something, the best way to do it is to write a tutorial on it. So when Ryan asked me to do this tutorial with him, I jumped at the opportunity, because it was a chance to learn this stuff better myself and really get into some of the more technical details. More broadly, though: in principle, applying these models in practice is no more complicated than doing model-based reinforcement learning, and yet it's far more common to see researchers working with various more sophisticated RL schemes. If you're a graduate student starting out in psychology or neuroscience, interested in, say, decision-making under uncertainty: we have a finite time on this Earth, and a finite amount of time to dedicate to learning things. So it would be perfectly rational to look at active inference research and think, wow, this is super technical, I wish I could learn it; but here's this other thing where I can go on Neuromatch, I can read Sutton and Barto, really get up to scratch, and use it in my research fairly easily. There just wasn't something really accessible to help people actually apply active inference in their own research. So that was my direction.
Yeah, just as a little background: when I wanted to learn this stuff originally, at the end of my postdoc before I took my current faculty position, the only way I could do it was to go hang out at the FIL, the Functional Imaging Laboratory in London, with Karl Friston for about four months, sit there, and ask people a ton of questions. That was the only way. Without physically going to one location in the world, very far from my institution, there really was no way for me to learn it. So this is partly out of empathy for my past self: this is the sort of thing I wish I'd had available to me, so I could learn this stuff independently.

Cool. So, Max, what's your background, and what got you excited to work through the paper and develop a lot of the examples on your own computer? Where are you coming from today?

Sure. Maybe I'm your target demographic in some sense. I don't know that I could really be considered a junior investigator, although I hope I could someday be that. And I'm much more on the motor-systems side of neuroscience, as opposed to decision-making, so the whole lexicon is a little different. But I am seeing so many commonalities. Coming from a signal-processing background, I think of things in terms of the Kalman formulation and trying to integrate sensory and motor information, and I see a lot of similarities there. So for somebody like myself, looking at some of Friston's work and how he's described the anatomy of inference, and whether this could also apply in motor systems: I'm very interested to see how this might apply in my own future research and work.

Cool. So let's say, Ryan first and then Christopher, that you were speaking to that early investigator or researcher of any age who says: okay, I get it, reinforcement learning is about reward, or reward-centric learning is about reward. What is active inference? How does it take a different perspective on what organisms are doing than that classical account of reward-based learning, or whatever you'd like to contrast it with? What is the bridge from the assumptions and implications that lead somebody to work in that classical framing, versus the difference in thinking that is active inference, and how is that manifested in the model?

I think this is a really important thing to bring up right from the start: there's a big distinction between the free energy principle broadly, the philosophy of the free energy principle, and what gets called active inference nowadays. There's a fairly wide chasm between them. Active inference is a corollary of the free energy principle, but active inference as formulated in terms of partially observable Markov decision processes is quite a bit narrower. It's a very specific discrete-state-space generative model that has particular sorts of elements to it, and it doesn't actually require knowing a lot of the things people talk about with respect to the free energy principle more broadly. It appeals to free energy and expected free energy as functions, more or less as cost functions, for figuring out what the best choice to make is.
But at the end of the day, when you apply these sorts of models to behavioral tasks used in studies, there's actually quite a bit of similarity. You can really see active inference models, or at least the MDP formulation, as just a particular flavor, like Chris said, of model-based reinforcement learning. There are some technical differences: the way Karl and colleagues have set it up allows for a fully unified Bayesian way of doing reinforcement learning and decision-making. Instead of calling something a reward per se, you define a probability distribution over observations that gets called a preference distribution, and reward then becomes a probabilistic preference to observe some things over others. If you call a particular observation a reward, that just amounts to having a precise preference distribution with a high value over that rewarding outcome, and the agent is simply driven to make the decisions it thinks are most likely to lead it to observe the things it prefers. So the preference distribution is more or less a way of specifying what is rewarding to the agent, what the agent is seeking.

One thing that is nice about active inference, and a little different from reinforcement learning per se (there are a few things, actually), is that making decisions by minimizing expected free energy, which Chris will cover, doesn't just try to maximize reward; it simultaneously tries to maximize information gain. Say an agent starts out in a dark room, to take a very contentious philosophy-of-the-FEP sort of example. That means it doesn't know where anything is: it has a lot of uncertainty over what state it's in. And say that in another room there's a fridge with some food in it. Then there will be two different drives. There's the epistemic drive to do whatever the agent thinks will give it the most information gain, which will be to go turn on the light. Then it will know what state it's in: whether it's in room A or room B, whether there's a couch in the room or a TV in the room, and so on. That's not reward per se; that's just choosing the action that maximizes the amount of information you get. The other chunk of it is to maximize observing preferred outcomes, so the agent will also be driven to leave the room, go to the fridge, and observe itself eating food. In practice, what will happen, and what typically won't happen in a standard reinforcement learning setting, is that the agent won't just try to maximize reward directly. A lot of the time it will first choose the action that helps it figure out where it is, so that it's more confident about what to do to get the reward later. So again, in practice, these models are just a nice, fully Bayesian way to integrate perception, learning, and decision-making, where decision-making is driven to both maximize information gain and maximize reward.

Christopher, thanks a lot. Anything else to add there?

No, not particularly.
The only thing I would say, and I'm going to get to this in a little bit, is that if you're interested in the technical details of what Ryan just said, there are two really good papers; maybe we can make these slides available afterwards as well. One is by Noor Sajid (I'm sorry if I butchered your name there), and the other is by Lancelot Da Costa. They are two excellent papers comparing active inference to reinforcement learning formulations.

Excellent. Just to draw out a couple of points, Ryan, from what you said. One is that the active inference framework, and specifically the POMDP, the partially observable Markov decision process instantiation of active inference, is a corollary, a kind of derivative, of the free energy principle. There's a lot of philosophy and contentiousness around different aspects of that; we're going to read a paper in Active Inference Stream 14 by Mel Andrews, "The math is not the territory," about the philosophy and the status of the framework. So yes, if you want to go that way, there's a whole rabbit hole there. But this is a tool, like linear regression: you don't get caught up in the weeds of number theory or information geometry; it's about where the calculation is useful and tractable. That's where we're coming from, at least today and in this series.

Another interesting contrast is with reinforcement learning, which is roughly: reinforce what works and don't reinforce what doesn't, neurons that fire together wire together, positive reinforcement schemes. There is a reward preference built into that kind of model, but when there's a basin of low reward, it's often difficult for those models to latch on. We saw that in the paper by Alec Tschantz et al. in Active Inference Stream 8, "Scaling Active Inference," which showed that even up against very large-scale machine learning models, Q-learning and other state-of-the-art deep learning approaches, the active inference agents on the mountain car and related control tasks worked really well, because they first did what you just described, Ryan: they went into an explore mode before going into a more fine-tuning mode. In doing so, they transcended the simple explore/exploit tradeoff. It's not just a knob in this model, not a coefficient to balance explore versus exploit; I hope we'll draw out an understanding that the two are related in a different way, because there's something model-based and generative happening. So, really a ton of interesting stuff.

At this point, let's go into the presentations. If people have any questions as they arise, put them in the YouTube chat and I'll copy those out to address later. Other than that, let's work through the presentations, however Ryan and Christopher have them set up and prefer.

Cool. Go ahead and share your screen.

I'm using a dual-monitor setup, so we'll see how this goes. Okay, sorry, I now have to open System Preferences to allow Microsoft Teams to use the screen. Sorry about that.

No worries. It's interesting stuff, and we're all learning by doing here.
So I just really appreciate, and have seen a lot of appreciation for, this kind of work, because in some ways it's the missing piece between all the hypothetical, abstract, or scaffold models (what people have heard about with respect to what active inference is) and how it's deployed, how it's enacted. To take the enactivist insight seriously, as practitioners of science and as communicators, means making this kind of work and showing how you do it, as we all try to figure out every level, from literally the tech to the best ways to make it rigorous and accessible, the best ways to talk about it. What do you think about that, Ryan?

Yeah, I totally agree. Hold on, over to Chris.

Yeah, actually, I've never used this.

There should be a little icon in the upper-right corner, like an arrow pointing into a square: Share Content. Are you both on PCs? Yeah, okay. So it should be in the upper-right corner, right next to the Leave button.

Or, Ryan, you could present first?

It wouldn't make sense for me to present first; what I'm going to show won't be as comprehensible before Chris's part.

Got it. I actually have a question that will take a couple of minutes while you're figuring that out, Christopher. What tools would somebody need to follow along? What programming language or interface do they need? What background knowledge? Is there software to download? Is there a course? What are the prerequisites for what we're about to jump into?

Yeah, so a lot of this stuff was originally designed by Karl Friston, and if you know much about Karl, you know that before he proposed a lot of this active inference stuff, his biggest contribution was putting together SPM within MATLAB. SPM is statistical parametric mapping software, more or less the original way of doing fMRI analysis, of analyzing functional neuroimaging data, along with dynamic causal modeling, a particular approach to modeling neuroimaging data. Basically all the standard resources, all the standard coding routines for building active inference models right now, live in SPM, which runs within MATLAB. And all the supplementary tutorial code we've provided (I think six or seven supplementary scripts where you can run the simulations and build these models yourself) is in MATLAB. So if you have MATLAB, you need to have downloaded SPM, and with MATLAB and SPM you'll have everything you need to open our supplementary code and follow along. A lot of the tutorial is set up assuming you have the paper open side by side with the code, so you can click through the different options to simulate and reproduce all the figures in the tutorial paper. So the short answer is: you need MATLAB, you need SPM, and you need some minimal ability to work with MATLAB.

Just to piggyback onto that, for the benefit of anybody watching this in the future, whenever you're watching this: once you get the SPM folder, you'll unzip it and put it somewhere on your computer. Then, from your MATLAB workspace, when you're in the supplementary code provided by Ryan and Christopher, you'll want to add the path of that folder wherever you put it, and add the path of that folder slash toolbox slash DEM. Then you'll have access to the additional SPM12 scripts you need in order to make use of the functions they've provided; see the sketch below.
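To make those steps concrete, here is a minimal MATLAB sketch of that setup. The folder locations are placeholders for illustration, so adjust them to wherever you unzipped SPM12 and the paper's supplementary code.

```matlab
% Minimal setup sketch. The paths below are hypothetical placeholders;
% point them at your own SPM12 and supplementary-code folders.
addpath('C:/toolboxes/spm12');               % SPM12 root folder
addpath('C:/toolboxes/spm12/toolbox/DEM');   % DEM toolbox (contains spm_MDP_VB_X.m)
cd('C:/tutorial_code');                      % supplementary scripts from the paper
which spm_MDP_VB_X                           % sanity check: should print a file path
```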
Yeah, no, thank you very much. For someone who's never used MATLAB, there are a few steps like that which are super important. Basically all these scripts live within the DEM toolbox of SPM, and all the scripts we've provided call subfunctions within SPM, so unfortunately none of them will work without SPM at the moment. Although there are certainly efforts to make some of these routines more generally accessible as free software: Alec Tschantz, for example, is in the process, I think, of writing a Python version of the main active inference MDP script, which is called spm_MDP_VB_X. That name just corresponds to: Markov decision process, variational Bayes, and X, which stands for factorized. I'll go over what factorization is and how you actually code it once we get to my section.

Okay, a follow-up there. So I have my desktop computer, I have MATLAB, and it's all working, referencing the most updated version of SPM. I've run through the examples we may be working through today, the ones in the paper: I hit Enter and got things to work, and all the scripts are running fine. What information from my system do I need to know? What measurements are going to be relevant for me to bring to the table? If I'm going to do a t-test for whether two ant colonies are different sizes, I know what kind of machinery I'll need: two columns in an Excel spreadsheet or something. But what kind of data, or what attributes of the system, are important for an investigator to bring to the table to utilize the examples you're providing?

Yeah, so there are a couple of things. The first goal is: you take a particular behavioral task and figure out the right generative model for that task. This is the sort of thing I'll give an example of; I can actually show you multiple examples of this. Once you have that generative model set up, you can run simulations, and those simulations will generate observations and expected actions. What you need, then, is data from an actual participant performing that task, where on each trial you know what the person observed (what the task stimuli were) and what action they actually chose. And then, ultimately, you do what's called parameter estimation; again, this will be a lot more comprehensible once I actually show you.
But basically, what you try to do is find the set of parameter values in the model that generates behavior most similar to what the participant actually did. Once you find those parameters, say one has a value of two and the other a value of four, and that generates behavior identical to whatever a given participant did, then you can use those values of two and four as individual-difference estimates. If you have those values for a bunch of different people who behaved in different ways on the task, you could ask: is there something different about the people with a parameter value of four versus a parameter value of eight? Can I use that to predict, for example, how well somebody is going to respond to a particular treatment, say, in computational psychiatry? So that's the idea. I can't share my screen in the meantime, so here, I can perhaps go through some of this. I'll just share my screen, then.

Okay, I think I'm good, at least. Can you guys see my screen?

We see the full slide, yeah.

Okay, great. Sorry about that; there were a whole bunch of privacy settings I had to deal with. Anyway, here we are. So this is part one; after I've gone through this, Ryan's going to take over and go through some of the practical aspects. Just to reiterate the scope and purpose: our target audience is researchers in neuroscience and psychology who don't have a strong quantitative background in maths or machine learning, in particular early-career researchers, and we really want to provide people with the requisite background to actually apply this in the context of their own research.

And then, just to quickly highlight some really fantastic other resources. The first paper is by Lancelot Da Costa, an incredible technical review that just came out in the Journal of Mathematical Psychology. Noor Sajid has another incredible paper that just came out in Neural Computation. And I think the comparison to dynamic programming, Bellman formulations, is still in preprint, if that's right, Ryan. There are also some phenomenal informal tutorials by Oleg Solopchuk (sorry if I butchered your name) on Medium. And lastly, the closest thing to what we're doing today: the lectures by Philipp Schwartenbeck at the computational psychiatry summer school. Those really are fantastic; the only difference is that, to my knowledge, they work with the unfactorized MDP scheme, which is a little less flexible than the one we're working with.
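Before moving on, here is a minimal sketch of the parameter-estimation idea Ryan described a little earlier. Everything in it is hypothetical: simulate_choice_probs is a stand-in for simulating the generative model at a candidate parameter value, and the tutorial itself uses SPM's variational estimation routines rather than a grid search.

```matlab
% Hypothetical grid-search illustration of fitting a single parameter.
% simulate_choice_probs is a stand-in: it would run the generative model
% with the candidate parameter and return a trials-by-options matrix of
% predicted choice probabilities.
choices = [1 2 1 1];                      % participant's observed choice per trial
alphas  = [2 4 8];                        % candidate parameter values
LL = zeros(size(alphas));
for i = 1:numel(alphas)
    p = simulate_choice_probs(alphas(i));
    for t = 1:numel(choices)
        LL(i) = LL(i) + log(p(t, choices(t)));   % log-likelihood of each choice
    end
end
[~, best] = max(LL);                      % best-fitting value = individual estimate
fprintf('best-fitting alpha = %d\n', alphas(best));
```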
Okay. There are a lot of ways of motivating active inference; I think the one that's most intuitive to lots of people with backgrounds in the cognitive sciences is to start from the perspective of the Bayesian brain. And interrupt me if anything's moving too quickly or too slowly.

Broadly speaking, the idea is that the brain encodes a generative model of the environment, where this generative model is just a joint probability distribution. Out in the outside world, as it were, there are states, and states give rise to observations. We use the generative model, in combination with Bayesian updating, to infer the hidden causes of our sensations from the observations, and then we take actions, based upon our internal model of the world, that couple us back to the generative process. So there's a perception-action loop that's always going on.

So, Chris, I think you might want to tell people what the P(o, s, pi) is.

Right, okay. That is a joint probability distribution over observations, states, and policies, where policies are just actions or sequences of actions: all of the actions afforded to the agent, essentially.

Okay. To illustrate this, we'll give a really, really basic example. Imagine being presented with a shadowy shape like this, and what you want to do is infer the causes of that shape based upon observations. But you don't just have observations; you have some prior knowledge about the world, and we can specify that here. In this very limited example there are two possible causes: the shadow is caused by a convex surface or by a concave surface, and they have fairly similar prior probabilities. However, we also have a structural prior, acquired through a lifetime of experience, that light emanates from above. And under that structural prior, the likelihood of observing that particular shadow conditional on a concave surface is much, much higher.

We can then combine the two, the prior and the likelihood, to give us a joint distribution, here conditional on a specific observation. We can then sum over the states in that likelihood to give us our marginal likelihood, and divide our joint distribution, our generative model, by the marginal likelihood; this is often colloquially called model inversion. From there we get our posterior distribution: the probability of states conditional on observations. So we started with a prior and a likelihood and got to a posterior through Bayes' rule, and that is, formally speaking, the optimal way to infer the probability of hidden causes given observations.

But the complication to all this is that the marginal likelihood is generally speaking computationally intractable. In the case of these very simple discrete distributions, the number of sums you have to perform scales exponentially with the size of your hypothesis space, which is extremely impractical. And more realistically, when you're working in continuous state spaces with non-Gaussian or nonlinear signals, the distributions are analytically intractable.

I think Max had a question.
Yeah, real quick, just to really explicitly touch on the idea of hidden states and hidden variables in this particular context: would it be fair to say that it's not only the convexity or concavity of the potential thing I'm looking at, but also that I could think of the light source, the direction the light comes from, as another hidden variable? And those things together: would it be fair to term those as under my Markov blanket, or is that getting a little too far from where we want to be right now?

Yeah, so the Markov blanket concept doesn't really come in here in any really interesting way. You can formulate it that way, but really the idea is just that you have this prior expectation. When you look at the little gray disc right there under "observation," it's actually just a flat, two-dimensional shape with a little bit of darkness in one place and a little bit of lightness in another, yet most people see it as being concave, as kind of popping in as opposed to out. And there's a reason people typically see it that way: specifically, the prior belief that light sources tend to come from above. So the hidden states here come in combinations in the likelihood. In the upper right, which is the likelihood, there's a "concave, light from above" entry: it says that if the hidden states concave and light-from-above are the combination generating what I'm seeing with that gray disc, the probability of the shadow pattern you see is 0.9. Whereas if it was convex and light from above, the probability of that same shadow pattern is only 0.1. In other words, it would be really hard to get that pattern of shading from something popping out, under the assumption that light comes from above. So the observation here is the shadow, and the hidden states are concave or convex, and light from above or light from below.

Okay, thanks.

Yeah, just one other quick example there: the chessboard with a shadow and a dark square. It's perceived under a deep cultural prior that chessboards are regular alternating grids, which makes an ambiguously shaded square appear as something quite different from what it is; people can look that up. And another thing this reminds me of: using a t-test to test whether group A or B is taller assumes (and it can be an assumption that gets slightly bent) that other than the variable being considered, other things are fixed or don't matter. Otherwise your statistical test is basically misleading, because it could be capturing some totally confounding variable outside the framework you put it in. Whenever you're talking about a real biological organism, you're conditioning on certain things, and given what is conditioned out of the picture (I see where you're going with the blanket, even if it's not specifically at play here, but a good question), we're looking at the conditional relationships between different observed states and the hidden things. So, cool stuff.
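To make the shading example concrete, here is a small MATLAB sketch of exact Bayes' rule with the 0.9 and 0.1 likelihoods just mentioned; the flat 0.5/0.5 prior is an assumption for illustration.

```matlab
% Exact Bayes' rule for the shading example (light-from-above assumed).
prior      = [0.5 0.5];            % p(concave), p(convex): assumed flat
likelihood = [0.9 0.1];            % p(shadow | concave), p(shadow | convex)
joint      = likelihood .* prior;  % generative model at this observation
evidence   = sum(joint);           % marginal likelihood p(shadow)
posterior  = joint / evidence      % "model inversion": p(state | shadow)
% posterior = [0.9 0.1], so the concave interpretation dominates
```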
And again, people can post any questions in the live chat. Continue, Christopher.

Yeah, thank you. And thanks for covering that, Ryan. Okay, so the idea is that the marginal likelihood is generally computationally intractable, but, borrowing some ideas from statistical physics, what you can do is use approximation techniques: instead of evaluating the marginal likelihood directly, we evaluate something that is provably always greater than or equal to the negative log marginal likelihood. I'm going through that right now.

We're going to take logs for mathematical convenience. The reason we generally work with logs is that log algebra is just a lot easier: it turns multiplications into additions, so things are simpler to work with. So we take the negative log of our marginal likelihood, which is also sometimes called the Bayesian model evidence; and the negative log of a probability is called surprise in information theory, which is the terminology I'll use for the rest of this presentation. On the right-hand side of the equality, by the sum rule of probability, we can get back our surprise by summing over all of the states under our joint distribution over states and outcomes.

Then we do a little trick. We multiply this joint distribution, our generative model, by some arbitrary distribution: we multiply and divide by the same distribution, so nothing has changed and the equality still holds. I could cancel those terms if I wanted to, but I don't want to, because I want to take advantage of something called Jensen's inequality: the result that the expectation of a logarithm is always less than or equal to the logarithm of the expectation. The idea is that when the logarithm is inside the summation, when we're summing over this difference between the generative model and the approximate, or target, distribution, then, since we're in negative territory, the result is always greater than or equal to the quantity on the left-hand side, which equals surprise: there, because the log is outside the sum, the approximate distributions cancel and we end up back at surprise. On the right-hand side we can't do that cancellation, and the right-hand side only equals surprise when our approximate distribution perfectly matches the true posterior under the generative model. That quantity on the right-hand side is variational free energy. So the idea is that we take this approximate probability distribution, about which we make simplifying assumptions so that we can evaluate it analytically, and we find the value of it that best minimizes F, where F is variational free energy. Does that make sense?

Hey, Chris, one thing: are you trying to move your mouse around to point at things?

I'm scribbling my mouse a little on the slides; on the live stream they should be able to see that.

Chris, thank you for that really awesome example, because it really clarified a few things. Any other notes to add here?
Otherwise, this is cool to continue.

Yeah. Okay. So, to give a really toy example of this: we define our approximate posterior, arbitrarily, as a flat distribution. We'll have a true posterior; generally speaking we don't know what this is, but for illustration's sake we're giving it to you. And then we have a joint distribution and an observation, where the observation just selects a column from that joint distribution. We enter this into the equation we saw on the last slide, and slowly but surely, with each update, we nudge our posterior distribution so that it minimizes F on each step. And at the third step, what we should see is that when F is at a minimum, it is equal to surprise; and it's at a minimum because the true posterior and our approximate posterior match. So the idea is that by performing this kind of inference, forcing some arbitrary distribution to approximate our posterior, we come to an upper bound on surprise.

Let me walk through it and just double-check. So q(s) is our estimate of, say, whether the coin is a fair coin: 0.5/0.5 means we think it's 50/50, and 0.8/0.2 is the reality on the table, so to speak. It's almost like, starting from 0.5/0.5, we see heads come up, and that contributes a little evidence that heads is more likely, so we go from 0.5/0.5 to 0.6/0.4; then something happens again and we click on the second update to 0.7/0.3; we click eventually to 0.8/0.2; and then if we click all the way to 0.9/0.1, a 90/10, all of a sudden F goes back up from 0.693. So if we were to plot the estimated F, we'd find we were going downhill, getting better, reaching 0.693 in this discrete 0.1-step movement space, and then we pop up a little too far, we overshoot, and we get a little more surprised by the distributions we're drawing.

Yeah, exactly, that's perfect. The only thing I would say is a little different, just to note, is that there's nothing in this example where you observe something over and over again. This is just a single observation, once, and you're trying to figure out what the true posterior is, to get as close to it as possible. All you're doing is saying: here's the single observation, and I'm going to try out a bunch of different values for q(s), a bunch of different values for that approximate distribution, through this iterative updating where I slowly move q(s) to different values. You do that for a single observation, updating until you find the minimum free energy value, which just tells you that your approximate posterior, the q(s) thing, is really close to the true posterior that you couldn't figure out on your own, because the problem was too tough to solve with exact Bayesian inference. This corresponds to what people talk about as prediction error minimization: you see this observation once, and the brain tries to minimize prediction error by minimizing F, by moving beliefs until it finds the belief that minimizes F, which is the same thing as minimizing prediction error. If that makes sense.

Love that.
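Here is a toy MATLAB sketch of the stepping Daniel just walked through, with a hypothetical two-state joint distribution chosen so the numbers match the discussion: the true posterior is 0.8/0.2 and the surprise is 0.693.

```matlab
% Toy free-energy minimization for a single observation.
p_os = [0.4 0.1];                        % joint p(o, s) at the observed o (hypothetical)
surprise = -log(sum(p_os));              % -ln p(o) = 0.693
for q1 = 0.5:0.1:0.9                     % nudge q(s) in discrete 0.1 steps
    q = [q1, 1 - q1];
    F = sum(q .* (log(q) - log(p_os)));  % F = E_q[ ln q(s) - ln p(o,s) ]
    fprintf('q = [%.1f %.1f], F = %.3f\n', q, F);
end
% F bottoms out at 0.693 (= surprise) at q = [0.8 0.2], the true posterior
% p(s|o) = p_os / p(o); pushing on to [0.9 0.1] makes F rise again.
```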
Let me be clear, because I think that really helped me understand it. At first I thought, wait, with one observation? Because at first, when I described it, I was thinking each update was tied to a new observation from the coin. But then I thought: well, if you're only getting one observation, the maximum-likelihood model is a coin that only comes up heads, because you only have one observation. But this isn't naive one-shot parameter learning; it's a tethered estimate, tethered, even loosely but with non-zero tethering, to a prior of 0.5/0.5. Some people will say that's uninformative, but all priors are informative; they are what they are. And it's almost like, because of the logs, even though you only observed the coin come up once, it's a little too far to update all the way to 0.9/0.1 just from seeing one coin flip; you'd need extraordinary evidence for extraordinary updates. So that one heads observation updates you from 0.5/0.5 to near 0.8/0.2, which happens to be close to the actual probability of the coin. It's showing how Bayesian updating brings some of that wisdom of multi-observation learning, the slow, sequential updating of parameters, into a bit of a different context.

I do want to clarify one thing here, though. There are no actions here, so there can't really be explore/exploit: you can't choose to look over here versus over there to gain information. Explore/exploit is specifically in the realm of choosing actions that will minimize uncertainty versus maximize reward. This is really just normal, variational free energy; we're not at expected free energy yet, which is the decision-making part. The simplest way to think about free energy is just in terms of complexity minus accuracy, which is equivalently complexity plus prediction error. All that ultimately means is: the complexity term is basically how much you have to change your beliefs, so minimizing F is asking, what's the minimum change in my beliefs that will make my new beliefs as accurate as possible? I have to move my beliefs as little as possible while also minimizing prediction error, if that makes sense. So what this generative model is saying is just that, given this observation, the probability that we're in state one is 0.8, and the probability of being in state two is only 0.2. Probably a better example than the heads-and-tails thing would be the concave/convex example we gave before: there is some possibility that light from below could cause the shading pattern, it's just much less likely. So you find the belief, that it's concave and the light is from above, that most likely generated what you're seeing, and it might take a bunch of updates like that, given just what you see once, to arrive at the best-fitting belief.
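And here is a sketch unpacking the complexity-minus-accuracy reading Ryan just gave, reusing the toy numbers from above; the prior and likelihood are hypothetical but factor into the same joint distribution.

```matlab
% Complexity minus accuracy: the same F, decomposed.
prior = [0.5 0.5];                       % p(s), assumed flat as before
lik   = [0.8 0.2];                       % p(o | s) at the observed o, so joint = [0.4 0.1]
q     = [0.8 0.2];                       % candidate posterior belief
complexity = sum(q .* log(q ./ prior));  % KL[q || prior]: how far beliefs moved
accuracy   = sum(q .* log(lik));         % expected log-likelihood of the observation
F = complexity - accuracy                % = 0.693, matching the minimum above
```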
There's a related question in the live chat that I'm going to ask because it's on topic. They asked: is this approximate Bayesian inference thing called something else in stats, outside of active inference, or is it unique to active inference? Because this isn't sequential Bayesian updating, as you mentioned; this isn't a standard Bayesian filter. What is this called outside the active inference field?

Yeah, it's just variational Bayes.

And where does the partially observable Markov decision process come into play?

We'll cover that in a moment, actually. Yeah, it doesn't come in yet.

I think one thing to say is that in writing this tutorial, it's hard to please everyone. Going through some of the feedback we got, some people were just utterly confused and wanted clearer explanations, and at the other end of the spectrum there were people wanting much more technical detail, like how this relates to things like gradient descent (this is just a very simplified cartoon example of a gradient descent scheme, where you're doing a gradient descent on free energy). So, just to flag all of these issues: first, if something's unclear in this presentation, read the paper, because we cover things in a lot more detail there. And if that isn't technical enough for you, go and look at our code: we supply a standalone script that is extremely well commented, and from there you should be able to figure out everything that's going on. All of the material in this presentation is necessarily simplified, because you don't really need those technical details to start using the framework or to start getting intuition for how things work; but once you do have an intuition and want more, go and see the code, I would say.

Awesome. It's like being able to use the ANOVA package in R without going into the source code: it's helpful, it's a tool for scientists, and if you're curious about the underpinnings of statistics and perennial philosophical debates, there's a literature and a search bar. But today is about the applications of these methods, which is awesome. So thanks, everyone, for the questions; keep them coming. This is great discussion, and I really appreciate it.

Okay, and so just to recap (I haven't covered action yet; I'm going to cover action a little bit later): the idea is that under active inference, we model organisms as if their phenotype, their body and brain, embodies a generative model of the environment, and organisms invert that generative model to arrive at an approximate posterior distribution over the hidden causes of sensory input, which they do by minimizing variational free energy. That much should hopefully be clear by this point.

And so then, on to the generative model. It is a very specific type of generative model, namely a partially observable Markov decision process. There's a graphical representation of this as a Bayesian network on the right-hand side. I'm not going to give too much detail about it right now (I will actually build it up step by step), but as a prelude, the idea with POMDPs is that they describe transitions among hidden, unobservable variables and the sensory data generated by those variables. The arrows mapping from states to observations give information about the direction of conditional influence: the arrow between the purple node o and the green node s is mediated by the A matrix, which is the likelihood, and the transitions between states are mediated by the B matrix, which encodes transition probabilities. And the goal of active inference with POMDPs is to infer states and action sequences, or policies, by minimizing various forms of variational free energy.
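As a preview of how those pieces look in the tutorial's MATLAB code, here is a sketch of the model structure in the style passed to spm_MDP_VB_X; the sizes and numbers are hypothetical (two states, two outcomes, two actions, three time steps).

```matlab
% Hypothetical POMDP specification in the SPM/MATLAB style.
mdp.A{1} = [0.9 0.1;              % likelihood p(o | s): columns are states
            0.1 0.9];
mdp.B{1}(:,:,1) = eye(2);         % transition p(s'|s, action 1): stay put
mdp.B{1}(:,:,2) = [0 1;           % action 2: switch states
                   1 0];
mdp.C{1} = [0 2 2;                % preferences over outcomes at each time step
            0 0 0];
mdp.D{1} = [0.5 0.5]';            % prior over initial states (the D vector)
mdp.T    = 3;                     % number of time steps
% MDP = spm_MDP_VB_X(mdp);        % would simulate inference and behavior
```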
Okay. To start with a really, really simple example, we have static inference, and this is just a graphical representation of Bayes' rule, essentially. We have a prior, which is encoded in our D vector; we have a likelihood, encoded in our A matrix; and we end up with an update equation which is a softmax function (a softmax function is a normalized exponential). I'm not going to run through that explanation in detail; I'm just going to leave the slide here so people can pause, go back, and convince themselves it's true. But for this very simple example, the inference scheme we're using here is formally equivalent to an exact inference scheme. There's a small sketch of this update below.

Moving into dynamic models, specifically where states change over time (these are also called hidden Markov models), this is when we have to start making approximations, when things are no longer equivalent to exact inference. Here we have our transition probabilities encoded in a B matrix, which is essentially the probability of some state at t+1 conditioned on the previous state. And you can see over here, in the update equations, that we're now in log space, and this little sigma is a softmax function, which just normalizes the equation. The idea is that the combination of D and B, our two priors, in addition to our likelihood, gives us the approximate posterior. The reason we have the one-half in front is just that, in practice, the approximation scheme used by active inference, namely variational message passing, tends to overestimate the value of the posterior, and this is a way of compensating for that. For anyone interested in the technical details, Thomas Parr has a really excellent paper out in Scientific Reports on neuronal message passing schemes under various approximations to free energy. The form of the free energy is just down here, and the idea is that by iteratively applying these updates, we minimize free energy: you do a full round of updating every time you get a new observation.
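Here is the small sketch of the static-inference update just described, for a hypothetical two-state, two-outcome model: the posterior is a softmax over log prior plus log likelihood, and in this simple case it matches exact Bayes.

```matlab
% Static inference: posterior = softmax( ln D + ln(A' * o) ).
A = [0.9 0.1;                     % p(o | s): columns are states
     0.1 0.9];
D = [0.5 0.5]';                   % prior over states (the D vector)
o = [1 0]';                       % observed outcome one, as a one-hot vector
v = log(D) + log(A' * o);         % prior plus likelihood, in log space
posterior = exp(v) / sum(exp(v))  % softmax: normalized exponential
% posterior = [0.9 0.1]', identical to exact Bayes' rule here
```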
Okay, but what about policy selection? The idea with policy selection is, roughly speaking, that policies are just state transitions that the agent has control over. Imagine you are a psychophysics experimenter and you have a very simple, boring example: someone estimating the orientation of some hard-to-see stimulus. The policy space there is tiny: two options, left or right, let's say. Or you could be simulating something more interesting than psychophysics, like an agent navigating a maze, and then the policy space is much larger: they could move forwards, backwards, left, right, and so on. And the idea is that active inference agents, by definition, need to select policies that will minimize variational free energy; but that relies upon observations that have yet to come, so that's kind of a problem.

The way around this is to treat observations as random variables, and then what you do is minimize an approximation not to surprise (which is variational free energy) but to expected surprise; and that approximation to expected surprise is expected free energy. This expected free energy has two key components. The first is expected cost: the idea here is that to minimize expected cost, you need to minimize the deviation between our predicted and our preferred outcomes. This is what Ryan was talking about before: the C vector encodes a distribution over an agent's preferences, so policies that minimize expected cost are policies that bring about observations that the agent prefers. For example, if I prefer to have my body temperature within a certain range, the policy might be staying inside, because I live in England and it's about minus one outside. Now, expected ambiguity is a little different: this is the epistemic drive, or the information-gain term, and it's the expected entropy of our likelihood distribution. The idea here is that to minimize ambiguity, you need to select observations in the A matrix that are maximally precise, and to take actions that will make that mapping maximally precise. So if I'm in a dark room, to use Ryan's example from before, the policies that make that distribution maximally precise are policies like turning the light on. To minimize expected free energy as a whole, you have to minimize both of these things, and you can do this in terms of one-step policies, looking just one step ahead, or using deep policies, looking many, many steps ahead. Hopefully that's pretty clear. Is there anything you'd like to add to that, Ryan or Daniel, or anyone else?
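Before Ryan's gloss on the notation, here is a numeric sketch of the two terms Christopher just described, for a single policy with hypothetical numbers: risk as the KL divergence between predicted and preferred outcomes, and ambiguity as the expected entropy of the likelihood.

```matlab
% Expected free energy for one policy: expected cost plus expected ambiguity.
Qo = [0.6 0.4];                   % predicted outcomes under this policy
Co = [0.9 0.1];                   % preferred outcomes (the C distribution)
cost = sum(Qo .* log(Qo ./ Co));  % KL[ Q(o|pi) || P(o) ]: expected cost
A  = [0.9 0.1;                    % likelihood p(o | s)
      0.1 0.9];
H  = -sum(A .* log(A));           % entropy of outcomes for each state
Qs = [0.5 0.5];                   % predicted states under this policy
ambiguity = Qs * H';              % expected entropy of the likelihood
G = cost + ambiguity              % expected free energy for this policy
```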
Yeah, I'll just try to make this as clear and explicit as possible for people who don't have a background in what this equation means. If you look at that term above, expected cost, that's called a KL divergence: basically, a value that encodes the difference, or dissimilarity, between two distributions. The first one, Q(o|pi), is just saying: what observations do I expect, given that I choose to do this versus that? The second one, P(o), is the preferences. So basically, what it's trying to do is minimize the difference between your preferred observations and the observations you expect given that you choose thing one versus thing two. It's really just choosing the thing you think is going to get you as close to what you want as possible; that's just reward-seeking, more or less. And in the second term, expected ambiguity, the H stands for entropy, and basically, the higher the entropy, the flatter a distribution is. Think of two states: one generates observations at 0.5/0.5, the other at 0.8/0.2. If you chose the state that would generate 0.5/0.5 over observations, it wouldn't tell you anything, because either observation you got would tell you there's a 0.5 probability that you're in one state and a 0.5 probability you're in the other. Whereas if the other state would generate a 0.8 or a 0.2, that one gives you a lot more information: if you observe the thing that indicates 0.8, you're really confident what state you're in, and if you observe the 0.2 thing, you're really confident you're not in that state. So you're seeking out the thing you think will get you what you want as much as possible, while also moving to the states that will give you the observations that tell you the most about where you are, if that makes sense. Again, that's just for people who aren't as familiar with reading the notation in these equations.

Thank you. Max here: one point of clarification, and then I want to tie it back to the example we discussed previously. First, I want to make sure I'm understanding correctly that the D in that equation is the KL divergence; that's an operator, not the same as the D in the figure, in the block diagram?

No, that's the KL divergence. That's a standard way of representing it in other papers that just got pasted in, and it's not very clear: that should be labeled as the KL divergence; it has nothing to do with the D vector.

And the D that's in this graphic, in the block diagram: would that be, in our previous step-by-step example, our 0.5/0.5? Though in this context it could be much more complicated than that.

Yeah, in a simple example it would just be your prior. So if, ahead of time, you think that ice cream is more likely than donuts, that could just be 0.8/0.2; or if you have no idea whether it's going to be ice cream or donuts, it could be 0.5/0.5.

And for anyone who wants to get into the optimization side, how we know whether or not this is going to converge when we do the message passing back and forth: there was a really good citation, the one alluded to at the beginning on the technical-papers slide, is that correct?

The citation to do with message passing in particular is the paper by Thomas Parr, in Scientific Reports.

We can put that in the show notes.

Yes, please.

I have a couple of other general questions from the live chat, but I think we'll take them at the end of your presentation, as we turn toward some of the neurobiology and a few other aspects. So continue, Christopher.

Cool, thank you. Okay, so then, just to close, or briefly recap: there are multiple stages to policy selection. Under a generative model that has multiple policies, you minimize your variational free energy with respect to each policy. You might think about it like this: say you're coming to a set of traffic lights, and you could turn left, turn right, or go straight ahead; those actions are possible, but you then get sensory input that says there's a no-left-turn sign. That would give the left-turn policy an extremely high free energy value, and so it would be eliminated from the plausible policies that you then evaluate. And then expected free energy is what we just talked about, and the posterior distribution is actually the softmax function over both of these.
Because these both enter with a negative sign, policies that best minimize both variational free energy and expected free energy will have the highest posterior probability. Is that clear? There are a couple of caveats to all of this, but I just want to check. Ryan, do you want to add anything here?
Well, I'll just say, for anybody who doesn't know what a softmax function is: when you take minus F minus G, that's not going to give you a true probability distribution, right? It's not going to be a thing that sums to one, where the probabilities all together sum to one; it's going to give you a bunch of negative numbers. What the softmax function does is normalize that, which just means it takes the relative values of each of those things and turns them into a probability distribution that sums to one. So you end up with that bolded pi symbol, which is just a probability distribution that assigns a probability to each possible policy. And a policy, by the way, is just a sequence of actions: one policy might be turn left then turn right, and another policy might be turn right then turn left. So it's just saying that that minus F minus G thing gets turned into a probability distribution, and that probability distribution over different policies is what the agent samples from. If pi says this policy is 0.8 and this other policy is 0.2, then the agent will be a lot more likely to choose the policy that's 0.8.
Yeah, maybe go to the last caveat slide, and then I have a few other questions where we'll just try to do question, simple answer, question, simple answer, and then we'll move into the part-two screen share. Okay, so the two caveats are just that there are two extra components to the POMDP, namely the E vector, which you can see as that E block pointing to policies, and then gamma, which is also pointing to policies. E is, generally speaking, how one models habit formation, or one way of modeling habit formation. I'm not going to talk about that now; I just wanted to flag it, so that when things come up in learning, you won't be surprised that there's this extra term in the policies. And what gamma does is essentially weight the contribution of G to policy selection. The idea, roughly speaking, is that you have your prior distribution over policies, which doesn't take in F at all, and your posterior distribution, which does take in F, and you look at the difference between them. If there's a large difference, then, roughly speaking, gamma will go down; it will essentially down-weight the contribution of expected free energy to your posterior over policies. This is linked to how we model phasic dopamine spiking, for example. I'm not going to get into more detail there, because I think it's much easier with a fully worked example, but I did want to flag all of that. Okay, so I think that was my last slide.
So that's like a tightening of strategy while things are effective, but not overfitting: a tightening while things are working, and then during a mismatch, a period of high uncertainty or ambiguity, there's actually a movement towards more exploratory behavior.
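The softmax step Ryan describes can be sketched in a few lines of MATLAB. The form below, pi = softmax(ln E - F - gamma*G), also folds in the E and gamma caveats Christopher flags; all numerical values are invented for illustration.

```matlab
% Toy sketch of the policy posterior: pi = softmax(ln E - F - gamma*G).
E     = [1 1 1]' / 3;       % habit prior over three policies (flat here)
F     = [9.0 1.0 1.2]';     % variational free energy per policy
                            % (policy 1 ruled out by sensory input, e.g. the no-left-turn sign)
G     = [2.0 1.5 0.9]';     % expected free energy per policy
gamma = 1;                  % expected free energy precision; lowering it down-weights G

v = log(E) - F - gamma*G;   % unnormalized log posterior over policies
p = exp(v - max(v));        % softmax, shifted for numerical stability
p = p / sum(p)              % a proper distribution over policies: sums to one
```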
Okay, let me try to go through a couple of these questions; we'll try to keep them clippable. One further thing I just wanted to clarify, though: that gamma thing, you should really think of it as the precision estimate for expected free energy. All it's doing is saying, hey, if things turned out super different, policy-wise, from what I thought they were going to be, then, and in some cases it's a little more complex than that, I can't trust my expected free energy estimate as much, right? So I should down-weight how much expected free energy contributes to what I choose to do, because it's not as reliable. It's just saying: how much should my habits and my current observations affect what I do, versus what I expected ahead of time about how rewarding things were going to be? So if the clock has always been accurate and you look at it, you're like, wow, I didn't know it was 1 p.m., and you believe the clock. But if you know the clock is inaccurate, and you thought it was noon, and you looked and it said one, well, who knows with this clock these days. So that's about the confidence in the observation.
Okay, so, policy selection: two quick questions on that. The first is, what is the time scale of policy selection, and does this model assume any particular temporal scale? I mean, temporal scales are kind of just whatever you want them to be, right? In the context of a task, you might say here's time point one, where a participant gets some observation, and here's time point two, where they choose an action, in which case that would be a two-time-step trial. And it doesn't really matter: the stimulus might be on for five seconds or one second in the first time step, and maybe the decision takes half a second, so time step two could be half a second or a minute. Time steps are whatever you define them to be relative to a task. In another session, when we get to hierarchical models, you'll see higher levels that operate on slower time scales than the lower levels, but that's a more complex thing we'll cover later.
Okay, another policy-related question: can you comment on different forms of policies, such as plans from tau equals zero to big T, so from the beginning to T, versus tau equals little t to big T, belief-action policies, or tree search followed by belief-action policies? And how is this related to constructs such as working memory and habits? So maybe the first part of the question is how different kinds of policy estimation are undertaken, and the second part is how that relates to constructs from the regular vocabulary, like working memory or habits. Do you want to take that, Chris, or should I? Yeah, I'm happy to take it, I suppose. So you can have shallow policies, or you can have deep policies, where deep policies just run from time point one to T and you're summing over, formally speaking, the path integral of expected free energy, updated at each time point; shallow policies are one time step. How does this relate to things like tree search? Well, it depends. I'm assuming that question means you're familiar with Bellman
formulations and that kind of thing. So expected free energy is Bellman optimal for one-step policies, and it's not Bellman optimal for deep policies, but there is what's called a sophisticated inference scheme which, is this correct, Ryan, is formally equivalent to backward induction? Yeah, the sophisticated version of active inference, which is more explicitly like a deep tree search, is equivalent to backward induction, which is Bellman optimal. Yeah, exactly. So there are forms of this that do relate to tree search. In regard to working memory, it depends on how you want to set up the task. There are models of working memory that use two-level models, where working memory is modeled as the thing that has stable maintenance, or you can model it with one-level models; it just depends on how you set things up. Generally speaking, I think of working memory as functionally defined; there are lots of definitions, and I won't commit to one right now, but it's something to do with maintenance, right? And then it depends on what you're trying to model about working memory and what type of model is appropriate.
Yeah, one additional, really basic thing to say is that working memory is going to come down to what your transition matrices look like. If you're put into state one, and your transition matrices encode a really strong belief that if you start in state one you're going to stay in state one, then that more or less amounts to that state remaining active over several time steps. And if you learn something at time step one, and then at time step four believing you're in that same state tells you what action to take, then that's a type of working memory. Thomas Parr has a really awesome paper, "Prefrontal computation as active inference", where he thinks about transition precision in terms of excitatory recurrent connections in lateral PFC, essentially, which have been linked to things like maintenance.
Cool. One last quick take before we go to Ryan sharing his screen, related to modeling a given behavioral task: what are the criteria for setting the factors in which the model will be embedded? So how do we operationalize the kinds of things we take into account in this model, I believe, is what the question is asking. I mean, for a lot of tasks there's probably not one unique way, right? That's where the creativity and problem-solving aspect comes in, because you have to figure out what factorization structure, what generative model structure, whether it's factorized or not, or which of several ways of setting up a model could generate the behavior you would see in a task. I'll give you an example when we build this task model together in my part, but generally, in a lot of cases, you can make something factorized or not factorized, and sometimes you can set actions up so that they live in the probabilities of the B matrix, your transition beliefs, and in other cases you can set it up so it's in the likelihood, the A matrix. My point is there isn't always one unique solution for which generative model to use, and a lot of times you might try out multiple generative models and then do model comparison to figure out which one's best.
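As a minimal sketch of the transition-matrix point about maintenance made just above: with precise self-transitions in B, a belief parked in one state stays there across time steps. The numbers are invented.

```matlab
% Toy sketch: 'maintenance via transitions'. Precise self-transitions keep
% a belief concentrated on its starting state across several time steps.
B = [0.98 0.01 0.01;
     0.01 0.98 0.01;
     0.01 0.01 0.98];       % columns: state at t; rows: state at t+1
s = [1; 0; 0];              % belief at time 1: confidently in state 1
for t = 2:4
    s = B * s;              % propagate the belief forward one step
end
disp(s')                    % still concentrated on state 1 several steps later
```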
Yeah. All right, maybe while you're setting up your screen, I'll ask one more question. Somebody wrote: I'm still not exactly clear on why you would want to maximize information and minimize the difference between your desired outcomes and your expectation. What advantage is afforded by maximizing information if it doesn't enhance the likelihood of receiving the desired outcome? Well, it won't in that case. Typically what happens, and it depends a little on the exact task setup, is that if the agent knows what to do to get the reward, then that cost term, that reward probability term, is the one that dominates; its value dominates, and the agent just selects reward automatically. The information seeking will typically only carry a high weight when the agent doesn't yet know what to do to get the reward.
You know, that reminds me a lot of the game theory strategy tit for tat, where you start out playing nice: default to being cooperative, but then have a strategy. It's sort of a meta approach: if you're winning the game of Go or chess at whatever training level, what is there to do? Your strategy is working. But when there isn't a victory observation, then it entails an exploratory search that percolates through higher and higher abstractions of learning in the system. So I hope we can make that a little more tangible with this session right here. We have 40 minutes left, and again, this is just part one of multiple, so let's go for 20 to 30 minutes, followed by 10 to 20 minutes of wrap-up and final questions from the chat and a couple of others I have stored up. Thanks, and take it away, Ryan.
Okay, I want to just ask real quick: can you guys see my mouse if I point to things? Yes. Okay, because I'll need to point to things while presenting. So, just to set the scope again: the end goal of the tutorial is for people to be able to actually do research with these things, right? The end goal is to learn how to build task models for empirical studies. The basic idea is that you're going to have participants perform a task, and then you want to find the parameter values in the model, say values for precision, values for prior expectations, or values for habit distributions (that E thing, which I probably won't use in this example), things like that, where different elements of the model can take on different values, and based on those values the agent acts differently: it makes different choices. So what you want to do is take the behavior of an actual participant and find the values for those parameters that best reproduce their behavior. And once you have the parameters that best reproduce a given person's behavior, and you do that for all the participants, you can use those parameter values as individual difference measures. You might ask: if one person has a higher A matrix precision than another person, does that predict something about, for instance, how well they're going to respond to a treatment, as in computational psychiatry, or can it tell you something about some other cognitive function, et cetera?
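A schematic of the fitting logic only: the tutorial itself uses a Bayesian estimation scheme, but a simple grid search over the action precision alpha shows the idea of choosing the parameter values that best reproduce a participant's choices. The values and choices below are invented stand-ins for model output and data.

```matlab
% Schematic only: find the action precision alpha that best reproduces
% one subject's observed choices, by maximum likelihood over a grid.
values  = [2.0 0.5; 0.4 1.6; 1.8 0.3];   % model's valuation of 2 actions on 3 trials
choices = [1 2 1];                        % the participant's observed choices
alphas  = 0.25:0.25:16;                   % candidate action precisions
LL      = zeros(size(alphas));
for i = 1:numel(alphas)
    for t = 1:numel(choices)
        p     = exp(alphas(i) * values(t,:));
        p     = p / sum(p);               % softmax choice probabilities
        LL(i) = LL(i) + log(p(choices(t)));
    end
end
[~, k]    = max(LL);
alpha_hat = alphas(k)                     % precision that best reproduces the choices
```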
And so the goal, again, is to get people to the point where they can do that. Like I said before, this is the kind of thing people have been doing for a long time in reinforcement learning, but it's still novel, not very common yet, in active inference, because not enough people know how to do it. And I can give you examples here, if I can get my slides to advance. These tasks can vary: in another session we'll walk you through how to build a perceptual task that's widely used for EEG research, and neuroimaging research more broadly, called an oddball task, which is primarily a perception task. There are also inferential, prospective decision-making tasks you can use that don't involve learning, and then there are tasks that use reinforcement learning or explore-exploit dynamics. And you can also use it with neuroimaging: you could ask, trial by trial, where are the prediction errors in the brain, what parts of the brain look like they're doing the free energy minimization, trial by trial. So there are things like that you can do.
Sorry, that really reminds me, just to pull back a level, of making measurements of a biological system, like gene expression or neuroimaging, and then saying, well, what gene is upregulated in people who have this condition, or what brain region is activated in this condition. We're taking that beyond the simply descriptive, beyond just measure-and-do-a-t-test, and we're thinking about these observations, these behavioral phenotypes, as being emitted, often across tasks or across modalities, by a generative model that has higher-level parameters shaping the way the person or organism responds to stimuli, under a specific framing of a time- and task-dependent model. Just to pull back a level and note that; but continue, Ryan.
So I just want to say that this is a new thing to do; not very many people have done it, and my lab has been trying to get this approach out there more. Here are some example papers that do each of the things I mentioned. This one on the right, a recent PLOS Computational Biology paper, uses active inference primarily for a perceptual model, and with that one we were looking at interoceptive processes: for instance, every time people feel a heartbeat, they indicate by pushing a button that they felt it. Who does better at that than others, and why? Some people might have stronger prior expectations that they are, or are not, going to feel a heartbeat, and some people might just have different levels of sensory precision: they treat the signal coming up from the body as more or less precise. So that's one example. This paper in Drug and Alcohol Dependence used a learning and action, explore-exploit, reinforcement-learning-style task applied to people with substance use. This one in the lower left, the greater-decision-uncertainty paper, is an example of a purely planning, decision-making, inferential model without learning. And then Philipp Schwartenbeck's paper up here on the right is an example of using
active inference to look for the neural correlates of particular model parameters. So, for instance, the Drug and Alcohol Dependence paper used a simple explore-exploit task where people had three options: they could push button one, two, or three, and then either a green or a red ball would fall; green meant they won and red meant they lost. Initially they didn't know the probability of winning or losing for each button, so for the first few choices they had to explore to figure out which one was giving the most greens, and eventually they'd become confident that one was best and stick to it. I won't go through it, but you can specify a particular version of the graphical models Chris showed that generates behavior like participants produce on this task. And again, I won't really go through this, but this is what the likelihood, the A matrix, would look like: over time they would learn these little-a values that basically say, if I choose slot machine one, two, or three, what's the probability that it generates the observation for this row, which is a win, or the observation for this row, which is a loss. The C vector here, the one that encodes reward, just says, with whatever the reward value is, which would be a positive number, that I prefer winning over losing by some amount. Then this is the equation for learning, and this is the expected free energy equation. Again, this isn't the example I'm going to work through, but what you could find, I'll skip ahead, is that people with substance use showed lower action precision, in other words their behavior was a little more random, and they showed higher learning rates for wins and slower learning rates for losses compared to healthy people. So you can tell this interesting story: if people with substance use learn more slowly whenever a bad thing happens when they take the drug, but learn more quickly every time they feel good right after taking it, then they're going to have a much harder time stopping taking the drug, despite its negative consequences. And you can fit this; these are just group estimates, showing in a Bayesian way that the learning rate for losses was different in people with substance use than in healthy people.
The example of planning and decision-making without learning was an approach-avoidance conflict task. Basically, people had to decide where to move a little avatar along a runway line. On one side was a sun and a bar with some amount of red: they would see a nice happy image and win points, and the more red in the rectangle, the more points they would win. The rainy cloud on the other side meant they would see a really terrible picture and hear a terrible sound. So in some cases they had to decide, okay, am I willing to go through seeing this really aversive thing to win a lot of points, or not. In this model, they start in the middle, for example, and the question is how close they decide to get to one side or the other, which controls the probability of getting what's on that side. So it's just planning where to move; there's no learning, because they already know exactly what the rewards and punishments are.
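Picking up the learning-rate point from the substance-use example above, here is a toy sketch assuming Dirichlet-style count updating, with separate, invented learning rates for wins and losses to show how the asymmetry works.

```matlab
% Toy sketch: asymmetric Dirichlet-style learning of the A matrix counts.
a = ones(2, 3);             % concentration counts: rows win/loss, columns the 3 buttons
eta_win  = 0.9;             % fast updating after wins
eta_loss = 0.3;             % slow updating after losses (as in the substance-use group)

a(1, 2) = a(1, 2) + eta_win;    % pressed button 2, observed a win
a(2, 2) = a(2, 2) + eta_loss;   % pressed button 2 later, observed a loss

A_beliefs = a ./ sum(a)     % normalized counts: believed win/loss odds per button
```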
So in that case, again, I won't go through this, but you can specify a generative model in a similar way and run simulations of what people will do under different parameter values. This beta thing is the expected free energy precision, and EC here is how much they like the reward, the points, relative to how much they dislike the negative images; we just called that emotion conflict. That's another way of presenting the generative model, which I won't go through, but here, again, you can show, for example, that people with depression and people with substance use showed less emotion conflict than healthy people, and greater decision uncertainty than healthy people: they had lower expected free energy precision, which corresponds to higher beta values. So that's another example.
Just one thing: in presenter view you can't see questions, but now that I'm out of presenter view I can see that Daniel had his hand up. Oh, sorry. Yes, this is great stuff; I just want to highlight a few pieces here. With that same task, you could imagine a purely descriptive model, and a paper being written on something like "people who have X diagnosis are more likely to approach when there's conflict"; that would be a purely descriptive finding based on the same exact data, with a t-test basically being run on a summary statistic. What this tutorial is about is taking the exact same behavioral data, within and across participants, and modeling underlying parameters that have a very specific graphical layout and relatedness. So the groups aren't just different in the descriptive outcomes, like "this group was more likely to gamble on red"; it's more like saying this group had a higher hidden variable that we're going to attach to something we might call gambling propensity. Now, that relates to a really important question from the chat: how is the neurobiological interpretation of the agent's generative model parameter estimates done with respect to the parameter model? In other words, how do we go from a summary statistic or a graphical model estimate to a neurobiological interpretation, let alone intervention; mainly the interpretation side.
So we have a whole section in the tutorial on the neural process theory associated with active inference; that's something we planned to talk about in another session, probably part two, so I'll hold off on describing it in detail right now. But the quick version is that the neural process theory makes certain assumptions about how neurons could be connected up to do this kind of thing. For example, there are different rates at which your beliefs change, rates at which your distribution over states becomes more or less precise, and the neural process theory says that, for instance, the ERPs you would measure, the changes in the magnitude of neural activation you would see, would correspond to the rate of change in those beliefs; you can also think of that as how quickly prediction error is being minimized. Whereas the trial-by-trial updates to this beta thing are modeled as the phasic dopamine spikes that are essentially updating
how much the expected free energy controls action, that is, policy selection. So there is a neural process theory that proposes one way in which the variational message passing underlying these models could be implemented, and therefore what sorts of signals you ought to measure in the brain if that process theory were correct. But it's really important to recognize that there are several different possible process theories you could come up with: there are many different ways you could connect a bunch of neurons together to implement these sorts of models. So there are a lot of different levels of study here. There's which generative model best describes task behavior; then, given that model, which of a bunch of different message passing algorithms is the one being used; and then, given that message passing algorithm, there are a bunch of different possible ways the brain could implement it, so which one is the right one? There are many empirical questions you can ask at each of those levels of description.
Yep. What I'll say to that is that the relationship between the free energy principle and active inference as a process theory means something within the philosophy of science literature, so check out the ALIUS 2018 interview with Karl Friston, by myself and Martin Fortier, or the recent paper by Mel Andrews, because basically the process theory makes specific, falsifiable, testable hypotheses. That's the kind of thing where you can actually say: dopamine should do this, use this kind of tool and you should expect to see that. And that's why this is a little more towards the data-focused, empirical end. So continue, Ryan.
Yeah, and like I said, we'll go into it: task models will make specific predictions about what kinds of ERPs you would measure in EEG during certain tasks, and the same for fMRI responses, but we'll get into that in another session. So this is another example; I'll walk through these really briefly. That's another generative model we built, for this heartbeat perception task, and what we found was that in a certain high-arousal condition, healthy people showed higher interoceptive precision, whereas a bunch of different clinical groups didn't show any change in precision. So you're seeing differences in the way the brain treats signals coming up from the body when in a high-arousal state, really a state where they're having to hold their breath for a long period. You can also estimate prior expectations; those didn't differ by group. Then, finally, this is an example of using active inference for neuroimaging, which Philipp Schwartenbeck did. He was looking at the beta, or gamma, updates, the expected free energy precision updates, and he showed that, trial by trial, those updates correlated with this midbrain region, a chunk of which is where a bunch of dopamine neurons are, consistent with the idea that phasic dopamine responses are the ones encoding these changes in expected free energy precision. Anyway, those are just a bunch of examples of the way this kind of thing has
been used, and I should point out that this is all just within the last couple of years; it's very recent. So the task I'm going to walk you through how to actually build, and we'll see how much time we have for it, is pretty simple. The participant starts in a start state and has to choose one of two slot machines, the one on the left or the one on the right, and on trial one they have no idea which is more likely to give a reward. If you're in the left-better context, choosing the left slot machine wins 80% of the time, and if you're in the right-better context, choosing the right slot machine gives the reward 80% of the time. Crucially, if on the first time step they just choose a slot machine, they win four dollars if they're right; that's the reward-seeking option, since they want as much money as possible. But what they can also do is first ask for a hint. If they ask for the hint, it tells them what context they're in, left better or right better, and then they choose one of the two slot machines, and again they win 80% of the time if they choose correctly based on the hint. But crucially, if they take the hint first, they only win two dollars if they get it right, instead of four. In other words, taking the hint is costly. So you can think of choosing a slot machine right away as the reward-seeking option, which is also risk-seeking, because you don't know ahead of time which one is right; or you can do the information-seeking thing, choosing the hint first, which gets you less money but makes you more confident which one is right. The task is set up specifically to have this information gain component, via the hint, and the reward-seeking component.
So the question is: how would you actually, in practice, build a model of this task? There are some basic steps. One is to define your initial state priors, and that's going to be the D thing. As an aside, and we'll go into this more with the learning: if you use both big D and little d, big D is the true generative process, the real thing out in the world generating the observations, and little d, if you use it, is what's in the generative model, the agent's beliefs. Those two can differ: if big D and little d are the same, the agent's beliefs are accurate, and if you want the agent to learn, or to have prior expectations different from the true ones, then you have to use this little d thing. Same here: we have to define the likelihood, the state-outcome mapping, and that's our big A, plus a little a if we want the agent to learn and to have beliefs different from the true ones. We have to define the preferences over outcomes; that's the C thing. Then we have to define the possible transitions, or actions. If there's just one transition matrix for a factor, the agent has no control over it, but if a state factor can have multiple possible transitions, where each transition is like an action, then the agent can choose policies that
correspond to different transition sequences; I'll show you in a moment. And then the last thing you have to define is the policies, which is this thing V in the code: that specifies different sequences of B matrices, different sequences of possible transitions that could happen over the course of the trial, such that the agent can choose one of those transition sequences.
To jump in really quickly: for each of those, if I'm talking about a different outcome modality, I would have to have one of those parameters for each outcome modality, is that correct? So there's going to be one A matrix for each outcome modality, and that just says which outcomes are most likely to be generated given each combination of states, given the value of each state factor. And these are things that come up in experimental design, right? Like the probability that a given machine dispenses a win, that's something I should be thinking about as the empirical person designing my experiment? Yeah, exactly, and I'll show you here. Typically, though you can do it other ways, this likelihood thing is what defines the reward probabilities. You'd say: given that I'm in the state of having chosen the left slot machine, that generates the reward observation with 0.8 and the no-reward observation with 0.2, for example, which is to say the reward probability is 80%. And if you want the agent to learn those reward probabilities, you'd specify one of these little-a things and have it learn over repeated observations.
It's such a critical and really fascinating point that reward isn't absent from the model: through the preference vector C, it's baked into how policy is calculated. So it's like the agent is pursuing precision, all else being equal; through natural selection, unsuccessful models are just not going to exist, and models that don't see themselves performing behavior are not going to be active for long. So under a model of a successful preference, there's a convergence towards it with whatever affordances are at hand. It's just a really interesting way to see how reinforcement learning situations can be adapted. So, the 18 minutes.
Yeah, and some of that gets a little more into the free energy philosophy side, but yes, the assumption is that people inherit preferences that keep them alive. In an empirical task context, though, you just use the C vector to define what counts as the reward. It could be anything: winning money, winning points even if they don't convert to money, seeing a positive image, anything like that. You define the preference distribution, and usually you parameterize it, fitting it as a parameter to see essentially how much the person likes, say, winning two dollars. I'll show you that in a second. But yes, you're going to need one A matrix and one C matrix for each outcome modality, and one D vector and at least one B matrix for each state factor, to answer the original question. So the first thing, right, is that the way I
decided to build the model for this task is that the first hidden state factor is the context: am I in the state where the left machine is better, or the state where the right machine is better? I started by specifying this little d thing here; the brackets just give the number of the state factor, so this is d{1}, the priors over states for state factor one, and there are two possible states, left better or right better. I specified these as two really small numbers, which says it's an even probability that the left one or the right one is better, but because they're really small, the agent is not at all confident that that distribution is right. I'll explain more about that when we get to the learning, because this gets softmaxed during inference, so it becomes 0.5, 0.5 during inference, but when you learn, you build up the numbers so they get bigger, which means the agent becomes more confident in the distribution it believes. Then I can also specify big D, the generative process, and I just put this as one and zero, which means the true context is that the left machine is better. So that's how I set up the priors for state factor one.
State factor two here, if you can see at the bottom. Hold on, can you guys see this, can you see the bottom? Because on mine I can see the bottom. We only see one line of code, the D{2} line. I'm just asking whether you can see the search bar down here and all of that. Yes, it's perfect. Okay, now it's gone. Okay. So D{2} is the second hidden state factor, the one that corresponds to your choices. This is a one here, which means it always starts in the start state; the first column is the start state, and then it can transition into the hint state, which is the second one, and the third is the choose-the-left-slot-machine state and the fourth is the choose-the-right-slot-machine state. So this just says that at time one, my prior is, with 100% certainty, that I start in the start state of the task. That's pretty simple.
Then the next thing to specify is the likelihood, the A matrix, and I'm not going to show you the complete one until we go into the code; I'll just show you the parts that matter. For the second outcome modality, A{2}, the likelihood mapping is this thing where the first row corresponds to null, meaning it hasn't won or lost yet, the second row is the observation that it lost, and the third row is the observation that it won, and the columns correspond to the state factor values. The left column corresponds to the d{1} thing, the 0.25 here, for the left-better state, and the right column corresponds to the right-better state. The way you read this matrix is: if you were in the right-better state, you would win with this probability and lose with this probability, and if you were in the left-better state, you would lose with that probability and win with that probability. Are the columns of that softmaxed, are those distributions? Yeah, the columns are softmaxed.
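As a sketch of what these specifications look like in code, following the tutorial's cell-array conventions: the 0.25 counts, the one-and-zero generative process, and the 80% win probability come from the walkthrough, while the slice index used for the chose-left state is my inference from the state ordering described.

```matlab
% Sketch of the priors and the reward-modality likelihood just described.
d{1} = [0.25 0.25]';        % learnable context prior: left-better vs right-better
D{1} = [1 0]';              % generative process: the left machine really is better
D{2} = [1 0 0 0]';          % start / hint / chose-left / chose-right: begin at start

% Reward modality, slice for the 'chose the left machine' state.
% Rows: null / loss / win. Columns: left-better / right-better context.
A{2}(:,:,3) = [0   0  ;
               0.2 0.8;
               0.8 0.2];    % an 80% win probability when choice matches context
```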
Again, to get a little technical: if you're using little a, these are Dirichlet distributions, so the model builds up counts. Those numbers could become, say, 50, 25, 8, whatever, and they encode the confidence in those distributions, but they get softmaxed, normalized, for inference.
Then you'd also specify the C matrix for that outcome modality. C{2} corresponds to A{2}; they're both for the second outcome modality, and here, again, the rows are observations, but the columns correspond to time points in the trial. Basically this says that at time one I have no preference for anything; at time two I have a negative preference for losing and a positive preference for winning. This la thing is a loss-aversion parameter; we set it to one, so the agent doesn't want to lose, with a negative one. And for rs, reward sensitivity, we say rs equals four, which you can think of as the four dollars. So if it observes the win at time two, it prefers that with a value of four, and if it gets the win at time three, it's rs divided by two, meaning at time three it only wins two dollars; or whatever the relative values are for the person, right? You could fit rs for a given person to see how much they prefer four dollars over two, for example. In the code these also get softmaxed and then logged, so they become log probabilities. So that's just saying how much you dislike losing and how much you want to win, and again, you can set those as parameters.
And then for outcome modality one, which is getting the hint: you can have no hint, the hint that the left machine is better, or the hint that the right machine is better; those are the observations. With this pHA thing you can say how informative the hint is: if I observe the left-machine hint, does that tell me with certainty that the left one is better, or only with some probability? You could set this with ones and zeros; if pHA were one, that would say the hint is one hundred percent accurate in telling you which machine is better when you observe one hint versus the other.
So the next thing is to consider the possible actions, or sequences of actions, that will correspond to policies. The way to think about it: you start in state one for state factor two, the action state factor, and you can either go from the start state immediately to choosing the right machine, or immediately to choosing the left machine, or you can take the hint and then go to the left one or the right one. Those sequences of actions are the policies that matter; they encode what the agent can do. And to do that, you set up different transition matrices, different B matrices, that encode each of these actions: transitioning from one to four, from one to three, from one to two, from two to four, and from two to three.
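Before moving on to the transitions, here is a sketch of the hint modality and the preference matrix just described, with pHA, la, and rs set to the values mentioned in the walkthrough; as above, the slice index for the hint state is inferred from the state ordering.

```matlab
% Sketch of the hint modality and the preferences just described.
pHA = 1;                    % hint accuracy: 1 means the hint is always correct
la  = 1;                    % loss aversion
rs  = 4;                    % reward sensitivity ('four dollars')

% Hint modality, slice for the 'asked for the hint' state.
% Rows: no hint / left-is-better hint / right-is-better hint.
% Columns: left-better / right-better context.
A{1}(:,:,2) = [0     0    ;
               pHA   1-pHA;
               1-pHA pHA  ];

% Preferences for the reward modality. Rows: null / loss / win.
% Columns: time points 1..3 (softmaxed and logged inside the scheme).
C{2} = [0  0    0   ;
        0 -la  -la  ;
        0  rs   rs/2];      % a win is worth rs at time 2, but only rs/2 after the hint
```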
And that corresponds to setting up these B matrices. So B{1}, for the state factor encoding which slot machine is better, is just an identity matrix, which says that the left-better context is constant across the trial: it's not as if the context is going to change from time point one to time point two. This says the belief is, and it is true, that each trial has a stable identity as left-better or right-better. But for state factor two, B{2}, that's where you want the agent to have different possible actions. So B{2}(:,:,1), with the third dimension indexing the possible actions: columns are states at time t and rows are states at time t plus one, so the first slice says I can start in any state, any column, and choose to move to state one. The next says I can start in any column and move to state two, taking the hint. The next says I can start in any column, any state, and move to state three, choosing the left machine, and the same again for moving to the right machine. So there are four actions, one through four, corresponding to four slices along the third dimension of the B matrix for state factor two.
Then, given those, we specify V, which holds the policies, and here the third dimension corresponds to the state factor. For state factor one there are no actions: there's just that one identity B matrix. The rows are the time points and the columns are the policies, so the first row is the action moving from time one to time two, and the second row is the action from time two to time three. That's trivial for state factor one, because there's only one action, whereas for state factor two we have all the possible action sequences. Policy one would just be deciding to stay in the start state the whole time, essentially not doing the trial. The next would be action two, take the hint, then action three, move to the left slot machine; or, in the third column, take the hint and then move to four, choosing the right slot machine. And then there's a little bit of a trick: from time one to time two you can immediately choose the left slot machine and then move back to the start state, so three and then one, or the same with the right slot machine, four and then one. The only reason for moving back to state one is that if you let the agent stay in state three or four, it's as if it won four dollars and then won two more dollars after that, so you have to move it back out of the state where it would win.
So then you've built everything, really, and you throw all of it into this little MDP structure. I didn't mention this, but you have to specify T, the number of time points in the trial; in this case it's three, for instance start, take the hint, choose left, or start, take the hint, choose right, et cetera. V is just the V we defined. We'll ignore U for now; that's what you would use if you wanted to specify one-step policies instead, where the agent doesn't look ahead at all.
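And a sketch of the transitions, policies, and final assembly, continuing the snippets above. The eta, alpha, and beta values are placeholders for the parameters described next, and the last line assumes SPM12 (which provides spm_MDP_VB_X) is on the MATLAB path.

```matlab
% Sketch of the transitions, policies, and the assembled MDP structure.
B{1} = eye(2);              % context is stable within a trial (no actions)

for k = 1:4                 % choice factor: action k = 'move to state k from anywhere'
    B{2}(:,:,k) = zeros(4); % columns: state at t; rows: state at t+1
    B{2}(k,:,k) = 1;
end

V(:,:,1) = ones(2, 5);      % factor 1: only one (trivial) action per policy
V(:,:,2) = [1 2 2 3 4;      % five policies: stay / hint then left / hint then right /
            1 3 4 1 1];     % left then back to start / right then back to start

mdp.T = 3;                  % three time points in a trial
mdp.V = V;
mdp.A = A;  mdp.B = B;      % as specified in the sketches above
mdp.C = C;  mdp.D = D;
mdp.d = d;                  % little d: lets the agent learn the context prior

mdp.eta   = 0.5;            % learning rate (placeholder value)
mdp.alpha = 16;             % action precision, an inverse temperature (placeholder)
mdp.beta  = 1;              % prior on expected free energy precision (placeholder)

MDP = spm_MDP_VB_X(mdp);    % run the active inference scheme (requires SPM12)
```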
And then A, B, C, and D are just the different matrices we specified, and since we want the agent to learn D, we also supply the little d. Then there are these other parameters you can set. Eta is the learning rate. Alpha is a kind of action precision, basically an inverse temperature parameter: it controls how random someone's choices are given the policy they select. Beta is the expected free energy precision we talked about. The other two parameters I won't go into in detail, but one specifies the time constant, basically how quickly evidence accumulates after an observation, and this erp thing controls certain assumptions about what ought to happen when you make a new observation, in terms of the neural responses you would get; this is all explained in the code.
I think that's a perfect pause point, so we can close within the hour and leave people with excitement for part two. So, in a minute, what can they look forward to in part two and beyond? Yeah, I'd just say that once you have this set up, the next steps are to run the structure through this VBX_tutorial script, which will actually run the model and simulate the behavior and the neural responses. Then we'll show you how to use plotting scripts to display the behavior that came out of the simulation, show how the MDP structure works, and then walk through some simulation results and how they work; and I'll probably walk you through some of the actual code.
That was an amazing session. Ryan, Christopher, Max, thanks so much for coming on, and everybody who was watching, live and in replay, you're also very appreciated. Please leave a comment if you have a question or feedback for the authors or for anyone else, and stay in touch with us, because we're making this a multi-part series where we go deeper into the technical aspects and also highlight use cases, hearing from people who are just learning programming or learning to apply these methods, and from people who are experts in other fields. Whatever your perspective, you're in the right spot to be learning. So thanks again, all of you, for coming on, and we'll see you another time.