All right, I think we are live. Hello everyone, and welcome to the Active Inference Livestream. This is Active Inference Livestream 11. It is December 16th, 2020, and there's a lot to get to today, so thanks for tuning in. Welcome to the Active Inference Lab, everyone. We are an experiment in online team communication, learning, and practice related to active inference. You can find us at our website, activeinference.org, on Twitter, at our Gmail account, our YouTube channel, or our public Keybase team. This is a recorded and archived livestream, so please give us feedback so we can improve our work, whether during the livestream or after. All backgrounds and perspectives are welcome here. As far as video etiquette goes: if there's noise in your background, please mute, and we'll all remember to use respectful speech, et cetera. So thanks, everyone. Just a few announcements and points of process. There are two more meetings for 2020, and they will both run from 7:30 to 9:00 a.m. PST. Both of those meetings are going to be about the paper "Sophisticated Affective Inference: Simulating Anticipatory Affective Dynamics of Imagining Future Events," a 2020 paper from the IWAI workshop, and that's what this 11.0 is going to be primarily about. So if you're listening to this at the end of 2020, we would love to have you at either 11.1 on December 22nd or 11.2 on December 29th. And we are also very happy to announce our whole schedule for 2021. That's right, the year 2021. It's going to be Tuesdays from 7:00 to 9:00 a.m. PST, and all the instructions, everything you need to know, are at this link, rb.gy/kvnpyc. Just to show you what that link looks like: it's a spreadsheet, and at the top is information about how to participate in the discussion. Fill out this form if you want to participate, and email us with what dates you want to attend.
Then we will add you to the discussion event for that day. And if you want to collaborate, check out this link, which I'll get to in a second. Just looking at a few of these papers: for every single paper up until the middle of April, we've been really lucky to have an awesome community of authors who want to join on. So we'll have an author representative for every single paper through the middle of April and beyond; most weeks, we'll have an author on. We're going to get a lot of opportunities to hear from the authors about what they were trying to do with their papers, and I'm just really excited to hear from all these different perspectives. And of course, by bringing on people who are learning active inference, or who come from any different field of expertise or experience, we're going to be able to try out some new combinations that maybe haven't been tried. Speaking of the Active Inference Lab beyond the livestream: if you go to activeinference.org, you'll see a call for collaboration for 2021. Through this aweber/t/AUSM3 link, you'll be able to fill out a form of interest and let us know if you might be interested in collaborating for 2021, because we have three main projects that we're really excited to be working on. The first has to do with an Active Inference Book of Knowledge, which is going to be a participatory, open, and very interestingly designed textbook, reference, and source of online course material. That's project one, which is education-driven. Project two is going to be the livestream and some other associated content types that we'll get to in 2021. And project three is going to be tools for remote teams. This is related to the frameworks for analysis that we laid out in our September 9th paper (Viacan et al.), and we're going to be working on how we can do analysis and design for remote teams. So that will be quite exciting.
All right, thanks for all of that, and sorry about the relatively long introduction. Here we are in ActInf Stream 11.0, and the goal of stream 11 is to set the context for the next two weeks of discussion, 11.1 and 11.2. The paper that we're going to be reading in both of those weeks is "Sophisticated Affective Inference: Simulating Anticipatory Affective Dynamics of Imagining Future Events," by Casper Hesp et al., 2020; the ResearchGate link is provided. This video, the .0, is an introduction to, or a context for, some of these ideas. It's not a review, a final word, or a judgment. It's just an attempt to contextualize some of the ideas and vocabulary, as well as some of the formalisms of the paper, which will be really helpful for understanding the paper for people of all backgrounds. The punchline of the paper is that we can add features to active inference, such as anticipation and affect, aiming for more powerful simulated agents and perhaps new insights for computational psychiatry. So we're going to be looking at it with two main foci. One is the simulating-agents side: computational agents and robots. The other is the human side: computational psychiatry, and how we might use this to gain insight into the human brain and mental health. The sections of 11.0: first, we will go through the keywords, which will be like a bridge or an on-ramp for those coming from different fields, maybe learning about active inference for the first time. Then we'll talk about the aims and claims, the abstract, and the roadmap of the paper, how they get from A to Z. Then we'll go through the formalisms and figures; in this paper's case, the formalisms are actually drawn from a few other papers.
So this video might run a little long, and it will cover the two citations that give us sophisticated affective inference, the sophisticated side and the affective side, but we'll get there in a few slides. In 11.1 and 11.2, we will discuss this paper. So if you're listening in the live chat, it would be awesome to have your comments, as well as comments after the video is done, or in a Twitter thread, or however: just let us know what questions we could ask the authors, and also the panelists, on 11.1 and 11.2 about this paper. And if those discussions have already happened, keep adding comments, because the discussion never stops. All right, so the keywords, the best part. As usual, there's so much to say about each of these keywords, and my target here is really the construction of the on-ramp for this paper and for active inference. So if you're in the active inference community, I hope these keywords are an invitation to explore some adjacent areas of research; and if you're just learning about active inference and want to familiarize yourself more, then hopefully these keywords are an interface or a foundation for your understanding. If you have any thoughts or questions, just leave a comment, get in touch with us, or come on a livestream, because we're all really just learning these ideas at this point. We want to hear what people think about what connections are possible, and certainly my hasty production of these videos would be greatly improved by people in the live chat or comments letting us know where we could take the discussion. So the keywords of the paper are anticipation, counterfactuals, affect, and anxiety. Maybe some of these words are more common than others.
People talk about anticipating events a lot in everyday life, as well as, of course, anxiety and affect or emotionality. Counterfactuals you don't hear as much about in everyday life, though maybe one uses them without the word. We'll come back to that in a second. All right, the first term is anticipation. Anticipation is a really cool term, and there's a lot of interesting research around it. My goal here is to draw out this thread of anticipation itself, apart from, for example, systems approaches to ecology or cybernetic approaches to action. We're going to focus on this anticipation link, and that's hopefully going to enrich all of these different areas, because we're going to see how they make a puzzle that fits together and is bigger than all of the puzzle pieces. We're going to start on the anticipation side with a couple of key citations related to Robert Rosen, who did some very influential anticipation research in the middle of the 1900s. A 2017 paper by Louie and some other recent reviews of Rosen's work helped contextualize this discussion of anticipation for me. The axiom of anticipation, as they say, is that life is anticipatory. We've heard of life as a replicator; we've heard of life as a locally negentropic, organizing force. Here we have another facet: life is anticipatory. So we're really thinking about all kinds of systems, just as we were when thinking about definitions of life as pattern-making or replicating. Anticipating is now going to be one of the key features of what life is like, at least according to this lineage of research. And this is going to relate to a ton of other ideas, including the counterfactual, which is really important. The work of Rosen and others on anticipatory systems led them to formalize what an anticipatory system is, and here is the actual diagram they used.
Again, there are many perspectives on anticipation, including Russian and other non-Western cybernetic and anticipatory research traditions, but I'm just going to trace out this lineage of Rosen for right now. Certainly the concept wasn't new, but the systematic study of it was new when Rosen was writing his book. Here's a quote from the mathematical foundations that leads me to this figure: "Rosen's rigorously mathematical study of this biology-inspired subject led to a sequence of papers culminating in his book Anticipatory Systems: Philosophical, Mathematical and Methodological Foundations." And here's the definition that is provided of an anticipatory system: "An anticipatory system is a natural system that contains an internal predictive model of itself and of its environment, which allows it to change state at an instant in accord with the model's predictions pertaining to a later instant. An anticipatory system's present behavior depends upon future states or future inputs, generated by an internal predictive model. Model-based behavior, or more specifically anticipatory model-based behavior, is the essence of social, economic and political activity," end quote. And that was drawn out as this M-E-S diagram. It looks kind of familiar from some diagrams we've looked at on the stream, but these are different letters representing different things. M is the model of the system; we might say that's like the generative model. E is the effectors; those are like the action states. And S is the system, or the niche; the environment includes the total system. The effectors influence the system, and there's a bidirectional relationship between the model and the effectors.
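To make the M-E-S loop a bit more concrete, here's a minimal sketch in Python. Everything here (the function names, the drift dynamics, the threshold) is invented for illustration; the point is just Rosen's definition: the effector acts in the present on the basis of the model's prediction about a later instant, rather than reacting after the fact.

```python
# Illustrative sketch of Rosen's M-E-S anticipatory loop (names and
# dynamics are made up for this example, not taken from the paper).

def model_predict(state, drift):
    """M: the internal predictive model -- anticipates the system's later state."""
    return state + drift

def effector(predicted_state, threshold):
    """E: acts *now*, based on the model's prediction of the future."""
    return -1.0 if predicted_state > threshold else 0.0

def step(state, drift=0.4, threshold=1.0):
    """S: one tick of the system, steered anticipatorily rather than reactively."""
    action = effector(model_predict(state, drift), threshold)
    return state + drift + action

state = 0.0
trajectory = [state]
for _ in range(5):
    state = step(state)
    trajectory.append(state)

print([round(s, 2) for s in trajectory])
```

Without the effector, the state would drift up past the threshold; with the anticipatory correction, the system never crosses it, because present behavior depended on a predicted future state.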
Now, I hope this isn't too far down the rabbit hole, but I think there are a few things that active inference and non-active-inference people can learn here, because, again, it helps to distinguish which parts of what we're talking about are related to, for example, enactivism, which parts are related to Bayesian computation and statistics, and which parts are related to thermodynamics. By tracing out the different lineages of thought that culminate in what we have today, we can not only clarify previous ideas and realign them in the context of active inference, but also potentially start to work on all the systems that were within the scope of those disparate ideas, within the scope of what we have. So it is very important to make the mapping, and to be specific about it, rather than just saying, well, active inference reimagines it, so it's going to be totally different. It's not going to be totally different, actually, because a lot of these figures are from the 1970s, and they're going to remind active inference people of a lot of things, things that I'm not even perceiving, which is why I put it out there for active inference people and others to think about. We saw from enactivism and from the ecological-psychology traditions developing in the 1900s that there are many rich ways to think about this shared interaction, the interface between the ecological niche and the skilled, intentional organism. We also saw from multi-scale, complex-systems approaches that there are many ways to think about internalism and externalism and about causal relationships; there's no privileged level of causation, among other multi-scale insights. Okay, now we're going to focus on anticipation, not those other multi-scale domains. We're going to be in one domain, one type of niche. What are we really going to be looking at with anticipation?
Okay, so the anticipation research frames this world as a multi-loop situation, with simple or complex versions, that includes the observer's formal model F as well as the natural system. Down here at the bottom we have F, the formal system of the agent, which is like something internal to the agent, and then we have N, the actual external natural system. We've seen this before in the context of internalism and externalism in active inference, but here it's specifically about anticipation. That's why I think it's so interesting, especially because what the agent is doing with the formal model is actually called inference: inference under a formal model. So hopefully a lot of connections are being made. The idea of the anticipation research is that we can find analogies amongst natural systems within Rosen's holistic anticipatory ecology. So one formal system, this is going to be our formal ecology, and we're going to go out to a bunch of natural systems, deserts and rainforests, and do a decoding process. By decoding the causal entailments of a natural system, by inferring what kinds of feedback loops are anticipating and being anticipated by that system, which is our own loopiness over here, we'll be able to find structural relationships between different natural systems: here's natural system one and natural system two. This line of research is interesting because it introduces a basis in logic as well as in category theory and dynamical systems. It bridges a lot of formal areas, and that enables a kind of law-like, semantic or ontological treatment of systems, which is something the active inference research frontier is also focused on: for example, is active inference a theory of semantics?
So I think there are a couple of ways in which active inference could come to this anticipatory framework and add some insights. First, the formal model of the organism doesn't have to be formal in any sense that we traditionally understand; it is something that is formalizable, or maybe it's related to the Church-Turing hypothesis. There are a lot of ways to think about this, but active inference is deliberately vague about the exact nature of this model, saying only that it's something that we, as observers who can make a certain kind of model, are also inferring. So we're not sure that we're on the right one, but this F is a generative model. It doesn't have to be just an analytical or inferential model; it's actually related to control systems. Here it's encoding and decoding, but we've talked a lot about action and perception, and active inference more squarely places this loop not in the realm of signal, abstraction, and meaning, like semiosis, but within the control-theory loop of system states, internal states, and effectors. Second, as we've talked about with internalism and externalism and the tale of two densities, the internal or formal model doesn't have to be simply inside the organism. It can exist in an exocortex, in an extended, embedded, enculturated, or collective way; it doesn't need to be all within the bounds of the organism. Now, Rosen was probably aware of all these tensions and used fuzzy sets, ecological thinking, and a lot of other tools. So again, I'm not really passing judgment on the anticipation research, I don't know it too deeply; I'm just thinking about where we could draw from and build upon it. And it really is interesting that in the anticipation area there's a dialectic, a dynamic, between the formal system and the natural system.
So it's almost like natural and artificial, and that's related to encoding and decoding; it's framed in terms of information, of the natural communicating with the almost-unnatural. The active inference approach, by contrast, takes that ecological insight and connects it to control theory and cybernetics. It's action-oriented, that's the action-research side as well, seeing semiosis as emerging out of action rather than seeing semiosis as primary, stimulating or signaling action. Which one is primary? Well, which one was around first? Again, this is really important precursor research for complex systems and a lot of other areas, and we're coming at it from anticipation, without too much active inference jargon. We're just thinking: wow, if active inference could map to anticipation, then every keyword and literature about anticipation would be one jump away. That's why semantic indexing of research databases is so important as well. All right, second keyword: counterfactuals. Counterfactuals are closely related to prediction and to anticipation, and that is why they're in this order. Many philosophers, and this is drawing from the Stanford Encyclopedia of Philosophy, have proposed to analyze causal concepts in terms of counterfactuals. The basic idea is that "A causes C" is related to the claim that if A had not occurred, C would not have occurred. That's called difference-making causation: the cause that makes a difference. It's like you imagine two worlds, one where something happens and one where it doesn't, and then you ask how they'd differ. If tomorrow there were no sunrise in California, then it wouldn't be the temperature it is today. So you can think about an alternate world, even one that can't or wouldn't happen.
And then you can talk about that and say: the sun causes the day to get warmer when it rises, or when the earth rotates, I should say. And what's good about counterfactuals is that you can go beyond simple ones and develop quite a nuanced logic of counterfactuals. Here's an example from Lewis (1973) where, depending on the way that more information is added, the truth value of the counterfactual changes. It's an example of how that can flip, and also of how we can use logic, without going too deep, to denote these sentences. (a) If I had shirked my duty, like not signed up when I had to, no harm would have ensued: if event I, then not H (no harm). (b) If I had shirked my duty and you had too, now it's like the AND operator, two events, I and you, then there would have been H. And in (c): I and you and T, then not H. So this is a formalization, though it can also be done in ways that are more data-driven than formal-framework-driven. It's very related to many things we've talked about, like infinite grammars and logical sequences, and to the way that you can have a generative model and then ask about how things are, based upon how things could be, in a different sense. We're of course going to come back to that with the papers. Here is where counterfactuals connect more directly to inference. Counterfactuals in the inference case ask: can we understand how things are, or were, by asking how things could be? That's purely about inference. And then the action-oriented version of that question is: can we reduce uncertainty about future states? Can I make sure that I keep a good core body temperature by asking about how things could be, so that I can choose the right actions?
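Lewis-style counterfactuals like the shirked-duty example can be mimicked in a few lines of code. The sketch below is my own toy illustration (the "worlds" and the harm rule are invented): a counterfactual "if A, then C" is evaluated as true when C holds in the most similar worlds where A holds, and strengthening the antecedent selects different nearest worlds, which can flip the truth value.

```python
# Toy possible-worlds evaluation of Lewis-style counterfactuals.
# Worlds, similarity metric, and the harm rule are all invented here.
from itertools import product

# The actual world: neither of us shirked our duty.
actual = {"i": False, "you": False}

def harm(world):
    # Invented rule for illustration: harm ensues only if we *both* shirk.
    return world["i"] and world["you"]

# All candidate worlds over who shirks.
worlds = [{"i": i, "you": y} for i, y in product([False, True], repeat=2)]

def similarity(world):
    # Fewer departures from the actual world = more similar (Lewis's spheres).
    return -sum(world[k] != actual[k] for k in ("i", "you"))

def counterfactual(antecedent, consequent):
    # True iff the consequent holds in the most similar antecedent-worlds.
    a_worlds = [w for w in worlds if antecedent(w)]
    best = max(similarity(w) for w in a_worlds)
    return all(consequent(w) for w in a_worlds if similarity(w) == best)

# (a) "If I had shirked, no harm would have ensued."
print(counterfactual(lambda w: w["i"], lambda w: not harm(w)))           # True
# (b) "If I had shirked and you had too, harm would have ensued."
print(counterfactual(lambda w: w["i"] and w["you"], lambda w: harm(w)))  # True
```

Both come out true at once, which is exactly the non-monotonic flip in the Lewis example; and in the action-oriented reading, an agent evaluates such could-be worlds in order to choose what to do now.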
So if it gets cold later tonight and I'm going to want a jacket to feel warm, then I might need to get a jacket right now. That's what this paper is about, and we're going to return to it with precision, but let's keep things at the keyword level for now. Here's a plot from the SEP, the Stanford Encyclopedia of Philosophy. It shows a strict, categorical analysis of how the antecedent (what happens before) is related to the consequent (what does or does not follow after). In this plot, we have things that actually happened before above the x-axis, and then things that didn't happen before, like "if the sun hadn't risen yesterday": that's this not-phi, that's not what happened. The consequents are shown on the other axis. In the quadrant on the top left, we have things that did happen before and did follow; this is like "today I tied my shoe and it stayed on". This is normal reality. The shaded region must be empty. Why? Because it contains things that actually happened in the antecedent but didn't happen in the consequent. This is like "I boiled the water, but then it wasn't hot": something where you did the setup correctly, but the consequence didn't occur. We know that doesn't really happen. And then there's the whole set of worlds below the x-axis: if anything had been different in the past, if I had been wearing a different color shirt, if dot, dot, dot. All of those worlds of the past, whether what actually happened happened or not, are accessible, because they might relate to a similar world outcome, or not. And we can nuance this categorical approach into what's called a similarity analysis.
A similarity analysis is where we start with our estimate of the current state, the current moment, the causal relationships, and all that sort of stuff, at this origin point. Then, with the same axes of what did or didn't actually lead up to the event and what did or didn't actually follow from it, we can have more of a continuum. Yes, there's still a little shaded area here, but as you get to events that are further and further out, there's potentially more latitude; the spheres represent the similarity idea. It's kind of like philosophers discovering numbers: they're going from the categories, and they're like, maybe it's a one-through-ten scale. Maybe it's a continuum. Maybe we could take an integral over it. Maybe we could take a derivative. What would all these things mean? And that's the fruitful discussion between people with quantitative and qualitative backgrounds, because the rigor of this approach, which is barely scraped at in this summary, is a lot of horsepower to plug into. If we can reconsider some of these topics in a more integrative way of thinking, then we can come back to everything about antecedent and consequent and reinterpret it and re-enliven it. That's what's really exciting about transdisciplinary research and teams: hopefully not to always be returning to that point, but it's really true. All right, affect. There are many usages of the word affect; just looking at the dictionary definition, there were so many. And I found this image and thought, I don't even know what to say about it, because person A has an angry emotional affect. That's using it to describe their character.
So person A had an affected character, and then, trying to hide their anger, person A put on an affect, because affect has a connotation of being how other people perceive your manner of speech; affect also has a sense that means affected, changed or altered in a way. "Once he entered the room, he started speaking in an affected manner": that doesn't really describe anything, just that he did it in a different manner. But then person A affected person B by pushing them into the pool, in the left cartoon I believe, and the effect of this, affect versus effect, was that person B ended up in the pool on the right side. So what I think was being shown is the common sense and the confusion: the affecting is the cause, and the effect is the consequence. That's why it's interesting: affect and effect are of course a common typo pair, but it relates to what we just saw about the antecedent and the consequent in counterfactual structures of the world. But affect also has to do with, of course, emotionality, and that's why we're here, that's what interests us. In that more classical psychological sense, affect is considered to be related to emotion, or experience, or qualia, again, not my area, just from my skimming in preparation for this video. And often affect is displayed on these lower-dimensional manifolds, or axes of variation. For example, one axis might run from relaxed to stimulated. That doesn't mean feeling good or bad; you can be feeling good and relaxed, or bad and tired. Here there's a mild-to-intense axis: emotions that don't grab your salience so much, versus high-intensity ones that do, like being astonished, awed, or shocked. Then there is an orthogonal, independent axis, the x-axis here, that goes from unpleasant to pleasant.
So whether you think of it as a two-by-two, with pleasant and intense experiences in the top right and mild and unpleasant down here, being a little too cold or something like that; or as a circle; or as having more dimensions; or as having measurements in a phase space: if this is maybe the first time you've seen all the emotions on a chart, think about what all the options of charts are, and then all the options of emotions, and that's what we're going to be getting at. Because it's not just points in space: we could actually be talking about trajectories through that phase space, whether it's an emotional cascade, or a pathway to radicalization like we were talking about yesterday in 10.2, or just your daily routine: if you wake up and you're here, how are we going to get to here? This one goes from feeling afraid, through feeling excited and happy, into content and serene. Now, not that this is a pathway, or the pathway, but perhaps if we use tools like active inference or other computational frameworks, it could be possible to introduce ways of thinking about this diagram that would help us actually navigate these ridges and find paths, find policies, for people who are in one area of this phase space to find a trajectory into different parts of it. So there's already the phase model of valence and affect, and there's already this trajectory model of fields and trajectories coming together, and now we're going to reconsider. And here are just a couple of citations on the left about non-active-inference neuroscience approaches to valence and affect. Valence runs from positive to negative, just as an electron or a proton has a valence; valence is the positive-to-negative axis, the x-axis here.
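The phase-space-plus-trajectory idea can be sketched very simply: affect states as points in a valence-arousal plane, and an affective trajectory as a path through that plane. The coordinates below are pure assumptions for illustration (there is no standard numeric placement of "afraid" or "serene"); the point is just that once emotions are coordinates, paths between them become objects you can compute with.

```python
# Affect states as points in a 2-D valence-arousal phase space, and a
# trajectory between them. All coordinates are hypothetical placements.
import math

# (valence, arousal), each roughly in [-1, 1]; placements are assumptions.
affect = {
    "afraid":  (-0.7,  0.7),
    "excited": ( 0.7,  0.8),
    "serene":  ( 0.6, -0.6),
}

def interpolate(a, b, steps):
    """A straight-line path between two affect states in the phase space."""
    return [(a[0] + (b[0] - a[0]) * t / steps,
             a[1] + (b[1] - a[1]) * t / steps) for t in range(steps + 1)]

# One candidate trajectory: afraid -> excited -> serene.
path = (interpolate(affect["afraid"], affect["excited"], 5)
        + interpolate(affect["excited"], affect["serene"], 5)[1:])

def length(path):
    """Total length of the trajectory through the valence-arousal plane."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

print(f"waypoints: {len(path)}, length: {length(path):.2f}")
```

With a representation like this, "finding a policy" from one region of the space to another becomes a path-planning question over the valence (x) and arousal (y) axes.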
It's considered to be one of the major axes of variation, as anyone who feels good or bad, or changes how they feel through time, can attest. So again, we want to think about affect in a couple of senses, not just the semantic. We want to think about affect as a statistical phenomenon, especially one that can be studied in all kinds of animals, as well as in computational agents. In that situation, affect as a statistical phenomenon is like acting greedy, or volatile, or regretful, or confident: terms we might want to apply even to a chess-playing statistical machine or a gambling machine. "Oh, it's really acting shy." With the intentional stance from Dennett, maybe that's a valid thing to say, but the statistical sense of affect is purely about the way the system is behaving: how would we describe its intentionality, "as if"? That's very behaviorist, very from the outside. Then there's the neuro side of affect, related to the neuroscience literature here: people who are actually looking in the brain for electrical and chemical differences in people who have different affective states, connecting those states to different features, dynamical or otherwise, of the brain. That approach is very internalist; it's focused on things internal to the organism, and it tends toward a sort of neuroscientific reductionism. And then there's also affect in the psychological and psychiatric sense, which is about the experience. That's where this whole field of emotions comes in, which maybe some people experience a little differently than others, but no one strongly disagrees about, or says it's preferable to be on one side or the other; that's kind of the question of cognitive diversity, in many ways. And this is where I believe active inference comes into play: you have the statistical-behaviorist sense, which could be about robots, could be about anything.
You have the neuroscientific sense, very internalist, very neuro- and human-related. And then you have the psychological-psychiatric sense, which is very social, very semantic, and based in many cases upon the clinical relationship. Here's where active inference comes into play: it adds a level of computational tractability, but also philosophical agnosticism, about a system. Using systems engineering and other approaches, we're going to be able to talk about systems very precisely while remaining basically agnostic on several key issues. Some people think that any aspect of uncertainty in a framework is simply a negative thing. It's not that I'm doing apologetics here; I don't think every uncertainty is a good thing, certainly not. But there are several areas, just like we discussed at the end of 10.2, where we're actually saying: yeah, we're using it as a tool. Whether a person chooses cultural script A or cultural script B about their experience, we're not going to make a value judgment on that; from a purely statistical framework, we just want to be able to describe certain kinds of behavioral state-transition matrices as being different in our model. That's a way for people to have their cultural and personal experience while also using tools that are empowering. And beyond computational and systems tractability, we bring in the whole control-and-action perspective, because all of this is about thoughts, words, words, words, and rumination, overthinking and underthinking, can also be a maladaptive state for individuals or collectives. So how are we going to orient this psychiatry in action, instead of just: you're in brain state A, so now we need to zap you or give you a chemical to put you into brain state B? Well, it's not so simple, or at least we haven't figured out the simple way to do it yet.
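As a small illustration of the statistical reading of affect: in active inference, confidence in policy selection is commonly scaled by a precision parameter (often written gamma) inside a softmax, and the Hesp et al. paper builds its affective story on dynamics of precision. The sketch below only shows the generic softmax-precision mechanism with made-up numbers: high precision looks "confident," low precision looks hesitant or volatile.

```python
# How a precision parameter makes behavior look "confident" or "volatile".
# Policy values are invented; the softmax-with-precision form is the
# standard one used in active inference policy selection.
import math

def softmax(values, gamma):
    """p(policy) proportional to exp(gamma * value); gamma acts as precision."""
    exps = [math.exp(gamma * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical (negative) expected free energies for three candidate policies.
G = [1.0, 0.5, 0.2]

confident = softmax(G, gamma=8.0)   # high precision: near-deterministic choice
volatile  = softmax(G, gamma=0.5)   # low precision: hesitant, near-uniform choice

print([round(p, 3) for p in confident])
print([round(p, 3) for p in volatile])
```

Same values, different precision: the high-gamma agent almost always picks its best policy, while the low-gamma agent spreads its choices out, which is the purely behavioral, from-the-outside sense in which we might call an agent "confident" or "shy".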
So how are we going to think big about the niche and the culture and the extended environment, and then ask how we can navigate this, you know, care support team, experiencer, network? How can we navigate through this landscape, which is defined by specifics for this person in their situation? How are we going to find a productive policy to get from here to there? So those are the kinds of things that get me really excited, because I feel like they're strong yet neutral and powerful ways to talk about mental states. And to give one example of a mental state, which was a keyword, so I have to go into it for this section, but it's something that's, you know, in everyone's experience: anxiety. So I'm not a medical doctor, of course, and I'm not your medical doctor. So this is really about that computational psychiatry perspective. Though we all feel anxious, we all wanna help each other; no one is alone or should feel alone, so we all wanna work together and help get you on a productive path wherever you're at. All right, so here's the definition of anxiety that's given by none other than anxiety.org. Anxiety is the mind and body's reaction to stressful, dangerous, or unfamiliar situations. It is the sense of uneasiness, distress, or dread you feel before a significant event. A certain level of anxiety helps us stay alert and aware, but for those suffering from an anxiety disorder, it feels far from normal; it can be completely debilitating. And in the anxiety.org literature, they provide three different categories of anxiety disorders. Again, this is contextualizing why we're studying this area, as well as trying to find the footholds where active inference could start to make an impact, whether in people's lives or in the biomedical system or in the health insurance system. Just where, across the board, are we going to see this insight take hold in systems change?
So the three categories of anxiety that they talk about are, one, anxiety disorders, which are generally featured by excessive fear and anxiety; that one is a little self-explanatory. But the second category is obsessive-compulsive and related disorders, which are related to obsessive intrusive thoughts and compulsive behaviors. "These behaviors are performed to alleviate anxiety associated with obsessive thoughts," in their quote. And then the third category is trauma- and stressor-related disorders. So one-time or chronic stresses, and all these different interacting factors that are so nuanced, are related to this third category. So what to say about anxiety? I mean, wow. So first is that there are just many kinds and many experiences. So we wanna respect that everyone's having their own experience, and it's gonna be a totally unique experience to them, and it will demand a personalized way to help them, just like everyone. So that was sort of the first point. Then there's of course to say that there are many aspects to this, and there are many causes and factors, just from the clustering of diagnoses that can be considered within the category of anxiety disorders according to the DSM, let's say. It's quite broad. There are many features that some people might present with versus others, and there's no clarity on the underlying neurophysiological or even situational basis. And also it influences many kinds of thoughts and behaviors. But what's a theme in anxiety, something that connects all these different areas, is uncertainty. And when we see the word uncertainty in active inference, it's like a doorway, because active inference is all about agents that are trying to reduce their own uncertainty about action in their niche. So when we see uncertainty, and anxiety is about uncertainty, we wanna think about how we could build active inference into a broader discussion about psychology and psychiatry.
So let's think about these different manifestations of anxiety and reframe them in the context of active inference and a couple of other topics; that's partially where this paper is coming from. And we can start with the integration of all the keywords together before we go into the paper. So starting with anxiety, let's go first to affect. So anxiety, it's either a dimension of affect or it's a point in affect, but the point is it's someone's lived experience that they're having anxiety, and specifically that they're feeling negative about it and they wanna get better. Because if they're not feeling negative about it, then it's not pathological; that would be considered, I believe, within the realm of healthy but not suffering. But if the person puts their hands up and says, yes, I'm suffering, I would like to have less anxiety, that's what we're talking about in this situation. As well as managing it for everyone, whether they realize that at a metacognitive level or not. The next keyword is anticipation, and anxiety can be very related to anticipation, which is like predictions or estimates for the future. And here's one thing that we can unpack at this point, as we start to turn towards more mathematical sections for the rest of the talk. It's actually that the estimate is really like a package of a couple of things. It's the specific content of the expectation, but also its likelihood relative to other things that could happen, and its confidence. So that's a way to say that we can separate what the prediction is from how likely it is and from how likely we think our prediction is gonna be good. And let's combine that with a few examples with the counterfactuals. Because I said it's how likely something is relative to other outcomes, and that's related to the counterfactuals. That's related to those spheres of similarity and those similar worlds. And so counterfactuals are essential for anticipation, but they're also critical in the current moment.
So let's think about anxiety as a theme, and counterfactuals and anticipation, and about the past, present, and future. So a counterfactual about the past that might lead to anxiety would be like, I wish that in the past I had done X instead of Z, or I wish I never did Z. Two hours ago or 20 hours ago or 20 years ago. And then a counterfactual about the current moment is like, I wish it were like X right now instead of like Z right now. Just, I wish something were different in a state about the world. But here's where we get those confidence estimators and the anticipatory systems element, because we don't have to anticipate the past or the present. That's just kind of how time's arrow is. Funny, huh? But when we're talking about the present, people don't really include these confidence estimators. They don't say, I'm 100% confident that it's daytime. They say something like, it just is daytime. But when they include something about the future, it's very common to include a confidence estimator. Like, I think it's unlikely that my friend will do X in the next three years. And then you can always have predictions amidst other predictions. So like, I think there's a 10% chance that my baseball team wins the World Series. That means you think there's a 90% chance that they're not gonna win the World Series. And then you could have a critique or a metacognitive estimate for your estimate. Like, yeah, I said 10 and 90, but actually I have no idea. So it could be way, way, way off. So you have a very loose estimate even on the 10 and 90. Other times you're like, yes, I'm very, very confident. Now, again, you could be wrong. You could be way off. You could be asking the wrong questions. You could be not saving the text file. So the world intervenes in ways that go beyond this model, but we're thinking about just this simple decision-making context.
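To make that separation concrete, here's a minimal sketch in Python. The `forecast` helper and all the numbers are hypothetical illustrations, not anything from the paper: a prediction bundles its content, its probability relative to the alternatives, and a separate confidence (precision) weight on that probability.

```python
# A toy way of separating three things the transcript distinguishes:
# the content of a prediction, its probability, and our confidence
# (precision) in that probability. All names/values are illustrative.

def forecast(outcomes, probs, precision):
    """Bundle a prediction: what, how likely, and how confident we are."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return {"outcomes": outcomes,
            "probs": dict(zip(outcomes, probs)),
            "precision": precision}

# "I think there's a 10% chance my team wins" implies a 90% chance they don't.
f = forecast(["win", "lose"], [0.10, 0.90], precision=0.3)  # low second-order confidence
p_lose = 1.0 - f["probs"]["win"]
```

The low `precision` value is the metacognitive part: "I said 10 and 90, but actually I have no idea" corresponds to a small precision on an otherwise unchanged probability.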
And so in the context of anxiety, we can think about the generative model that leads to the production of affect, of experience; whether it's epiphenomenal or not, we'll leave for a later day. But this generative model is in this strange attractor, a zone of thought and action. And that strange attractor, whether it's rumination or whether it's related to any of these other manifestations of anxiety, it's like there's a strange attractor of thought, and that can be overwhelming, or it can lead to negative experiences, or it can lead through a trajectory that goes to negative places. So we wanna maybe be moving towards thinking about this landscape of attractors of thought and action and niche and social environment, thinking about all these factors together. And then this field of affordances, the field of consciousness, the free energy landscape. How can we bring all these things together? Think about how to help people who are experiencing anxiety and other situations. And before we go to the paper, here's the slide we'll return to, and it's: what do the free energy principle and active inference say about the relationship between agents and the world when, in this case, the agent is affective, emotional, and imaginative? And here's the affective agent who's also imagining future emotions. So this is the kind of model that we wanna keep in mind for the paper. It's about an agent who's acting in the world, and right now it's just a simple interface, but you can imagine more nuanced interfaces in a later model. And this agent is having an experience, but that's not a philosophical claim. It's just a parameter that it's calculating. And then it is also calculating perceived future likelihoods of different states. Okay, so with no technical details at all, that's what this paper is about.
Agent in the world with this valence, affect here; it's simple, it's zero to one, like good to bad. You could go in different dimensions, but here it's just good to bad. Agent in the world, thinking about the future. Okay, now it's gonna get more technical. That's what the paper's for, though. So Sophisticated Affective Inference: Simulating Anticipatory Affective Dynamics of Imagining Future Events. It's from October 2020, so just like two months ago, at the First International Workshop on Active Inference, IWAI 2020, at Ghent. And the authors are listed here. So cool, first annual, good, bold start. We wish them the best of luck in their series. The aims and the claims of the paper are as follows. So they write: in this paper, we aim to provide a mechanistic account of how affective responses can be generated by imagined future outcomes and how this can become dysfunctional during rumination. By combining two recent developments in active inference, anticipation and affect, we provide a formal model of these phenomena and simulate how overthinking a situation can occur, continuing to the point where unlikely, yet aversive and arousing situations emerge in one's imagination. So, interesting motivation for the paper. Now, in non-active inference terms, I would frame this as: how can we model affective and anticipatory active inference agents? What features do these simulated agents display in simple environments? And then, because this is such a short and introductory paper, how could we think about this being useful in the fields of psychiatry, cybernetics, all these different areas? And of course, we'll go into the formalisms soon. So, the abstract of the paper. In this paper, we combine sophisticated and deep parametric active inference to create an agent whose affective states change as a consequence of its Bayesian beliefs about how possible future outcomes will affect future beliefs.
To achieve this, we augment Markov decision processes with a Bayes-adaptive deep temporal tree search that is guided by a free energy functional which recursively scores counterfactual futures. So we're already seeing a lot of the keywords come back into play. Our model reproduces the common phenomenon of rumination over a situation until unlikely yet aversive and arousing situations emerge in one's imagination. As a proof of concept, we show how certain hyperparameters give rise to neurocognitive dynamics that characterize imagination-induced anxiety. And so we're gonna use the computational psychiatry definition of anxiety. And yep, it's a short abstract, a relatively short paper, it invokes a lot of formalisms, and so I hope that people find it interesting. The roadmap is pretty short. We only have to go to a few gas stations on this one. There's an introduction, there's a methods section. Figure one is a directed acyclic Bayesian graph, which we'll build up to. Figure two is an illustration of the state space of the task with all four states. And then table one has a lot of formalism; it's the predictive posteriors that provide the empirical priors for the generative model. Three is the results, and then there's figure three, which is just an example of simulation results showing detrimental effects of overthinking. So they made it about overthinking, but they didn't overthink the complexity of the paper. So it's a simple, straightforward paper, but again, it invokes a lot. So let's get to the formalisms, and let's try to really understand them, or at least give them a space where they can exist, even if we don't know all the details. Where are the edges of the formalism, so that we could be thinking about it correctly, or asking the right questions, or just the questions that make sense for us? Whatever they are, that's the best question to ask. Best question to ask in the live chat.
Best question to ask in a comment or on a live stream, but definitely just ask it, because someone else is gonna be asking it and you'll be really helping them. So, formalisms. Here's from the paper. They wrote: by combining the ensuing recursive update scheme of sophisticated inference with deep parametric affective inference, we can derive a general purpose generative model of the following mathematical form, summarized graphically in figure one and in tabular form in table one. I don't have the British accent to say it the right way, though. Okay, so they're combining two citations here, and so it's worthwhile to ask what these citations were. And they are: sophisticated inference, that's Friston et al. 2020, the first citation in the sentence; and then there is Hesp et al.'s paper, Deeply Felt Affect, which was actually published November 30th, 2020. So it was a preprint, I think, at the time of this one being written, but now it's been published. So here's how to think about this paper, Sophisticated Affective Inference. That's what everyone is listening to this video about. That's what this paper is about. That's what the discussion is mostly about. So it's combining, building on the shoulders of other shoulders, the affect side with the temporal depth side, both of which we're gonna go into in this video, which is why it's a background video. This paper builds on top of them, another brick in the wall, another citation in the network. And we're gonna be thinking about where that goes, given that this was in October 2020 and it's only December 2020, or maybe even only a few months later if you're listening to this. Where are the next places to go?
Well, there are the directions of making more advanced simulations, like taking something that they took as a fixed variable in this paper and just asking if it could be a vector of variables, or if it could be a continuum, or if it could be another type of model inside of that, and then give it a new acronym and make it another meme. So what simulation directions could we go, just purely exploring or making it more mappable to other fields? Then, how could we move in a way to integrate better with computational psychiatry and psychology? So how could we integrate with experts and with vocabulary and key questions in those fields, so that this isn't just like a machine learning parallel universe related to mental health, but it's actually something that's connected to how social workers in the field, or caretakers in their daily experience, relate to these topics. And then another direction, which I don't know too much about, but I hope that we can have some people on to teach us more, is about new models for robotics. So how does including these kinds of top-level parameters like affect, in the context of a deeply nuanced counterfactual anticipatory model, relate to a robotics context? And will that help robotic birds finally come out for public use? Well, let's go to sophisticated inference first. So again, temporal depth and affect, the two pillars that we're gonna build on; we're gonna go depth-first search. Some of you are gonna get that. Here is the sophisticated inference paper, which we're not gonna read in full on a livestream, so I will just read the abstract. Active inference offers a first principle account of sentient behavior, from which special and important cases can be derived, for example, reinforcement learning, active learning, Bayes-optimal inference, Bayes-optimal design, et cetera. Active inference resolves the exploration-exploitation dilemma in relation to prior preferences by placing information gain on the same footing as reward or value.
In brief, active inference replaces value functions with functionals of Bayesian beliefs, in the form of an expected variational free energy. Key point. In this paper, we consider a sophisticated kind of active inference, using a recursive form of expected free energy. Sophistication describes the degree to which an agent has beliefs about beliefs. So, higher-order theories. We consider agents with beliefs about the counterfactual consequences of action for states of affairs, and beliefs about those latent states. In other words, we move from simply considering beliefs about "what would happen if I did that," that's a first-degree counterfactual, to "what would I believe about what would happen if I did that?" That's a second-degree counterfactual. So for example, it could be: I really wanna know about a pen's relationship to my affordance of writing, but I don't know how heavy it is. And so you can keep on thinking about, well, what should I do with my hand to find out how heavy it is, so I can resolve my uncertainty about it being an affordance? These are the kinds of nested and open-ended frameworks for causes, which can be extremely contextual and ecologically conditioned. We wanna set up these causal chains in a way that we can capture them. That's what this is about, but the state space is big. So a common theme we're gonna come back to is: how do we look through these big state spaces? The recursive form of the free energy functional effectively implements a deep tree search over actions and outcomes in the future. Crucially, this search is over sequences of belief states, as opposed to states per se. We illustrate the competence of this scheme using numerical simulations of deep decision problems. So one way of interpreting this last part, this search is over sequences of belief states as opposed to states per se, is that this is really taking that full jump into the world of action, instead of just inference on world states.
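As a rough illustration of what a "tree search over sequences of belief states" might look like, here's a hedged toy sketch, not the paper's actual scheme: two states, two actions, and a recursion that scores each counterfactual belief trajectory out to a horizon. The transition matrices, the preference distribution, and the collapsing of expected free energy down to a single risk term are all simplifying assumptions made here for illustration.

```python
import math

# Hypothetical toy model (names and numbers are illustrative, not from the paper):
# the agent plans over *beliefs* (distributions over states), recursively scoring
# counterfactual futures -- a cartoon of the "deep tree search over actions and
# outcomes" whose search runs over sequences of belief states.

T = {  # T[action][s] = distribution over next states given current state s
    "stay": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}},
    "move": {0: {0: 0.2, 1: 0.8}, 1: {0: 0.8, 1: 0.2}},
}
log_pref = {0: math.log(0.8), 1: math.log(0.2)}  # prior preference over states

def predict(belief, action):
    """Push a belief through the transition model for one action."""
    nxt = {0: 0.0, 1: 0.0}
    for s, p in belief.items():
        for s2, q in T[action][s].items():
            nxt[s2] += p * q
    return nxt

def score(belief, depth):
    """Recursively score the best action sequence from this belief state.
    Here the free-energy-like score collapses to negative expected
    log-preference (a deliberate simplification of expected free energy)."""
    if depth == 0:
        return 0.0
    best = math.inf
    for a in T:
        nb = predict(belief, a)
        risk = -sum(p * log_pref[s] for s, p in nb.items())
        best = min(best, risk + score(nb, depth - 1))
    return best
```

From a belief concentrated on state 0, "stay" keeps the agent near its preferred state, so the recursion scores it better than "move"; the key structural point is that `score` recurses on `belief` dictionaries, never on single states.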
So instead of, like, I'm gonna put all my computational power on what the temperature is gonna be in California one year from today, it could be like, I should just have between one and three jackets. And then within an 80-20, but really more like 99-1, I'll have the right policy. But no one puts the big computational power on the jacket prediction algorithm. They go with the climate prediction algorithm. And there's a lot to say about that, why that decision is made, not in that specific case, but just in general: why do we go with all of our power on predicting states of the world rather than doing prediction on policy? When in the end, policy is what we can control and bound, and then we can leave space for the unknown. We can leave space for, how likely is this tsunami, once every 30 years or 50 or 100? What's the most resilient thing we can do right now? What's the best inference on policy for our beachside community, not what is the Gaussian distribution of tsunami waiting times? Do people hopefully see a few directions in which active inference steps into these issues about uncertainty? Here are some of the formalisms from the paper. They write: our objective is to optimize beliefs, so an approximate posterior, over policies, pi as always, and their consequences, which are hidden states, s. And there's this triple-line symbol next to s, like a super equal sign, meaning "is defined as." And it's saying s stands for the sequence of hidden states from some initial state, s1 at time one, until the policy horizon tau. So that's the length of the prediction, the depth through time. Given the observations O from up until the current time, t. This optimization can be cast as minimizing a generalized free energy functional F of... so it's a function of something. And that function can be enormously complex; like, a neural net is a function. It's a mapping of inputs to outputs. It's a map; functions are maps.
And so F of everything, it's like that's the biggest possible wormhole, but that's why this is a powerful approach. And then it's q of s and pi. So this functional is around q, which is some distribution of states and policy. So the world state is like what the actual, my estimate about the world state is gonna be, and the pi is my policy. So here's what is being asked here for those who might be learning about active inference or wanting to really go into these formalisms for perhaps even the first time. This is like asking, how can we have the best policy over a given time horizon? So that's the big control theory question. Given tau and given the distribution of affordances, pi, what's the best pi over tau? Not division, just what is the best pi over the time horizon tau? So if my policies are left or right in a collision situation, and then there's a time horizon, you only have those two choices. And so you can pull back to the meta game and say, do I have another choice? There's always space for innovation and for breaking out of the box and for reframing it. But at this first base level, let's just focus on a certain time horizon and a certain set of policies that are fixed. So we're playing like connect four here or we're playing like backgammon. It's gonna work for these models too. And we wanna ask, how can we estimate the consequences of pi? So of ourself, of the policies of others and the policies of groups. And the way that we're going to do that mapping rather than go infinitely deep on what is a causal model of the world or have every atom simulated in the world, we're actually gonna go to a different way. And we're gonna ask, how are observed states related to our estimates of hidden states? And how does that relate to our underlying causal model of how hidden states change through time and how our policy influences them? That's the framing. 
And there are times where you can, like, breathe as much as you want, but you're not gonna change the CO2 in the atmosphere by yourself. So modeling it this way doesn't mean that there's gonna be a way to make the estimate. The agent might not be set up in a way to succeed. But again, it's the framing here that matters. And you could frame a winning chess game or a losing chess game, or a million pawns versus one, or any board size. All this can be included. That's what we're using these numbers and letters for. Because then the pi can be really nuanced. It could be a million different self-driving car parameter settings, but it could be evaluated. They continue: this generalized free energy has two parts. The first entails a generative model for state transitions given policies, while the second entails a generative model for policies that depends on the final states. So, omitting constants. So the generalized free energy of sophisticated inference, which is deep-time inference, is gonna have two parts. One part is going to be a generative model. They're both generative models, but they're kind of like a two-stroke engine. One generative model is about state transitions given policies. And then the other half of the generative model is about policies given state estimates of the world. And so those are manifested in Q and P. And they're both related to policy, but they're related in really specific ways. So more details are provided above. This is just the screenshot from the paper. But we have two key equations that come at the bottom here. Q, so mind your Q's and P's, just classic. Q is conditioned on policy. It's about observations and states given policy. And it's related to a model of how observations arise from states, O conditioned on S, and related to another model about how states are conditioned on policies. Okay, so it kind of makes sense, because we're conditioning here on states, I'm sorry, we're conditioning on this right side of P and Q with policy.
And then here on the P, we're conditioning on S, but then we're doing an S estimation here. So, like, maybe it kind of cancels; that's how we ultimately get S on the left side here. Like, this one somehow scoots to the left side. Ask a statistician; there are more nuanced layouts, this is just the simple one. And it's basically related to this P and Q internally. I don't think Q is recursively defined; I think it's two components of Q, but let's hear the clarification from the authors. Then there's this P function, and that is related to policy selection itself. So it's not conditioned on policy; P is about policy. And this is related to expectations across policies and the generalized free energy of the policy. So P is like the policy selection side, and Q is like the inference side, because this is like: what are the states and the observations of the world given the policy that I do? Given that I go on a run every day, what would be the states and the observables of the world that would be different? So it's a counterfactual, but I'm framing it within this type of approach. And then the other half is: given that that's a likely or a possible or a preferable or a curious or a fun or a habit-forming policy, whatever it happens to be, however it fits into E and G defined formally, that relates to the policy selection path that ends up getting enacted by P. So yep, definitely, let's get some more clarification from the formalists themselves. They continue with this section, equation 1.3. And here's something pretty interesting. They write: this corresponds to a form of approximate Bayesian inference, i.e. variational Bayes, in which equation 1.3 is iterated over factors of the mean field approximation to perform a coordinate descent or fixed point iteration. So they are saying that this formalization corresponds to approximate Bayesian inference. I'm gonna provide a little bit more information on that in a second.
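For concreteness, here is one plausible reading of the two factors being described, written as a hedged reconstruction of the narration rather than a verbatim copy of the paper's equations; subscripts and the exact conditioning may differ from the original.

```latex
% Hedged reconstruction; indexing details may differ from the paper.
\begin{align}
Q(o_\tau, s_\tau \mid \pi) &= P(o_\tau \mid s_\tau)\, Q(s_\tau \mid \pi)
  && \text{inference side: likelihood times predicted states} \\
P(\pi) &= \sigma\!\big({-}G(\pi)\big)
  && \text{policy side: softmax of negative expected free energy}
\end{align}
```

On this reading, the first line is the "Q" being discussed (observations and states given policy, built from a likelihood model and a transition model), and the second line is the "P" for policy selection, which is not conditioned on policy but is a distribution over policies scored by the generalized free energy G.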
I found that to be kind of an interesting claim. If there are mathematicians or machine learning experts who could clarify this issue, that would be appreciated. And here's what they wrote, which I thought was really interesting: sophisticated inference recovers Bayes-adaptive reinforcement learning in the zero temperature limit. So cool, because it's like saying that in the extreme, no-noise case, in some info-thermodynamic way, it converges to a certain other type of thing. So just like some algorithms might converge to a normal distribution, let's just say you sample a bunch of weights and they converge to a normal, this is like a convergence: this model family is converging on Bayes-adaptive reinforcement learning. So we take all that we know about Bayes-adaptive reinforcement learning, all those neural nets, and now we have the best one. And then it's about converging into that best funnel in a higher-temperature space that goes even beyond the model parameters of those Bayesian approaches. Quoting the authors again: both approaches, sophisticated inference and Bayes-adaptive reinforcement learning, perform belief state planning, where the agent maximizes an objective function by taking into account how it expects its own beliefs to change in the future, and evinces a degree of sophistication. AKA being sophisticated, like being good at chess or Go. The key distinction is that Bayes-adaptive reinforcement learning considers arbitrary reward functions, while sophisticated active inference optimizes an expected free energy that can be motivated from first principles. Key difference: people who are looking for a key difference between free energy and something else, that was one. While both the Bayesian approach and free energy can be specified for particular tasks, the expected free energy additionally mandates the agent to seek out information about the world beyond what is necessary for solving a particular task. So that's extremely key.
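The "zero temperature limit" remark can be made concrete with a few lines of Python; the G values below are made up for illustration. Policy selection as a softmax over negative expected free energy, P(pi) proportional to exp(-G(pi)/T), collapses onto pure minimization of G as T goes to 0, which is the sense in which the soft, graded scheme recovers a hard maximizer like Bayes-adaptive RL.

```python
import math

# Softmax policy selection with a temperature parameter. As temperature -> 0,
# the distribution over policies collapses onto the single policy with the
# lowest expected free energy (i.e., pure maximization). G values are made up.

def policy_dist(G, temperature):
    logits = [-g / temperature for g in G]
    m = max(logits)                        # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

G = [1.0, 0.8, 1.5]          # expected free energy of three candidate policies
warm = policy_dist(G, 1.0)   # graded preferences across policies
cold = policy_dist(G, 0.01)  # nearly deterministic: almost all mass on argmin G
```

At temperature 1.0 every policy keeps some probability mass; at temperature 0.01 the middle policy (lowest G) takes essentially all of it.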
This goes beyond, for example, framing the potential value of play in terms of the estimated values of the deliverables. This is actually something that we've heard so many participants on the ActInf stream, like Blue and others, highlight: making space for play. And this full formalization of the expected free energy includes the information gain as well as the value-driven components, and some other components, actually on equal footing, down to even the units of information used, at least according to this first interview I listened to a couple of days ago. It's on one of the playlists on our channel. The same natural unit, or at least the same framework. Whether there's a coefficient that balances them, he seemed pretty negative on that. But I think there could be another way that they're integrated, in a more nuanced way than just simply an 80-20 or 20-80. This isn't bipartite partitioning. This isn't nature-nurture 1.0. This is relational causal networks. And so it's gonna have a really different way of phrasing. It's not gonna be just a coefficient, but it'll be like a coefficient, I believe, because there are systems where it is like 80-20 or 20-80 in terms of delivering on something standard versus novelty. So like we talked about scripts in 10, there are the scripts, like the ritual, that are meant to be carried out exactly. Then there are the ones that are unstructured. So, that whole space. Back to the authors: this model allows inference to account for the artificial curiosity that goes beyond reward seeking to the gathering of evidence for the agent's existence, i.e. its marginal likelihood. This is sometimes referred to as self-evidencing, and the citation there is to Hohwy 2016. And so we can think back to Active Inference 8, when we were thinking about that mountain car and about how it just wanted to explore.
But by exploring in that admittedly trivial setup, it did discover the goal, in the way that the optimal-reward uphill climber got ground down. So how can we move beyond just one-well settings, move into more nuanced, maybe even semantic, innovation landscapes and optimization landscapes, and converge to approximate Bayesian inference in the truly formal case where we need to, but then relax those assumptions with instrumentalism and use approximate Bayesian inference empirically, just as statisticians, as scientists, as policy planners? That's really the question in the doorway. All right, another quote from the same sophisticated inference paper, on amortized inference. Also see Active Inference Stream 8 with Alec, who gave us some really great information about this, and hopefully we'll have Alec on a future discussion as well. So amortized inference is learning to infer. Variational autoencoders can be regarded as an instance of amortized inference if we ignore conditioning on policy. Clearly, amortization precludes online inference. That's like real-time, like data in a loop. And as such, it may appear biologically implausible. However, it might be the case that certain brain structures learn to infer, e.g. the cerebellum might learn from inferential processes implemented by the cerebral cortex. So this is definitely related to a neurophysiological argument for grounding the plausibility of certain computations being performed, or as-if being performed, by certain brain regions that have a certain macro-histological arrangement or micro-histological structure. Sounds great. I would say, in the case of insects, could we study whether they're doing it or not? Or, I just think maybe it's simpler, or it will help us realize that, oh wait, we don't really need the neurons to be doing that. We just need to be able to think about it. Just like we didn't need a neuron to be doing a linear regression for us to do a linear regression across neurons.
So I don't think that using these frameworks necessarily casts the system being analyzed into the same statistical framework any more than the linear regression frame does. That's to say, they all do implicitly, and that's the transparent concept; that's the water we swim in. And given that it's the water we swim in, and there are a few different fish tanks, I just don't see how using free energy on a system makes you strongly committed to that system doing a certain thing. Another part that was interesting from this paper was this focus on separating the intrinsic and the extrinsic value. There are a lot of ways this can be broken down, but basically a few cool equations. So this is the Kullback-Leibler (KL) informational divergence, which is a measure of how distant two distributions are informationally. And the KL divergence between this Q and this P is seen as the risk. On one side, that's risk over certain states, the divergence between Q(s | π) and P(s), like whether the red light is gonna signal something or not. And then on the other side of the equals sign, there's risk in terms of the outcomes, the divergence between Q(o | π) and P(o). So here's S conditioned on π, here's O conditioned on π, and then here it's P of S instead of P of O. So this one is divergence in states, and this one is divergence in outcomes plus an expectation term. This is about Q of observations being conditioned on policy. So we're functionalizing and conditioning on a lot. But this part is actually about the states, and about the states given the observations and the policy, and how that brings us close to what? The mapping between states and observations, which is the interesting part. There's so much to say here, but they unpack it in terms of the intrinsic and the extrinsic value, in basically separate sections.
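To make that risk term concrete, here's a minimal sketch in Python. The distributions are made-up illustrative numbers, not anything from the paper; the same function covers both the state form and the outcome form of the risk:

```python
import numpy as np

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions, in nats.

    Measures how informationally distant Q is from P; it is zero
    only when the two distributions are identical.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms with q = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Risk over states: divergence between the policy-conditioned
# prediction Q(s | pi) and the preferred distribution P(s).
q_states = np.array([0.7, 0.2, 0.1])  # what this policy predicts
p_states = np.array([0.4, 0.4, 0.2])  # what the agent prefers
risk_states = kl_divergence(q_states, p_states)

# The outcome form is the same computation over Q(o | pi) and P(o).
```

A policy whose predicted states match the preferred states exactly incurs zero risk; the further the prediction drifts from preference, the larger the divergence.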
I would just say, not just in the interest of time but in the interest of knowing that these formalisms really demand to be unpacked and mapped in a formal way, not just discussed in an introductory lecture, let's save them for another time. As far as the general and the specific cases of this, there's a nice figure here with a few different special cases that descend into certain different subareas. If there's no prior, this is like an uninformed, or optimally uninformative, Bayesian design. That's the information maximization principle; that's info gain. If we have no ambiguity, like if the prediction can be completely perfect as far as the observations are concerned (if all the thermometers in your building are perfect, for example), then you can have basically risk-sensitive or risk-aware policies. That's like cybernetics, as well as informational control and kind of Occam's principle. I'd be really curious to learn what they meant there, but maybe it means something like argument by simplicity in the context of systems engineering, which could be pretty interesting. Then if there's no intrinsic value, so if it's pure curiosity, pure exploration search, then it can converge, depending on how you weight intrinsic versus extrinsic value, to a variety of different types of theory. And then there's also this maximum entropy case, which I would also like to learn more about from the authors, with no ambiguity or prior. So I'm kind of curious how these two, no prior and no ambiguity, relate to the combined no-ambiguity-or-prior case. And is there one with no intrinsic value either? Like, what happens when you have nothing? All right, and this next figure is gonna come back in a very similar format in the figures for the paper. But basically it starts off like: here's us at the top at observation time point three, and then there are one, two, three, four things that can happen.
And then from thing one or three happening, four things could be observed. And then from one or three of those, or from two or four, you can again have one, two, three, four things happen. And here outcomes one, two, and four lead to one of four observations. So this is an unrolling, a rollout of a deep-time prediction on a tree. And that's related to a couple of different ideas, which is why we introduced them: prediction through recursion, and counterfactual branching. The first thing to remember is that the branches are literally counterfactuals. It's like, what would my prediction be? But it's really predicting the whole distribution. So instead of thinking of it as predicting whether my team is gonna win or not, think: what is the distribution of likelihoods of each team winning the World Series? And then, what is the likelihood of each of these outcomes conditioned on this? And you keep on conditioning and conditioning and rolling that out through deep time. So it relates to counterfactuals and the state-outcome mapping, the state of the world mapped onto your observations, the sensory outcomes, through this learning of the relationship. Here is what it looks like in the format that we looked at at the end of 10.2: you have B, the transition matrix. There's a state, and it emits an observation, that's this darker node. Then there's an action policy, whether from the agent and/or from the world, and that modifies (or not) the state transition matrix. And then the state goes to the next state, and through A, which can be similar or not, it emits another observation, which can be probabilistic or not. We're gonna see this again in just a few slides, but pretty interesting. And that's just the first approach; the sophisticated inference paper was the one that really brought it out. And then also, this is pretty interesting: figure eight, and here's the caption.
This schematic summarizes the various imperatives implied by minimizing a free energy functional of posterior beliefs about policies, ensuing states, and subsequent outcomes. The information diagrams in the upper panels represent the entropy of three variables, where intersections correspond to shared or mutual information. So, Professor Jim Crutchfield at UC Davis, or anyone in your area: I would love to be learning more about this question of the overlapping Venn diagrams, because I remember a lecture on overlapping Venn diagrams and information theory from Professor Crutchfield that was very influential for me. And then the caption continues. A conditional entropy corresponds to an area that precludes the variable upon which the entropy is conditioned. Note that there's no overlap between policies and outcomes that is outside hidden states. This is because hidden states form a Markov blanket, i.e. an informational bottleneck, between policies and outcomes. Two complementary formulations of minimizing expected free energy are shown: on the right in terms of risk and ambiguity, and on the left in terms of information gain and entropy. Respectively, one can see that both will tend to increase the overlap, the mutual information, between hidden states and outcomes. That's what we really want: we want to be learning the relationship between the states and the outcomes. Both will tend to increase the mutual information between hidden states and outcomes while minimizing entropy or Bayesian risk. In these diagrams, we have assumed steady state, such that risk becomes the mutual information between policies and hidden states. For simplicity, we have omitted dependencies on initial observations. Fair. The various schemes or formulations considered in the text are shown at the bottom. These demonstrate that Bayesian control theory, i.e.
KL control, and Bayesian risk and optimal Bayesian design figure as complementary imperatives. Okay. Bayesian design of what? The answer is: of experiments. So this is optimal experimentation conditioned on everything you know. Given the state of what's known, how can you get information gain? And then over here it's related to achieving the lowest possible risk through policy, given the states that are known, and that's related to what's known as KL control. And so what we're seeing here, in this mapping of active inference, and not necessarily other frameworks (I'd be open to correction or expansion on this), is that not many other frameworks even have ways to understand and equate such a broad range of strategies. For example, this one is kind of like explore and this one is kind of like exploit, except instead of a parameter switching from explore to exploit, from one mode to the other, it's actually a more nuanced trade-off mediated through policy. And what that policy is enacting is a co-optimization of information gain, that's the outward, expansive part of this variational principle, with the downward-coordinating Bayesian risk and precision optimization. So the Bayesian design is like: you're at some point, wherever you're at, and you're expanding outwards with the info gain, with the optimal experimentation, and then collapsing inwards in order to decrease surprise about the world. All right, now, on April 6th and 13th, 2021, Casper Hesp and other authors from this recently published paper will be joining the ActInf stream. So we'll go through the paper and all the figures and everything then, but we'll just go through the figures one first time here, because they're going to be basically exactly copied over for the sophisticated affective inference. So where we start is M1, and we'll go through these pretty fast.
M1, the first layer of the model, is about how hidden states are instantaneously associated with observations. So here are observations, here's the likelihood mapping, here are the states we're estimating, states given our prior D, and then from observations through the likelihood mapping A, some math. Now, M2 is a generative model of anticipation. Here we have D, our initial priors, and then we have B, which is how D changes, basically how those priors change. And then we have states through time one, two, three, and at each time we're having either a prediction, a confirmation, or an updating of an observation. That's the time-specific predictive density. So M2 is: how does M1, which is one time point, change over two or more time points? That's M2. M3 is a generative model of action. This is where we introduce the action layer, and there are a couple of pieces that play into it. The first thing to notice is that we have this exact same M2, literally the exact same image copied over. Here's M2; it looks kind of like three prongs pointing down and a little head with a D. Same exact shape here. Okay, so the box is clarifying that this is the exact same M2 that we just talked about, and M3 is around that. So how does M3 come into play? Well, it's all about pi, the policy. The policy plays into the B matrix, which is how the state estimate changes through time. So our policy is really: if I exercise every day, then I'll be getting healthier in this way, something like that. And pi is, again, only playing into these B matrices. It's not playing into the S state estimation; it's not playing into any A matrix learning. It's a different part of the brain, a different part of the model, or so they say, maybe. And pi reaches downwards into B, but it's also getting influenced by a couple of new things up here: the baseline prior over policies, and the phenotypic outcome preferences, which are C.
So C is your preference matrix, and E is like your affordances, your baseline prior over affordances, endowed from evolution and from learning and development: what are the tools at your disposal? And then there are the phenotypic preferences, which are also evolutionary priors that can be updated and can be contextual. And through G, there's policy selection. Okay, more math, but it's about D, this internal model of how states and observations are linked, and then we're gonna hook pi, the policy, in here using this free-energy-optimizing function G. Then G is the interface where this generative model of implicit metacognition comes into play. And so this is an estimation of meta-confidence. Remember we talked about: well, it's 10% that my team's gonna win the World Series, but I'm not too confident about that estimate. Implicitly about the past, present, and future, we speak about it in different ways, but we have not just a state estimate about the world, but a precision estimate. Like, it was around 70 degrees, or it was exactly 70 degrees, I looked at it on a thermometer. Here gamma is this precision parameter, and beta is the prior over precision, because beta is the parameter, I think, of this gamma distribution. So gamma acts kind of like a high-to-low temperature on G. It is like the explore-exploit trade-off in a sense, but it's a bit more nuanced, and surely with a lot of different consequences: it relates to this optimal experimentation versus precision management. Convergence and divergence, expansion and retraction, tensile and compressive, tensegrity, synergetics; it relates to a lot of these other areas. All right, so to conclude that section: yeah, this one is a little long, but it's fun and it was a fun one to learn about.
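As a rough sketch of how gamma acts like a temperature on G, here's a simplified softmax form of policy selection. This is illustrative only: the per-policy free energies and the flat prior are made-up numbers, and the paper's full scheme also updates gamma itself via beta, which is omitted here:

```python
import numpy as np

def policy_posterior(G, E, gamma):
    """Softmax policy selection over negative expected free energy.

    G: expected free energy per policy (lower is better)
    E: baseline prior over policies (the affordance-like prior)
    gamma: precision; high gamma sharpens the posterior,
           low gamma flattens it back toward the prior E.
    """
    log_post = np.log(E) - gamma * G
    log_post -= log_post.max()  # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

G = np.array([2.0, 1.0, 3.0])  # made-up per-policy free energies
E = np.ones(3) / 3             # flat baseline prior over policies
confident = policy_posterior(G, E, gamma=4.0)
hesitant = policy_posterior(G, E, gamma=0.25)
# The best policy (index 1, lowest G) dominates when precision is
# high, while low precision leaves the posterior near the prior.
```

This is the "temperature" intuition in one line: scaling G by gamma before the softmax is exactly dividing by a temperature, so dropping precision washes out the differences between policies.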
So hopefully we can return to this slide now and say: okay, the affect and the temporal depth are two pillars, and now we're gonna build on top of them. We're gonna think about an agent that goes deep, and one of its predictions is about its own affect. So here's the figure from the paper that we're reading, from Hesp et al. And the reason I went through the previous section was because, as you'll see, this is very similar to the previous version. We have states and A, the likelihood mapping between states and observations of the world. Then we have B, the matrix of how states change in the world, and then we have U, which is action here, and U is, as before, interfacing only with B. Okay, so that's what we've seen before. But now we're gonna be estimating precision and free energy at each future time point. So we're gonna be considering our action and, importantly, our precision parameter, which is required to calculate the G from the top. We're gonna have basically this initial condition D, and then there's gonna be a state that carries over into the next situation, the next branch down the row; this whole thing is gonna get copied over. So you can have these big graphs, all these recursive functions calling each other, basically. But this is the state, and A here is initially just giving this first state. And again, at each time point we're gonna be rolling out deep, task-specific sophisticated inference, as first laid out in the 2020 sophisticated inference paper; we're gonna be doing this counterfactual analysis about this deep branching, and our precision and our valence about it, at each time. Now, here's the illustration of the state space of the task with the four states. It starts off neutral, and then basically there's a transition that's possible. So you can imagine a four-by-four matrix with the state transition probabilities, and the transition matrix that they set up has an ability to go from a neutral state to a precise small gain, that's state two.
There's a good probability of getting a good outcome there. Or you can go to this dangerous high-gain state, which I believe is mapped here as state three. So if you land on one, you stay in the neutral position. If you get to two, you can go to either one or three; or they have it connected back to two, but I don't know if they mean that edge properly. Then if you get to three, you can hop to one, two, three, or four; again, I don't know if they have a self-edge there as well. And then if you get to four, it's an extremely painful state, and it's absorbing: it only leads to itself. So it's a dead end. At the first time point, let's say we're starting at T equals one and we're just evaluating: I'm in state one, what could happen? Okay, from here you could eventually go to four, but you can't on that first step, so the likelihood of outcome four is zero. So you're like, wow, all my outcomes are great. I could have this two or this three. Not that those are the exact values, but one step down the road, both my outcomes are good. Then you look one level deeper, and all of a sudden there are a few things happening. First, this branch is staying very high, very positive; all the estimates are very positive. But on this other branch, you're starting to see that it's possible to get this negative valence. So now your estimates out of these four are: well, it could be pretty good, pretty good, very good, or kind of negative. So at each time point, not only are you tracking the likelihood of different estimates, but really you're tracking it as a distribution. And the distribution is about state estimates and uncertainty and the variance, the prior, beta, gamma. Those are the key pieces here. That's why this is so interesting.
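The state space just described can be sketched as a small Markov chain. The transition matrix below is hypothetical, with the structure they describe (neutral, precise small gain, dangerous high gain, and an absorbing painful state); the probabilities are made up for illustration, not the paper's parameters:

```python
import numpy as np

# Hypothetical transition matrix B, column-stochastic: B[next, current].
# States: 0 neutral, 1 precise small gain, 2 dangerous high gain,
# 3 extremely painful and absorbing (it only leads to itself).
B = np.array([
    [0.90, 0.40, 0.30, 0.00],
    [0.05, 0.50, 0.30, 0.00],
    [0.05, 0.10, 0.39, 0.00],
    [0.00, 0.00, 0.01, 1.00],  # small leak into the absorbing state
])
assert np.allclose(B.sum(axis=0), 1.0)  # each column is a distribution

# Roll the belief out through deep time, tracking "threat perception"
# as the probability mass sitting on the painful state at each depth.
p = np.array([1.0, 0.0, 0.0, 0.0])  # start in the neutral state
threat = []
for _ in range(2000):
    p = B @ p
    threat.append(p[3])

# One step ahead, the painful state is unreachable (threat is zero),
# but its mass only ever grows, because nothing drains back out.
```

This is the mechanical version of the point in the surrounding discussion: with any nonzero path into an absorbing state, the imagined fraction of negative futures rises monotonically with depth.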
And so as you can see from this graphical layout, this kind of Markov state transition layout, which we could see in the code, eventually you're gonna get to four. Even if only 1% of the time you go from three to four, once you're in four you always stay in four; it's an absorbing state. So eventually, if you had a hundred little ants or a hundred particles and they started here, they'd all end up in four, if you set it up that way, and they can't escape. And so here's what we end up seeing. Well, first, before we get to what we see, these are just the mathematics; we're not gonna go into them in this one, but they're described there, and here, and here. And here's what you get when you do the simulation. Remember that the negative states were absorbing and there was an increasing number of them. So there are four charts, all on this timescale from zero to 2000. The first thing to notice is that the perception of good versus bad starts in the middle. Then it goes high, maybe as it's discovering some of these shallow branches where there are a lot of wins, but then around time step 500 it starts to really slip down, and that's in terms of how many states it's playing out the simulation in, okay? And now, even though each round of the simulation is totally winnable, let's say, if there were any agency in the situation, by imagining that there's an infinite or growing number of future outcomes that are negative, it leads one to have bad affect, in the way that they modeled it in this model. Another interesting feature is that initially, as evidence starts pouring in, precision increases. But then, around the same time that the valence drops, the precision basically drops as well. This looks like a crypto chart just dying, but the precision dropping here, going from a high of around 1.8 to a floor at around 0.6, shows that there's an increased variance of the estimate.
It's like: would you rather know that you're gonna have $10 plus or minus one, or $10 plus or minus 40? Wait, I might end up with negative $30 from this deal? That sounds very different; maybe I'll just take 10 plus or minus two. And so that's related to risk tolerance and design of experiments, okay? Let's look at this bottom right, and then we'll go to the top right. The bottom right is the threat perception, which is the fraction of imagined events that are negative. So again, at one temporal depth, 0% of these outcomes are negative, because we haven't gotten there yet; in this first step, none of them can be negative, you can't reach the bad outcome. But as more and more simulations roll out, a converging fraction of them are negative. Now here, again, they call it a proof of concept, and they're just showing one parameter combination. So it's not a statement about how a certain natural system actually is, in any way; it's a framework for thinking about it. But as the threat perception increases and levels off, there's also a stabilization of an uncertain predictive posterior for action. When things are good, precision is high and the posterior on action is obvious. It's very clear: we're getting a lot of big winning states, and there's a very low posterior for a lot of negative states. Then, as the percentage of threats starts to increase (interestingly, it's actually after things start to get a little better, but maybe the state space is just branching out even further), there's a phase transition in the precision and the valence, which in active inference are very tied up, because anxiety, a negative emotion, at least in excess, is increased by uncertainty. And so basically, this earlier region is like a good level of uncertainty; our posterior on action was pretty clear, we were pretty clear about which way we were gonna be able to go.
But then in this region, even though the green is still at 60% and the teal is down lower, the good ones are not losing. Most of it is actually rewarding, but there's still a small fraction that are damaging. And it turns out that having that level of uncertainty generates negative affect, negative valence, through this low precision of the model. So that's pretty interesting, and this reduced action model precision is interesting too. So those are basically the figures of the paper. This took relatively long, but again, the paper itself is short, so I hope that you read it. At the same time, it builds so directly upon the other two works that it just didn't make sense to read only the formalisms from this paper when it was so linked to previous work. All right, so the concluding questions are: What are frameworks for affect and anticipation? What would a good framework for affect or anticipation enable for us? What does affect encompass other than valence, other than the plus or minus? What other features of affect are there? What are some unique predictions and experiments from this model, whether in general or in the specific situation? What is something that this model predicts that we might be able to go out and test or measure, or already have the data to know? And then, what are the next steps for active inference? How will active inference increase to another level of sophistication, or include another group of researchers in the conversation, or integrate best or emerging practices from another domain? How can active inference grow from this encounter and serve as a helpful reference point for the other fields as well? And then, what is the goal of this research?
They were mapping things onto anxiety, but of course it's just a parameter in a model, and certainly few authors would say that their models are anything but parameters in models. Yet it's so difficult, especially with the multiple senses of anticipatory systems, living systems, and affect in different systems, robotic systems; it's at a nexus where it's almost like granting agency to models but then removing agency from humans. Having affect in a model, but then not caring about a human's affect when they're telling you about it. Not, of course, that the authors are doing that or are part of that or anything like that. It's just that when we have these affective, emotive agents, it's like: yeah, that's us. That's how we should be respecting people. So how can we build on that and take this model, which seems to be really affirmative about human agency and choice, or at the very least enabling of human agency, in a way that other models (again, not that they weren't) are different? That's kind of the discussion that we are having, and that's what's been so awesome. So thanks for participating, everyone. It was a fun discussion, a little bit more than an hour and a half. We will provide follow-up forms to the live participants, so anyone's feedback, suggestions, and questions would be super helpful and awesome. Stay in communication with us, and we hope to see you in these last two discussions of 2020, and then in the new 7 to 9 a.m. PST schedule for 2021. I'm really looking forward to that, and I know a lot of our teammates are as well. So check it out, go to the website, get the updated information. Thanks, everyone, for listening in, live or on replay. It was a fun discussion, and I hope that you got something out of it. All right, thanks, and talk to you later.