Okay, thanks for the introduction then. My name is Bert de Vries. I'm a professor at Eindhoven University of Technology in the Netherlands. I'm part of the electrical engineering department, in a signal processing group, so we design signal processing algorithms. About six, seven years ago I read for the first time a paper by Karl Friston, it was called "A Rough Guide to the Brain", and it struck me that this could be fantastic for signal processing. So since then I've really been trying to work with people in my lab on realizing agents that will design, or automate the design process of, signal processing algorithms. And as you know, we're doing this by message passing, and that's what we want to talk about today. Thijs? Yeah, my name is Thijs van de Laar. I'm a postdoc in Bert's lab. I did my PhD also in active inference, on how to automate those processes. And together with Marco Cox and other colleagues from our lab, we built a toolbox called ForneyLab. I'll be talking and walking you through how we apply that in an active inference context and do some cool things with it. So that's for later. Yeah, hello everyone. My name is Dmitry Bagaev. I'm a PhD candidate in BIASlab, also at TU Eindhoven. My work is mostly about reactive message-passing-based Bayesian inference, which we hope will help with active inference as well. But that's also for later; I will talk about it on my time, on my slides. Cool, thank you. Okay, shall I go then and do a little introduction, a few slides? Perfect. Let's see, yeah. Okay, let's see if I can share my slides. Yep, looks good. Yeah, okay. So, the first slide is Eindhoven, because you may wonder: where is Eindhoven in relation to Amsterdam? Well, it's about 100 kilometers south of Amsterdam, close to the Belgian border and not so far from Germany either. It's sort of a high-tech city; Philips originated in Eindhoven. On the bottom right you see a picture of the center, and on the top right a view of the campus of Eindhoven University of Technology. Here's an aerial view of, oops, going too far. Here's an aerial view of our campus. And so here, let's see if I can share a pointer. Yeah, so this is the building for electrical engineering. So this is where we are. BIASlab is short for Bayesian Intelligent Autonomous Systems lab; that's what we try to build. We have about three staff members, faculty members, and currently six PhD students. Dmitry is one of those PhD students, and we have open positions if there are people watching that are interested in probabilistic programming or in how to make active inference work. Then, what are we trying to do? This is a picture that's probably familiar to everybody in this forum, right? It wants to show that, well, the only thing that's really going on in the brain is free energy minimization, or expected free energy minimization, to do everything. And that's a huge inspiration to us engineers. So what we try to do is basically this: we want to put this in an iPhone or a Raspberry Pi and let a robot learn how to ride a bike. But the beauty of this framework for engineering purposes is that it's almost one solution approach to any problem. So if we can do it, if we can teach a robot how to ride a bike by free energy minimization,
then probably we can also apply it in virtual reality and design algorithms for hearing aids or even self-driving cars. The big promise, let's say, or the attractiveness for engineering, is that it's always the same thing: you just have to propose a model and minimize the free energy, no matter what the application is. That's very appealing. The problem for engineering is that the free energy functional is a function of observations, and observations are streaming data, coming in, well, could be every millisecond. So it's a highly time-varying function, and the number of latent variables, let's say the space of latent variables, is usually very high. So we have a very high-dimensional function that is time-varying, and we want to minimize that. Now, the brain is very good at it, right? The brain has, what is it, 10 to the power of 14 synapses, 100 billion neurons. But a normal optimization library, say Newton's method in MATLAB, will not cut it. You cannot minimize a time-varying function of a thousand variables in MATLAB. It's not going to work. So we need something quite radical here, and the idea we go with is not that radical. Again, we just take inspiration from the brain, right? The brain is a network that passes messages, and it turns out that in my own field, in signal processing, or information theory to be exact, this has been formalized. There's a paper by Dave Forney from 2001 called "Codes on Graphs: Normal Realizations". He called them normal factor graphs, but in honor of David Forney, today they're called Forney-style factor graphs. So this is the origin of the factor graphs that we're talking about. A few years later, Hans-Andrea Loeliger at ETH in Zurich made this popular in the signal processing community. And already around 2007 you see in his papers these typical structures, like this one, that we will also show later. These are Kalman filters. This is really also what Friston talks about with these kinds of structures. So this is what we want to do in our library: take active inference, the inspiration from the neuroscience community, combine it with what we know in signal processing and information theory about factor graphs, and use those tools to implement this. So today we have two presentations, one by Thijs van de Laar on how to do message passing with ForneyLab. ForneyLab is the toolbox that we wrote; the name of course refers to Dave Forney. And Thijs will show how to do active inference with this toolbox. Then we're also working on a new version, it's called reactive message passing, and Dmitry is the main person there. Dmitry will talk about that a bit. And that's all for me. I'll give it back to you then, or maybe Thijs can continue. What is your preference? I'll stop sharing my screen. Super interesting. Thanks a lot for the context. And it will be awesome to hear from the authors with maybe any reflections on those general points you raised and how they get applied in a really specific way, because you brought up a ton of topics that we talk about all the time, like the simplicity of one unified approach, the challenges of ongoing optimization problems, and drawing inspiration from nature. So, great things. I think we can just jump to whichever of the presentations you all prefer. All right. Then I'll go ahead and try to share my screen, see if that works. Select screen and entire screen. Okay. I think you can see my screen now. Yep. You can make it large on your side and I'll resize it.
There we go. Looks great. There we go. All right. Well, thanks again for having us. Can you click away? I'll be giving a little introduction. Sorry, where it says you're sharing your screen, you could hide that little Jitsi thing. And it's gone. Thank you. So, yeah, I'll be giving a little introduction on how to do message passing with ForneyLab in the context of active inference, and to give a little bit of motivation for this talk. This is the kind of situation that we're interested in. So we have some kind of an environment, and we have an environmental process that's running in that environment. And we want to develop an agent that has some purpose or does some purposeful task within that environment. The agent is allowed to send actions to the environment, to manipulate that environment, and it will receive observations. And in the environmental process there is some function R_t that is running there; it might be a simulation that we run and interact with, or it might even be a real-world process, of which, of course, we don't really know what R_t actually is. And somehow we want to build an agent that does something in that world. So where do you start with that? There's this paper by Conant and Ashby from 1970, and they have an interesting theorem that says that every good regulator of a system must be a model of that system. So that actually means that if we want to build an agent that regulates or manipulates a system, the environment, then we have to model that system. So as engineers, we are in the business of building models, generative models to be precise. And this model represents our belief about how observations follow from our manipulations. We represent it by a function f, which is a function of y, the observations, the controls u, and also some latent states, hidden variables x, that act as intermediates between observations and controls. Now, reasoning forward is one thing, but in the end what we want to do is observe things and then propose controls that lead us to favorable states. And that is where the idea of the free energy principle and active inference comes in. So this is the hypothesis: if we want to build an agent that does something purposeful, then we can do that by minimizing a free energy bound on surprise. So we want to build an agent that avoids surprises, and by that we can do Bayesian inference about that environment. So we have to build some kind of free energy functional for the agent to optimize, and by that do Bayesian inference for actions and controls. And we define this free energy functional as a KL divergence between some kind of approximate posterior that we postulate, which we here call q_t, and the generative model that we have. Minimizing this free energy functional will then allow us to reason backwards from observations towards controls. And that's kind of the general idea. Having postulated this, I have choices to make. So, for example, one big choice that I want to make is: how do I choose my model? How do I choose my f in such a way that it is useful to me? And also, how do I choose my q_t? How do I choose its factorization such that I can make my inferences? And that process is kind of a matter of trial and error. As an engineer, I want, of course, to build the best model that I possibly can. But how do I go as quickly as possible through a process of trial and error that gives me the best model? This is where the idea of a model design cycle comes in.
And this was, again, made popular by David Blei in his 2014 paper, "Build, Compute, Critique, Repeat". He proposes this cycle where an engineer proposes a model, and then with that model you want to infer your quantities of interest given data. And once you have inferred with a model like that, you want to criticize it, and based on the performance that you have evaluated, you want to reiterate: rebuild your model, infer again and see how well it then performs, until you're satisfied and you can apply it in your agent in practice, maybe in a real-world setting. And then the challenge becomes: how do I go through this design cycle as quickly as possible? So we want to be flexible, we want to automate things. Making model proposals itself is something that, well, you have to do as an engineer; you have to come up with a proposal of how you believe the world functions. But once you have that, everything else can be automated. You can infer these quantities by probabilistic programming. You can evaluate variational free energy as a measure of model performance. And automating this cycle will be the key to making model proposals for an agent that will be useful in practice. So you want to go through this as quickly as possible. That's why we choose a specific method: we choose factor graphs to represent our factorized models, because in a graph you can do manipulations, you can add nodes, delete nodes, you can rewire things very quickly. And once you have a graphical representation of your model, you can do message passing on that graph in order to automate the inference. You can even evaluate free energy by local contributions in that graph. So that's why we choose a graphical representation, and that is why we want to do message passing: because we want to design models quickly so that we can design effective agents. Now, how do these graphs work, right? How do you build such a graph, or what does that look like? We choose a specific representation of our model, and this is an example where we have a generative model of five variables, x1 through x5, and we have three factors, f_a, f_b and f_c. Here, in the picture on the right, is the graphical representation of that model. As you can see, the edges correspond to the variables and the nodes correspond to the factors. And the edges are connected to the nodes of which the corresponding variables are arguments. So here you see node f_a, which connects edges x1 and x2; that is because x1 and x2 are arguments of f_a. The same goes for f_b and f_c. Now, how do we do inference in this graph? Suppose that we observe x5, which is indicated by the little solid square, and we want to compute our belief over x2 given x5. Well, what do you do? You marginalize everything. And by marginalization, I mean you integrate out every variable except the one that you're interested in. So except for x2, you integrate everything out, and you add the constraint that you observe x5. Now, if you have a big model, then this integration becomes very cumbersome, because here you have four variables over which to integrate. So this integration space really explodes if you want to do this naively. But you can be smart about this. You can reshuffle these integrals according to their respective terms. For example, this integration over x5: well, there is only one factor that x5 is an argument of. So I can use the distributive rule to bring this integral inside.
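To make that distributive-law step concrete, here is the marginalization from the example written out, with the factorization read off from the graph as just described (f_a connecting x1 and x2, f_b connecting x2, x3, x4, and f_c connecting x4 and the observed x5); the bracketed terms are exactly the messages that come up next:

$$
p(x_2 \mid x_5{=}\hat{x}_5) \;\propto\; \iiint f_a(x_1,x_2)\, f_b(x_2,x_3,x_4)\, f_c(x_4,\hat{x}_5)\; \mathrm{d}x_1\,\mathrm{d}x_3\,\mathrm{d}x_4
\;=\; \underbrace{\Big[\int f_a(x_1,x_2)\,\mathrm{d}x_1\Big]}_{\text{message 1}}\;\cdot\; \underbrace{\Big[\iint f_b(x_2,x_3,x_4)\,\underbrace{f_c(x_4,\hat{x}_5)}_{\text{message 2}}\;\mathrm{d}x_3\,\mathrm{d}x_4\Big]}_{\text{message 3}}
$$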
That way I divide these integrals up into smaller parts which are manageable. And this is where message passing comes from. Message passing is essentially solving these integrations one by one. So I can first integrate out x1 here, for example, which, as we say, summarizes all the information inside this orange box. That gives us a message, a first message, that exits this orange box. Then I can continue from the bottom: integrate out x5 and I get a second message. And I can use this second message in the computation of the third message. This way you get a nested solution approach, where I use the solution that I have for message two in the computation of message three. And you see that, in the end, the multiplication of these two colliding messages on the edge for x2 will give me my proportional belief, my proportional posterior over x2 given what I have observed. To give a little example of a specific node and how that works, here's an equality constraint node, which we use to constrain the beliefs on the adjacent edges to be equal. What that does, we will see if we go through the math here. This is what the node function looks like. So I say, okay, it's a function of three variables, x, y and z. And I constrain z to be x and I constrain z also to be y. So then x will also be y if I constrain those two to be the same. So let's see if we can derive messages based on this. If I want to summarize the information inside this orange box, I can use the sum-product rule. For that I have the two messages coming in here, from the left and from the bottom, message one and message two. I multiply them by the node function of the equality node and I want to perform this integration. Now I substitute this node function here into the integration, and then I can use the sifting property to replace the arguments x and y with the argument z. So what I have essentially done here is say that I can compute this message three by multiplying the two messages that are incoming from the left and from the bottom. Now if you squint your eyes a bit, you can kind of recognize a proportional Bayes rule in here. So essentially you can say: well, if message three here represents my posterior, then message one can represent my prior and message two my likelihood, and then I have a proportional posterior in message three that I can pass on to the rest of my model. And this node is also often used to combine information that is coming from the left and from the bottom. So that's a very quick introduction to message passing. We have derived lots of rules, also from the literature, and implemented a lot of already-derived rules in ForneyLab, which is basically a probabilistic programming suite that does the scheduling and the message computation for you. So you don't have to think too much about how to redistribute these integrals or how to derive a lot of updates for specific nodes. That's the kind of thing that we can automate on that side of the design cycle. Now, any questions about this? Because I'm going through this very quickly, and what I'm going to do next is walk through an example. So maybe this is kind of a natural point where people who have questions can stop me and say, hey, you're going too fast here. I think it'll be good to see the example, and then we have some questions in the live chat, and also anyone's welcome to add more.
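Before the example, here is a tiny numerical illustration of the equality-node rule from a moment ago, in plain Julia (this is not ForneyLab code, just the Gaussian case of "multiply the two colliding messages" worked out by hand; all names and numbers are made up for illustration):

```julia
# The equality node constrains its edges to carry the same variable, so the outgoing
# message is the product of the two incoming ones. For Gaussian messages in
# mean/precision form, that product is the familiar precision-weighted combination,
# i.e. a proportional Bayes rule.
struct GaussianMessage
    m::Float64   # mean
    w::Float64   # precision (1 / variance)
end

# Message 3 = message 1 * message 2 (unnormalized posterior on the edge)
combine(a::GaussianMessage, b::GaussianMessage) =
    GaussianMessage((a.w * a.m + b.w * b.m) / (a.w + b.w), a.w + b.w)

prior      = GaussianMessage(0.0, 1.0)   # message 1: prior belief
likelihood = GaussianMessage(2.0, 4.0)   # message 2: likelihood message from the data side
posterior  = combine(prior, likelihood)  # message 3: GaussianMessage(1.6, 5.0)
println(posterior)
```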
All right, sure, sure. So how do you do this in practice, in an active inference context? Here is a little example, it's called the Bayesian thermostat, where we have an environment with a heat source here on the left, and we have a little car that can move around; it can move away from the heat source or it can move towards the heat source. Now, the position relative to the heat source is what we call x, and at every position there is a specific temperature; the temperature that we measure at a specific position is what we call y. And we have a preference for being at a temperature of four, y = 4; I don't know, it's just something we choose. We control this car by moving it left or right, and our control, which is the velocity that we move left or right with, is called u. So this is our environment, and we want to build a model for it. In active inference, what we want to do is reason from states where we would like to be towards the controls that we have to apply at the current time in order to get there in the future. So what we want to do is reason forward from the current time, time t, up to some time horizon, big T, and consider how the agent will move in the future and where we want that agent to be at that time; in the end we want to be at a temperature of four. And it is a state-space model that represents our belief about how this agent will move itself through the world. So at time t we have an observation coming in of a specific temperature, and maybe we have already made an action, and we say that the current state x_t relates to the previous state and the action that we take by adding: so basically u_t here is a velocity that we add to our current position in one time step, and then we get our next position by just adding that and also applying some Gaussian noise. So we have a state transition that is additive, with u_t saying I want to move forward to the left or forward to the right, and then we add some Gaussian noise, saying, okay, we're kind of uncertain about how this agent might move in the environment. We have an idea about that, but let's add some variance in order to account for some uncertainties. And then we say here, with this vertical line, that if we have a position, then it relates to an observation of the temperature by, well, minus one, which is kind of a rough guess. It's just saying: well, if I move to the right, if I move away from my heat source, then my temperature will decline with a slope of minus one. And this is kind of a really rough guess. In the real world there will be a very nice temperature gradient that maybe decreases very slowly. But we're just saying, hey, we don't really know what that is, let's make a rough guess. We have a belief, this is our generative model, our belief about the world. Let's just say the temperature decreases with position. And again, we're not very certain about that, so let's again add some Gaussian noise. And now I extend this into the future. So I say: actually, I believe that in the future my environment will also evolve according to this. But I also have some ideas about where I want to be. And this is where goal priors come in, which constrain my future observations of the temperature to be around this desired temperature of four. And I'm saying here, with some added noise, well, I want to be around four, right? It doesn't have to be exactly four, but I want to be pretty certain that I will be around four at time t plus one, and that I will continue to remain there until some time in the future.
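Collecting the modeling assumptions just described in one place (the precision symbols γ and φ and the goal-prior variance σ² are my own labels, not taken from the slides):

$$
\begin{aligned}
p(x_t \mid x_{t-1}, u_t) &= \mathcal{N}\!\left(x_t \,\middle|\, x_{t-1} + u_t,\; \gamma^{-1}\right) && \text{state transition: add the chosen velocity, plus Gaussian noise}\\
p(y_t \mid x_t) &= \mathcal{N}\!\left(y_t \,\middle|\, -x_t,\; \phi^{-1}\right) && \text{rough observation model: temperature falls with slope } -1\\
p(y_{t+k}) &= \mathcal{N}\!\left(y_{t+k} \,\middle|\, 4,\; \sigma^2\right), \quad k = 1,\dots,T && \text{goal priors on future observations}
\end{aligned}
$$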
Now we have a generative model, and we can define our free energy functional and solve it by message passing on this graph. We pass these messages towards our next control, because we're interested in inferring what we have to do next, at the next time step. And this is where we can do message passing. We can summarize all the information that we have from the past, so this will be an estimate of the current state of the agent, and it will be biased by our beliefs about the future, saying where we want to be. And that will influence, or determine, which control we will take at the present time or at the next time. All the details of this are described in a paper from 2019 by Bert and me. There you can also see how we apply this within the action-perception loop and things like that, because you will have to do this at every time t: your state estimate will change, the world will change according to your actions. So for every time step you will have to recompute a new action for what you're going to do next. And this is kind of the main idea. Now, of course, we don't want to do all of this by hand. So what we want to do is, well, use a tool for that. So now I'm going to show a little demo of how you would implement this using the ForneyLab programming tool. Let's see if I can open a little demo. Is this readable, actually, or should I zoom in more? You can zoom in a little bit more. Maybe like this. Yeah, I can resize it. That's pretty good. Oh, that's perfect. That's pretty good. Yep, thank you. So this is kind of the setup of what we want to do. This is our definition of the environment itself. So this is the real world, where we have this temperature gradient, which is a very nice function, a very smooth function, where at the heat source we have a temperature of 100 and it goes down with the position relative to the heat source. So this is the real world; we don't really know this. This is the one that we're going to approximate with this very rough minus one, and then we'll see whether that will work or not. Now, this is my wonderful ASCII art, which has now been resized, of basically the model that I've just shown in the slides. And this here is our model definition. Actually, I have to zoom out a bit to make it more readable. Yeah. So we want to build a graph for this, right? Here we have our observations, states, controls. And we define a prior belief about the state at time t minus one, so where have we been, which we're going to say is a Gaussian with some mean and some variance. And then for every time step into the future, from one up to the horizon, we'll have a prior belief about controls, which basically says, well, what am I allowed to do? And I say, okay, well, I'll have a Gaussian prior over controls with a mean of zero, I'll substitute that there, and then maybe some variance. And here I say, well, I have the position of the agent. It relates to my controls by just this addition, with some added noise with precision gamma. So this is our transition model, this is the horizontal line. And this is our observation model, which is this very rough estimate of minus one for the temperature gradient, times the current state, and also some precision, the observation precision. And this represents my goal prior. So note that I haven't really put in any specific values here yet.
I just have placeholders for the actual statistics and values that I'm going to put in, because the actual derivation of my algorithm doesn't really depend on the statistics. That's just something we can put in later. So this is what I want to do: build a free energy functional. And in ForneyLab, that's actually just basically four lines of code. So this posterior factorization defines my factorization for q, saying that I'm just going to have a joint variational distribution over the entire graph. I don't subdivide it into a structured factorization; some people are familiar with that. I just say, okay, I'll have this posterior factorization of the variational distribution be my entire graph. And then with just one command, I can derive a message passing algorithm that propagates all the messages towards control u2, which is the control for the next time step, t plus one. And I also want to evaluate my free energy, so give me at the same time an algorithm that I can execute in order to evaluate my free energy. And this line will convert this to source code that I can then load in my environment. What this does is build a message passing algorithm for us. So this is the code that executes the message passing on our graph. Here you can see that it computes all these messages by sum-product rules and other rules that we have pre-derived and implemented in ForneyLab. And you can see that these messages depend on the statistics. This is, for example, the mean of our prior over states and the variance of our prior over states. And we have other messages that depend on previous messages, in the way that I explained message passing. This builds an entire list of, in this case, 26 messages. And in the end, we're interested in the marginal beliefs, the posterior marginal beliefs about, for example, our controls. You get these beliefs by multiplying messages together. So, for example, to get a new belief over the following control, you multiply messages nine and 25. And you also recognize this from the little message passing introduction, where you have colliding messages on the edge that you multiply in order to get your posterior beliefs. And that's what we return in the end; that's what we're interested in. So this is the execution of that algorithm. And notice that we've made an entire schedule here. And that can be very cumbersome; it can be a very, very long schedule. So that maybe makes it expensive to load and to execute. And Dmitry has a solution for that, and he'll explain it in a minute. So I'm very happy about building schedules; Dmitry, not so much. He tries to find a solution for them, to get rid of them. Because, in the end, the fastest thing is to not really have a schedule, to not have a recipe, but to just cook with whatever is given to you. So now we execute this algorithm in practice. This is our action-perception loop for every time step. We want to act, so we want to evoke an action from the agent. We want to execute that action in the environment, so this is sending the action towards the actual real world, or a simulated world in this case. What we get from that is an observation. And from our action and observation, we want to infer our new action and also return the variational free energy. And then prepare for the next step.
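Pulling the model specification and algorithm generation steps from the demo together, here is a rough sketch of what this looks like in code. It follows the pattern of the public ForneyLab state-space examples rather than the exact notebook shown here, so the constructor names, the scalar-times-variable shorthand, and the keyword arguments are best-effort recollections that may differ between ForneyLab versions:

```julia
using ForneyLab

T = 2  # lookahead horizon (illustrative; the demo uses a longer horizon)

g = FactorGraph()

# Prior belief about the previous state; the actual statistics are injected later via placeholders
@RV x_prev ~ GaussianMeanVariance(placeholder(:m_x), placeholder(:v_x))

x = Vector{Variable}(undef, T)
u = Vector{Variable}(undef, T)
y = Vector{Variable}(undef, T)
for k = 1:T
    @RV u[k] ~ GaussianMeanVariance(0.0, huge)            # vague prior over controls
    @RV x[k] ~ GaussianMeanVariance(x_prev + u[k], 0.01)   # transition: add the chosen velocity (illustrative variance)
    @RV y[k] ~ GaussianMeanVariance(-1.0 * x[k], 0.01)     # crude observation model with slope -1 (assumes * builds a multiplication node)
    placeholder(y[k], :y, index=k)                         # the observation for k = 1, goal-prior statistics for k > 1
    global x_prev = x[k]
end

# One command derives the message passing schedule towards the next control u[2] plus a
# free-energy evaluator; a second one compiles it to Julia source code for the step loop.
algo   = messagePassingAlgorithm(u[2], free_energy=true)
source = algorithmSourceCode(algo, free_energy=true)
```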
So in every step, we propose an action to the environment, the action gets executed, we observe a new outcome, we infer a new action, and we evaluate how well we did with that action. And then you get, well, nice plots like this. So this is the velocity of our agent. We start at zero, and you can see that it moves very quickly to the right. Then it also moves back a little bit, until in the end it comes to a stationary point, more or less. And you can actually see here what it does. It starts out at a temperature of 20. Then it moves very quickly to the right, so away from the heat source. It overshoots its mark, then it actually goes back, and then it settles around this desired temperature of four that we have encoded with our goal priors. So why does it overshoot? Well, our model of the environment wasn't perfect, right? We had this very rough estimate of minus one, and in the real world there was this very nice, smooth, kind of complicated function with this bell shape. So even though our generative model of the environment wasn't perfect, in the end we pay for that by overshooting a little bit, but still we were able to get to our desired state at a temperature of four. So in the end, it worked pretty well. And you can also see that the free energy of the agent decreases with time. We start off at around a thousand, so the scale is logarithmic here, and it decreases very quickly to some value that is rather low, right? And then we have some noise that induces some unexpected changes, or some unexpected surprises; that's why you have these little ripples here. But in the end, we're minimizing free energy and it's decreasing drastically. So we actually have this free energy minimizing agent. And yeah, that's basically all I have, the story and the demo. That's awesome. Thank you very much. We can, with screen share on or not, ask a few of the questions from the live chat and also give people a few seconds to type more. So I'm just going to jump into the questions. First of all, let me stop my screen share. Great. Are we still sharing the screen? We see just the Jitsi, so just click the screen share button again in Jitsi. So, awesome, thank you. So the first question is from John, and John writes: how does the factor graph approach work if the graph structure is not known beforehand? Is structure learning possible? Yeah, so that's a good question. So is structure learning possible in graphs like these? That is actually something that is still an active field of research. It would be awesome if you could also automate the structure learning of the graph itself. But there are a lot of challenges there. How do you parameterize this graph structure? And how do you do that, actually? What is your search space in terms of graphs? How do you learn node functions if you don't have a node function given for your graph? These are all very difficult questions, and there isn't a straightforward answer. So that's why this actual model design step in the design cycle itself is still kind of a creative process. There is an engineer that has to come up with a model, and as an engineer, you also have to think about how to adjust that model if you're not satisfied with it. So if your free energy is still high after you run your agent, well, something is going on, something is wrong with your model. It's not an accurate representation of the environment, and then you might adjust it. How do you adjust it? Well, currently it's still just trial and error.
So maybe you can say, okay, I think here's still something that I can improve. Maybe remove a node, maybe the model is too complex. But structure learning in itself is something that can be addressed. There are methods like nested model comparison and things like that that you can do in factor graphs. So you can, for example, if you have a nested model, compute Savage-Dickey ratios and then kind of prune your model based on that. Those are things you can do: define a complex model and see what you can cut away. But adding to that model is still difficult. So once you've cut it away, well, how do you add? I don't really know. So that's a good question, and I don't have a very straightforward answer, as you can hear from my rattling. So yeah. Perhaps I can say something about that. I mean, I totally agree with Thijs. Generally, when you design a system, or a signal processing system, you have to design the structure, you have to estimate the parameters, and then you have to infer states. And the states change very fast. That leads to a Kalman filter, and that's very well known in factor graphs. Then the next stage would be: can we also, over the long term, learn parameters? And factor graphs can do that pretty well. Now, the next stage: in biology, we know the structure is also learned, over even longer times, right? And that's not working at the moment in ForneyLab. You can compare, like Thijs says: if you have two models, you can compare the free energy and just pick the best one. But we are working on that. And we're again taking the lead from Karl's idea of Bayesian model reduction, but how to implement that in factor graphs is a research project. So we're working on that, and I'm not sure where that will end, but that's our goal. So do you have Bayesian model reduction available yet, Bert? So... Yeah. Also, to bring it back to the example of the motor car getting to an optimal temperature: it's like, if every hour you notice that there was a bump in the free energy, then you go back to the drawing board, and you go to the people who work on that car in that area and ask, well, what happens every hour? Or maybe we do need to include this other source of information, but the model that you had does its job. So you could include the wallpaper and other features of the room, but that's kind of this art and science of engineering. And that's why it's always interesting work, because often in research it's like, let's finish the analysis. But really there's this cycle that the entire model is embedded within, which helps us always keep an eye out for those patterns that aren't being captured by our current models. Here's a second question from the chat: how does the factor graph approach work if the goal temperature in this example is not defined beforehand? Can the goal be abstract? So kind of a similar question, but rather than structure learning, how does preference and goal orientation arise? Again, excellent question. So that relates to the question: well, who sets the goals? Or how do you set goals? So yeah, the plain answer to that is, well, it's an engineer that defines that, but it can also be a higher-level agent that does inference and sets those goals. So then you get to a kind of turtles-all-the-way-down argument, in the sense that, well, you can put layer upon layer upon layer, and layers set goals, and other layers set goals for that layer, et cetera, et cetera.
So how that will work in practice, yeah, that's again still an active research area. But in the end, it should come down to minimizing free energy. That's kind of the central theme of this: we want to minimize free energy. We do it by perception, we do it by model learning, but learning of the goals should also be driven by minimizing free energy. So you should choose the goals that minimize free energy. Well, how do you do that? I don't know, you should tell me, right? Yeah. One more question that was asked was: how, or does, this integrate with graph databases or with large data sets? Is this something that scales or has already been integrated to work with that kind of empirical data, or what? I think it depends on what you mean by large data sets. When I hear large data sets, I usually think about big data, things like that. This toolbox is specifically built for dynamic modeling. Not so much in the sense that you have a lot of features in your data; it's signal processing. So data might come in very quickly, and you have a model of how these data change over time. In that sense, you want to do the processing quickly, so you want velocity. It's focused on processing with velocity, not processing with volume, which, for example, other probabilistic programming toolboxes are excellent at, like PyTorch and things like that. Sampling-based toolboxes are all great toolboxes if you have volume in terms of data and you have, for example, an IID model that you want to fit. In our sense, if we talk about large data, we talk about velocity, so data that come in very quickly, and we try to be good at that. ForneyLab is not yet that good at it, to be honest, but reactive message passing is going to be the bomb at those kinds of data sets. Yes, there's also a bit of that: what we really try to do is build this toolbox to build active inference agents, with the idea that you just put these agents in an environment, they make actions, they select their own data. So we don't really try to build a toolbox that is really good at just doing machine learning on a fixed database, but rather we try to build models, let's say dynamic models, that can adapt, that can make actions, and that can process streaming data in real time. And in that sense, over time, if you wait long enough, yeah, there's a lot of data streaming through it, but it was sort of generated by, or at least affected by, the agent itself, right? We don't try to optimize for big fixed databases, rather for just streaming data in an environment. It's a really fascinating point, because there are a lot of time-dependent tasks, like autonomous driving, where it's like, okay, here's the 500 terabytes of video, or here's this, now give me the best possible score in a time snapshot; and every single time, with an active inference agent, we're always including action and policy selection in the loop. And so how are we going to deal with that? But that's why the inspiration from biological systems becomes so important. It's kind of like when people go: oh, well, the brain must have this processing power, and then if it were a light bulb, or if it were this kind of silicon computer, it would take up this much energy. Ergo, its efficiency is such and such. But it's predicated on the kind of computers that we're used to seeing, instead of starting with what already exists and then looking for an answer that rests in that, rather than asking how we make lower-power transistors so that we can fit them into the brain. So, awesome topics.
I think we're ready for the next presentation. Yeah, hello, everyone. Let me also share my screen for a second. I hope it works now. Looks good. So, can you see my slides now? Yes. So, okay. So, yeah, hello again. My name is Dmitry Bagaev, and I want to introduce to you this new reactive message-passing-based framework for Bayesian inference in Julia that we call ReactiveMP. It's more like our future vision of solutions for active inference. ForneyLab.jl is the more mature framework; it works, and ReactiveMP is our research project for now. We are going to start with the question: what actually is reactive message passing? In most senses it's the same message passing as in ForneyLab, but re-implemented in a reactive paradigm. So, basically, the major issue with the traditional approach, as in ForneyLab, is that in order to run inference for our model, we need to create this schedule of messages in advance. We need to pre-analyze our graph, we need to pre-analyze everything, basically. And we need to do it every time we change our graph structure. If the graph is big, this graph analysis time and schedule creation may take a lot of extra time. It might not be a big issue, but as an engineer you may want to test a lot of models until you're satisfied with the performance, and these extra delays and schedule creation times may just be a bit annoying. Reactive message passing allows us to eliminate this predefined schedule, and it also gives us a lot of other benefits and possible future research directions. So, instead of having a fixed schedule, we cast our graph as an event-based system where everything can react to its neighboring environment. In terms of message passing, nodes react to incoming messages and also to updated posterior marginals over our parameters. And the whole model becomes reactive. We can also react to changes in this system and basically do whatever we want; so we may perform some action based on newly updated posterior marginals. Also, in message passing, the natural starting point is our data, or observations. The model itself reacts to changes in our data or in our priors, and it also changes the posteriors accordingly. So, here I have outlined some extra benefits of the reactive message-passing implementation. First of all, we may point to biological plausibility, because in nature we probably don't have any predefined schedule for the information flowing between neurons in our brain, for example; they are driven by some chemistry, by physics. In a sense, it's also a reactive system: it constantly adapts to changes in the environment, and it reacts only if needed. And in terms of message-passing-based inference, it may be unnecessary to even react to some events, like some messages, if they aren't really important. With a fixed predefined schedule, you're forced to do all the computations even if you don't really need them. Another great benefit of reactive message passing is that it scales very nicely to very big graphs. This is more of an answer to the previous question: it supports hundreds of thousands of factor nodes, even millions are possible. I will show you an example in a couple of minutes, but just to give you a bit of context: in this plot, we modeled just some moving object with known linear dynamics, and the idea was to estimate the hidden states given noisy observations. In this example, I used a Kalman smoother with 50,000 observations, and this model contains roughly 150,000 factor nodes.
So, Bayesian inference on this kind of model is impossible with sampling-based inference. But with reactive message passing, it takes only about eight seconds on just a home MacBook laptop. We can go a bit further and make the reactive system robust and tolerant to, let's say, failing nodes, or missing data from a failed sensor. With a fixed predefined schedule, if something like this happens, we need to do everything from scratch: we need to recreate our model, we need to make a schedule again, and that will just take some time. With reactive message passing, we may just stop reacting to the missing data, or maybe the failing sensor, and just wait for it to be available again. That makes it very robust in that sense. It also gives us an opportunity to change our graph structure at runtime and still perform inference without stopping. So, here's a slide about how we actually do that. We have been using Julia as the main programming language in our lab for roughly a year and a half now, I believe. We wrote a library for reactive programming in Julia. It's completely unrelated to message passing, just a general framework for event-based systems, but it allowed us at a later point to build the ReactiveMP.jl package, which implements free energy minimization by message passing. We also introduced the GraphPPL.jl package, which is a high-level and user-friendly probabilistic model specification language that we use in our demos. Yeah, here I want to show you an example, just a second. So, you can see my screen, right? Yep, maybe one little size larger or two. I hope it will not break. Yeah, make it a little larger, and if it looks unexpected, you can pull back. Okay, here we go. Better? Yep, thank you. So, here is our example. Assume we have some moving object, it has some hidden states, and just assume that we know its linear dynamics, for simplicity. We don't have direct access to its location, but we have noisy observations of this moving object. And we wish to estimate the true location of this moving object by only observing its noisy measurements. We may use a linear multivariate Gaussian state-space model. The equations look like this; these are equivalent notations, they're the same. Basically, here we say that, okay, we have a state x at time step k, and it depends only on the previous time step, through some linear operator A. We also have transition noise, which is a Gaussian with covariance matrix B. And our observations are also basically modeled as a Gaussian, with covariance matrix Q. So basically this is our model, and we can simply create a factor graph out of it. As you can see, our model specification resembles the equations defined above very closely. Here we have our state xk, and it's modeled as a Gaussian over the previous state with some, let's say, known covariance matrices. And this is how we build our model. Under the hood, this code generates a factor graph, and we can later on use the ReactiveMP.jl API to estimate the hidden states of our system. So here's our example. It looks a bit off, probably because I zoomed in a bit, but it's fine, I believe. In this example, I run the Kalman smoother for 500 points. I just decompose the trajectory of this moving object into two axes, the x axis and the y axis. And we can see that the Kalman smoother recovered the real hidden states of our system correctly, even though we have a lot of noise in our system. So these blue dots are our noisy observations.
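For flavor, a model specification in that style looks roughly like this; it follows the early ReactiveMP.jl / GraphPPL.jl examples rather than the exact notebook on screen, so treat the names (`@model`, `randomvar`, `datavar`, `MvNormalMeanCovariance`) as approximations that may differ between versions:

```julia
using Rocket, GraphPPL, ReactiveMP, LinearAlgebra

# Linear Gaussian state-space model:
#   x[k] ~ N(A * x[k-1], P)   -- transition through the known linear dynamics A
#   y[k] ~ N(x[k], Q)         -- noisy observation of the hidden state
@model function linear_gaussian_ssm(n, A, P, Q)
    x = randomvar(n)                  # hidden states
    y = datavar(Vector{Float64}, n)   # observed data

    x_prev ~ MvNormalMeanCovariance(zeros(2), 100.0 * Matrix(I, 2, 2))  # vague prior on the initial state
    for k in 1:n
        x[k] ~ MvNormalMeanCovariance(A * x_prev, P)
        y[k] ~ MvNormalMeanCovariance(x[k], Q)
        x_prev = x[k]
    end

    return x, y
end
```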
And we can go further with this. Because our system is reactive, we can estimate our states in real time. So here's our example, let me try to run it. I hope it looks smooth; on my computer it's actually very smooth, but I know that maybe the recording, the screen sharing, is not that smooth. In this example, what we see is that, okay, we have an infinite data stream of a moving object. The blue is the real one and the red is the estimated one. And we can see that, yeah, let me reload it because it's kind of off. And we can see that ReactiveMP is able to perform Bayesian inference in real time. It actually adapts to the changes in the environment and also changes the posterior over the current estimated state. And we can go even further. You probably noticed this orange line here, just inactive for now, but we can also incorporate prediction into this model. We can just extend our graph a little bit, and we can also predict the future state of our system, like this. So now, in this example, I do Bayesian inference in real time and I also predict the future state of my system, in orange. You can see that the prediction also adapts to the new observations, and yeah, it basically changes its beliefs over future states. Yeah, let me stop it and continue with the presentation. So yeah, I'm going to talk about when reactive message passing will be available. Actually, it's already working. We have a fully working, stable back-end and API for exact and variational Bayesian inference. We also support expectation propagation, but that API is not yet stable, so it may change. For now, we support only conjugate models from the exponential family. We support extra constraints for the variational optimization procedure, such as form constraints or factorization constraints. The Rocket.jl library naturally supports infinite data streams, for example from the internet or from some sensor. And the framework itself is able to handle missing data, but that API is also not yet stable. Here are some of our future plans for this platform. We want to extend it to support non-conjugate models as well. Basically, ForneyLab already supports them; we just need to carefully port all of the existing functionality from ForneyLab to ReactiveMP. And we also want to integrate with other probabilistic programming libraries that exist in the Julia community. Reactive message passing gives us an opportunity to try parallel inference, letting different parts of the graph react simultaneously using the multi-core capabilities of our CPUs. We want to extend GraphPPL to support nested, model-within-model specification; for now, that's unfortunately not possible. Also, reactive message passing naturally gives us an opportunity to run inference where different data streams have different update rates. That's what I was talking about at the very beginning regarding robustness, but it's also interesting how far we can push it, and maybe it's possible to automatically adapt the model's graph structure at runtime based on free energy. But that's a research project; we don't really know yet if it's possible or not. We have some ideas about interactive visualization for message-passing-based algorithms and free energy minimization visualization. And eventually we are planning to release this new platform as a stable version of ForneyLab, possibly as 2.0. That is basically everything I wanted to show you in my slides, and thank you for your attention. I would be happy to answer your questions. Awesome.
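As a small aside on the Rocket.jl layer Dmitry mentioned: here is a toy reactive pipeline, unrelated to the ReactiveMP internals, where a subscriber reacts to each new datum as it arrives instead of following a precomputed schedule (the exponential-smoothing update is just an illustrative stand-in for a posterior update):

```julia
using Rocket

# Pretend each element is a new noisy observation arriving over time; `scan` folds a running
# estimate over the stream and the subscriber reacts to every update as it happens.
observations = from([1.0, 2.3, 0.7, 1.9])  # could just as well be an infinite or sensor-driven stream
estimates    = observations |> scan(Float64, (obs, est) -> 0.9 * est + 0.1 * obs, 0.0)
subscribe!(estimates, lambda(on_next = m -> println("updated estimate: ", m)))
```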
The first question was: is there a paper on this ReactiveMP.jl, or how does that work? There is, but it's in progress. So it's not publicly available, but we are working on that. There will eventually be a paper about this whole approach; it will describe everything, how it works and how we use it. So yeah. I mean, of course, if anybody writes us an email and wants an early copy of the paper, I would be happy to share it, if it stays confidential. It's a very interesting development. Maybe to the other authors: what other epochs of ForneyLab were there? How did you get to this reactive programming paradigm? And it also seems like you did a lot of the foundational work in the lab on the reactive programming; was that just in the Julia implementation or in the more conceptual grounding? Yeah, we actually did a lot of work to support reactive programming in Julia. Basically, Julia is a young language and it had no good capabilities for running reactive, event-based systems. So basically we built it from scratch. Yeah, but we used a lot of ideas from other programming languages that we had experience with. I mean, I know that in the active inference community not a lot of people use Julia yet. But really, if you want real-time processing of streaming data, then Julia is a very good, a better, option. Because Julia almost has the syntax of MATLAB from a user's viewpoint, but out of the box it's almost as fast as C. So it's a better combination if you are in engineering and you actually want to build systems that run in real time, if you want to make demos. Julia is the better one. And Dmitry also makes use of some really advanced stuff in Julia, it's called multiple dispatch; it doesn't matter exactly what it is, but it's quite advanced, it's not available in MATLAB, and it's extremely useful for what we do. And then about the toolbox itself: if you look at a graph and message passing, it seems like, oh, this is actually not so hard to implement. But there's a reason why there are extremely few toolboxes for factor graphs, right? Microsoft has built one, Infer.NET, and it's great, but it's not really a real-time toolbox. So as far as we know, there's not much competition at the moment for us; not that our toolbox is very advanced yet, but I don't see a lot of people working on this. I see a lot of people working on Monte Carlo sampling, but Monte Carlo sampling will not work for the sizes of systems and for the real-time data streams that we want to handle with active inference, right? We are working with systems where we want to make actions and influence our data, so this is a real-time system. If we want to actually scale up, we have to build a really high-quality, professional toolbox that automates message passing. I hope this will be one of those toolboxes, and hopefully there will also be other toolboxes, but yeah, that's why we're doing this. Very interesting answer. How has the dialogue between the math, programming and computer science side and the active inference side been? What has each side contributed? Because it sounds like some of those real-time insights of active inference are kind of propagating back to the design of algorithms, and we see both directions. So how has that played out, maybe for any of you, or each of you? I can also answer this question, maybe.
From my point of view, I mostly like doing programming, but I still learn something new every time, because we're in this constant dialogue with the math guys in our lab, let's say, and I learn something new every day. And of course it's reflected in our design choices for our software as well. But sometimes it's very hard to fit these high-level mathematical ideas to the actual implementation, and it's also very hard to make it efficient. Yeah, we're not done with this. I mean, the toolbox is available for anybody, and all the nice tutorial examples, the Bayesian thermostat, the tic-tac-toe, it all works very smoothly. Our hope is that with reactive message passing, we can actually get a path to scaling this up to serious applications. And that's, I mean, we're an engineering group. We want to build systems that really do something useful. I also work for a hearing aid company, so I want to build real-time audio processing algorithms, and others may want to use this for robotics. So yeah, and that's not going to work in MATLAB in real time. It's not going to work, not with all the Monte Carlo sampling. So this is our effort. And it takes a long time, because it's a very multidisciplinary effort, right? In my group we have mathematicians and neuroscientists and computer scientists, because it's difficult. There's a lot of different expertise that you need to build a good toolbox for active inference. I think because it's so difficult, there is a lot of cross-fertilization between the active inference community and the engineering that we do. The community is mostly interested in explaining biological systems, right? They see the free energy principle and active inference as a model for biology. And I think that's very interesting, because, well, nature has a certain way of doing things efficiently, and if this is a model for that, then it's also a good idea to take it to engineering. That's kind of where we come in. So we take the ideas that are available in the active inference community, the way they explain brains and think about brains, and we think: hey, how can we take these ideas and use them to build an engineering system? I think that's kind of the main interaction between us. And hopefully we then build tools that the community can also use in their research eventually. What that made me think about is that it's really a reframing of some of the main challenges, like seeing signal processing as a real-time event, or seeing the causal relationships between action and future data, all these kinds of reframings of the problem. It's not like the active inference algorithms take massive matrix calculations per se; they can be very simple or not, but it's actually that reframing, the embedding of action within every step, that then ends up solving some of the scaling challenges, some of the resilience challenges. Because for, say, the resilience of a transportation network in a city, they would do sampling, or they would take something that's an unfolding dynamic process and then try to roll out a million iterations of it statically. So again, it's not that the calculations inside the loop have to be challenging; it's just that the reframing of prioritizing action ended up going down these roads of scaling and real-time capacity. Yeah, I like that thought. Perhaps I can say something about it in the context of what I work on, because I work on designing hearing aids; basically, I'm a hearing aid engineer.
And if you ask a hearing aid engineer, a signal processing engineer, so what is your task? Well, our task is to build the best hearing aid algorithm. And so then what happens is that a hearing aid client goes to the store and buys a hearing aid, and usually she's very happy and goes out, and then two weeks later she sits in a restaurant and cannot understand her conversation partner, because there's noise and this wasn't expected, and there's nothing she can do, because you can't ask a hearing aid client to fiddle with hearing aid parameters. So this happens a few times, and then she puts her hearing aid in the drawer. And indeed, about 10 to 20% of hearing aids end up in the drawer. They're very expensive, and 10 to 20% end up not being used; a really sad statistic. So rather, you can turn it around and say: okay, what is the real hearing aid design problem? The real hearing aid design problem is: send somebody out with any hearing aid, but what do you do when she's unhappy in the field, in the restaurant? What we want to build is an agent for exactly that situation, so she just taps her wrist: I'm not happy. And now this agent needs to make an action and give her new parameter settings that are the most interesting for her, the best compromise between information seeking and goal-driven, goal-driven being making her happy. So hearing aid design, from my viewpoint, is now: just build an agent that will take actions, make hearing aid proposals, when she's unhappy. So she's unhappy, the hearing aid makes a proposal, and she says, no, that's not good. Well, the agent gives another proposal, and then she says, okay, that's better. And then we move on, and maybe a week later the same event happens, and this goes on continually, but there is at least a procedure to keep moving on, to keep improving over time, right? So the real thing in design is then the action in the field, in situ, which is very different from what's currently happening, which is hearing aid engineers sitting at their desks with MATLAB, not in the environment. So it's a paradigm shift, and active inference agents can maybe make that happen and could really, I think, create a pivot point for something like signal processing design, which is, you know, not something that you think about when you're a neuroscientist, but this could really mean something big for engineering, and not just for signal processing but also in different engineering disciplines. And that's why I think over time you'll see more and more people coming from different fields getting an interest in active inference and the free energy principle, even the non-neuroscientists, the engineers, yeah. Cool, we hope so. And that really reminds me of the pragmatic turn, which is, it's a little bit like a horseshoe theory, with the engineers and the philosophers often talking about pragmatism in different ways, different communities, different tools, and now there's a way to kind of close that gap, or at least map across it, where we can embed some of those insights about enactivism, the role of design science, anticipatory design science; whether you take it or leave it on the philosophy side, there's the toolkit to develop with. Maybe it's interesting five years after working with the toolkit, or maybe five years of philosophy and then you're curious about the toolkit. So one question was: how do these graphs account for temporally deep models? How are those specified, or what is different about the graph?
Because it's just calculating the next action to take in the examples that you provided. Maybe you can show your graph again, how the future is represented, and how do you add temporal thickness to that, for example? So if you have a graph like, share screen, yes, if you have a graph like this, then how do you add temporal thickness, for example? That's just about adding layers. So what is shown here is one layer, and this layer acts at a certain clock time; observations come in discretely. But you might have a layer above this, somehow connected to the layer below, that acts on a slower time scale. So it evolves more slowly, but it regulates or influences the parameters of, for example, a transition model here or an observation model here. So you get a time-varying observation model that is influenced by a slower evolving layer. On top of that, you can get an even slower evolving layer, right? And that's the way you can build hierarchies. And actually there's also a PhD student in our group, Ismail Senoz, who has done some very interesting work on that; he has investigated how you make hierarchical Gaussian filters, so hierarchical Gaussian systems where one top layer influences the variance of the transition model of the layer below. And you can see that you can model very natural signals with that. In nature you have signals that are time-varying, where the statistics are time-varying, and even the statistics of those statistics are time-varying. So a hierarchical model in that sense can be very useful. And of course, that's also how our brains are structured, so again there you have this inspiration from nature on how to structure the models that we try to build. And with factor graphs, at least in theory, that should be pretty easy, because you can just connect them: you connect a layer on top of this and see how it behaves. And eventually the complexity of your model will become very high and you will get penalized for that automatically. So there will be a cut-off point, given your data; that will be the optimal number of layers that you need in order to explain away all the variance that you observe. So that's how we think about temporal thickness and time dependence in dynamical models like this. Then can I share my screen also? Yep, sure. Okay, let's see. Because this... Awesome. Yeah, so this is a graph from a paper two years back in Frontiers; it's called, I think, deep temporal models in Forney-style factor graphs, or something like that. And here you see a three-layer system. Don't worry about the details now, but here you see the top layer, which basically has one section, one time step. And within that one time step at the top layer, you have two observations, or two steps, at the middle layer. I'm not sure if you can see my... Yeah, we see it, we see it. Okay, and then again for, let's say, the third layer: whenever the middle layer takes one step, we have two steps here. So there is a finer granularity at each layer, and so we can build hierarchical models, really. And we're doing that also; if you look at some of the papers on our website, you'll find a lot of papers on the hierarchical Gaussian filter from Christoph Mathys. We've implemented that in... But also, I mean, you can also do it for active inference, things like that. It will look like this. Thomas Parr also has graphs like this in some of his papers.
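(For readers following along: here is a minimal sketch of the kind of two-layer hierarchy described above, in the spirit of the hierarchical Gaussian filter. The Gaussian random-walk form and the parameters \(\kappa\), \(\omega\), \(\sigma_z\), \(\sigma_y\) are illustrative assumptions, not taken from the slides.)

\[ z_t \sim \mathcal{N}\big(z_{t-1},\ \sigma_z^2\big) \qquad \text{(slow upper layer)} \]
\[ x_t \sim \mathcal{N}\big(x_{t-1},\ \exp(\kappa z_t + \omega)\big) \qquad \text{(fast lower layer; its transition variance is set by } z_t\text{)} \]
\[ y_t \sim \mathcal{N}\big(x_t,\ \sigma_y^2\big) \qquad \text{(observations)} \]

Stacking a third layer on top of \(z_t\) repeats the same pattern, and, as noted above, each added layer increases model complexity, which the free energy penalizes automatically.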
Very interesting, and we see a lot of the same variables, like G, D, B; it's kind of a different representation of active inference. So this is also going back, I think, to the earlier Frontiers paper. What are some of the big equivalencies, or kind of airtight mappings, from and to Forney factor graphs? Because there are probably some broad areas of application that could almost be hot-swapped onto this underlying representation, perhaps. So what are the equivalencies with just Bayesian graphs in general versus factor graphs versus other topics at that level? I think a Forney-style factor graph or a bipartite factor graph or a Bayesian network, they aren't really different. They're just representations of a factorized model. But the way you represent it can have an impact on how you think about those models. For example, if you choose a Forney-style factor graph representation, it's very suited for signal processing, because you can see the messages as signals that flow over your graph, so for us as engineers this is a very intuitive representation. In Bayesian networks, the model representation itself is more compact, in the sense that you only have variable nodes and you see how they relate. So it gives you a good idea of model structure, and it can be very nice to have a quick overview of a model like that. And then you have, for example, a bipartite factor graph, which also shows the relations between variables but with additional factor nodes in between. It gives you a bit more granularity on how these variables are connected and gives you room to talk about what the relationship between these variables actually is. So in the end they're equivalent: you can take one model and represent it in three different ways, maybe even more. But I think it does impact the way you think about these models. That usually comes up when writing a paper, and then you have to think, okay, what is actually the best representation for my idea? Sometimes it's a bipartite factor graph, sometimes it's a Bayesian network, sometimes it's a Forney-style factor graph. It just depends on your story. Yeah, maybe I can also add something to that question. I do find, in my experience, when we think about a system as a whole set of equations, if you read that in a paper or write it down, to me it gives more insight if I draw the graph. And there must be an exact correspondence between the graph and the equations. Very often, if you draw the graph and then write down the equations, there's not an exact correspondence, and you learn from that: since there must be an exact correspondence, you often find out you have an error in your equations or in your graph. But I think the end goal is to make a toolbox, a toolbox for, let's say, the community to design their own active inference agents. And what I envision there is something like Simulink, or LabVIEW; I'm not sure if people are familiar with those. But these are graphical models; you want to actually also define them graphically, right? You want to have a palette with nodes and just draw your graph and say, this is my model, now go run. You don't want to worry about the inference; the inference is under the hood. The message passing should be taken care of by the designers of the toolbox, but you should just be thinking about your graphical structure. That's the idea: just run it, and connect it
with your mouse to the microphone of your computer, or to the camera, and maybe, you know, there are also connectors in the toolbox for robots, and then it should just go, right? That's the... You're going to be the Bob Ross of graphs. It's a little bit... Yeah, that's how you want to design, I think, right? You just want to draw the brain, or at least the generative model of the brain, and just let it go, and not worry about how inference takes place. What you just said there, about viewing the equations and then maybe going from the pen and paper to the programming language to the graphical representation and then kind of cross-checking: it's like when you reverse-translate languages, and when you lock in where two words map back to each other, you know you've made a map, but if you're on this infinite loop, you're lost in the word space. So when you move across modalities like that, the analytical, the simulations on the computer and then the graphical, it kind of embeds action in what otherwise might be seen as a knowledge product, a product of inference and then a final action. And that's a similar fallacy to training a dynamic process on a snapshot and then expecting it to work in real time. So it's like embedding this real-time flow in the production of knowledge. Yeah, yes, yes, it's very important. Yeah, we're implementing active inference while we are trying to design this stuff, that's true. One general question was just about factorization. So the factorization, the way we go about it, is it starting with our intuition and looking at residuals, or how, of all the ways to factorize a model, do we find one that works? Yeah, there are two kinds of factorizations. There's the factorization of, let's say, the generative model, right? That's the graph that we draw. And the most common models are these Markov models, right, where you retain a current state that summarizes everything that happened in the past. You use that current state to summarize the past, and then that's all the information you have about the past; you don't need to remember the past, you just remember your state. With that, you make a new observation, you combine the information, make a new state, and so forth. So hidden Markov models, POMDPs, all these models have that same structure, this Markov structure. That's for the generative model of these dynamic systems. Then there is a second question: if we now do inference, there's often what we call the mean-field assumption for the variational posterior, but there are also variants on that, like structured mean field. So you can still decide whether you want your posterior to be even more structured, or let's say even more factorized, than the generative model. But in the end, it's a proposal. You just run it. And if you have another proposal, you just run it. And the one that has the lowest free energy wins. That's it, right? The challenge is to automate that process, to go from, let's say, the poorer structure to the better structures, just by free energy minimization, just by message passing, in real time, without stopping the whole process to do an analysis. Right, yeah, it should just keep moving, right? Structure adaptation should be like state estimation: it just keeps moving over time, there's no resetting. Maybe there's a bit of a thing like a dreaming stage, right? But in principle time moves on, and that's the challenge.
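(A minimal sketch of the two kinds of factorization just mentioned, written in generic state-space notation assumed here for illustration.) The generative model with Markov structure factorizes as

\[ p(y_{1:T}, x_{0:T}) \;=\; p(x_0) \prod_{t=1}^{T} p(y_t \mid x_t)\, p(x_t \mid x_{t-1}), \]

while the variational posterior can be fully factorized (mean field),

\[ q(x_{0:T}) \;=\; \prod_{t=0}^{T} q(x_t), \]

or keep the temporal dependencies (structured mean field),

\[ q(x_{0:T}) \;=\; q(x_0) \prod_{t=1}^{T} q(x_t \mid x_{t-1}). \]

Each choice of \(q\) is one of the "proposals" referred to above: run the inference under each, compare, and the one with the lowest free energy wins.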
And we don't yet know how to do that structure adaptation in our factor graphs, but we have one student who's currently looking into that. Yeah. As an engineer, I think the question is, well, what do you start out with? What will be your initial model proposal? If you talk about factorizations of your generative model, every factor then represents a prior distribution or a conditional distribution, and that's how you build up your model. Where do you start? It starts with how you believe the world generates observations. So in that sense, you think about what the causal structure of the environment is, and you might have some idea about how physics works, or how states transition in your environment, or at least how you believe they transition. And that's where you start. You think, okay, maybe let's try a very coarse node, like this minus one, for example, in my talk. Let's try that here, because I don't really know what I want to put there, but I have to put something. Let's try something and see how that works; it kind of models how I believe the world works. And that's how you start out. Then you start thinking, oh, I actually know a bit more about it, I'm not satisfied with how this proposed model works, and I can change this minus one, for example, into something a bit more complex, because I know the physics, I know how temperature degrades with distance, for example, and I can build that in. So that's how you get to a second proposal. And it's always inspired by how you believe the causal structure of the world works. So in that sense, it's theory building: you're building a theory or an explanation of your environment. And maybe, if you're really, really good and you have an excellent model, then you can find something that improves upon the state of the art, because you can always improve, always find something that gives you a little lower free energy, a better model in that sense. And you can keep tweaking and tweaking these models and these factorizations. Free energy minimization has been described just as a way to rank the different models, and as an imperative on these multiple different fronts. How do we know that free energy minimization is making a policy that is going to be resilient? Is there a way where, similar to a local optimization getting trapped somewhere in a bigger optimization space, there might be some avenue of free energy minimization that makes the system just break down, like, we'll get there faster, let's just accelerate, and then there's some kind of one-time failure of the system? I'm just wondering: how can one metric, a number that can be sorted, evaluate such radically different chess strategies or driving approaches? In the end it's just probability theory, and a probability is also just a number, right? And the free energy approximates your evidence, which is the probability, according to your model, of observing the data. So if you make a model that gives you the best evidence, then it's a good explanation for your data. So in the end, you try to do approximate probability theory, and the free energy is then a bound on your evidence, and it also takes into account this posterior divergence term, which kind of says the price that you pay for approximating Bayesian inference.
And so it has two parts to it: it has the model evidence part and it has the posterior divergence part, and both play a role. One says, okay, this is the quality of my solution, how well your model explains the data that you observe; you can get a number for that, the surprise, or negative log model evidence. And you have the other part, the posterior divergence, that says, well, this is the price in information that you pay for making this approximation. You can also put a number on that; it's just a KL divergence that you evaluate. And then you add those two and you get a number. So yeah, the question is how one number can represent everything in terms of quality. Well, you might care about different things than how well your model predicts or explains the observations, right? And if you care about other things, then you might use a different number, but in our work we want to have this Bayesian measure of quality. I don't know if that really answers your question, but it's still a good question. It's kind of: is the free energy, that number that you get, enough, right? That's what's behind this, and probability theory says yes, but if you apply it in practice, then you may care about other things; you care about how many people survive, or how many mistakes you catch. And if you care about that, well, then you should use that as a performance metric, I think, right? I mean, the free energy of course has many decompositions, right? Complexity minus accuracy, and surprise plus the KL divergence. But I'm not sure if I interpret the question correctly; if you ask, how do you know that it does well in practice, in the field? Well, you don't, right? The only thing you can say is that the system will look for a configuration that minimizes free energy, but you don't know what you don't know. If there is another model that would do better, and we haven't simulated that model, then we don't know. And the only learning opportunities there are come from errors. So the only way to really improve the model is to actually hit situations where it's not working. And then you need to adapt your parameters, and even more, over time you adapt your structure. So building a good system is a process, right? You would have to be extremely lucky to just build a system and have it completely work. What active inference describes is a process towards better systems; it doesn't describe a system that doesn't make mistakes. In fact, it needs the mistakes to learn. It describes the process. Awesome points. It's that operational insight that if you're getting 100% on the test, something is, at the very best, less than informative, and at the very worst, you're way down the wrong path. And then it reminded me of a grocery store: is it that we're ranking the different objects according to one measure? No, not really. It's more like balancing strategies for finding preference. And that's where all of these assumptions come in, like the mean-field factorization; it's not guaranteed to give you the best object every time, and there's still the stochasticity of the real world and the ability of the grocery store to change. It's not the end of the story; it's actually a practice and a process that has some of those features of reactive graphs and biological systems. That's right, yeah.
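(For reference, the two decompositions mentioned in this exchange, written out in generic notation.) With a model \(p(y, x)\) and a variational posterior \(q(x)\), the variational free energy is

\[ F[q] \;=\; \mathbb{E}_{q(x)}\big[\ln q(x) - \ln p(y, x)\big] \;=\; \underbrace{-\ln p(y)}_{\text{surprise}} \;+\; \underbrace{\mathrm{KL}\big[q(x)\,\|\,p(x \mid y)\big]}_{\text{posterior divergence}} \;=\; \underbrace{\mathrm{KL}\big[q(x)\,\|\,p(x)\big]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(x)}\big[\ln p(y \mid x)\big]}_{\text{accuracy}}. \]

Because the divergence term is non-negative, \(F\) upper-bounds the surprise \(-\ln p(y)\), so minimizing it simultaneously improves a bound on the model evidence and pushes the approximate posterior toward the exact one.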
This was really one of the coolest sessions to learn about this, and you're always welcome back to join anytime, as a participant or to present. If any of you have final comments, you can note them now. Well, it was a real pleasure to be here, and if we have new stuff, we would love to come back. I think it's a fantastic show here, or forum, or whatever it's properly called, but I really enjoyed it and we really enjoyed being part of it. And thanks for the discussion, some food for thought as well; it was really cool, and a great opportunity to present this. Great, okay, till next time. Yeah, bye. Bye-bye.