Hello, everyone. Welcome to the Active Inference Lab, the Active Inference Livestream. This is Active Inference Livestream number 14.0, and I'm looking forward to today's discussion. Welcome to the Active Inference Lab, everyone. We are an experiment in online team communication, learning, and practice related to active inference. You can find us on our website, on Twitter, at our email address, on our YouTube channel, and on our Keybase public team, under our shared username, Active Inference. This is a recorded and archived livestream, so please provide us with feedback so that we can improve our work. All backgrounds and perspectives are welcome here. As far as video etiquette for livestreams: please mute if there's noise in the background, raise your hand so we can hear from everybody who wants to speak, and we will use respectful speech behavior. For the regularly scheduled Active Inference streams during 2021, all group discussions are going to be on Tuesdays from 7 to 9 a.m. Pacific Time, which will be a regular weekly time slot, and we also do special sessions. If you go to the link with the red text, you can see some information about participating in future streams, and definitely get in touch with us if you want to participate in 2021, or if you have an idea for an event or somebody we could invite for a livestream. If you go to that spreadsheet, you will see columns that cover the date of each scheduled Active Inference livestream, what we'll be reading, and anything else about it. We just finished up 13.1 and 13.2, which were really fun, and this .0 video is heading into Active Inference Streams 14.1 and 14.2, reading "The Math is Not the Territory" by Mel Andrews. Mel will be joining us for these discussions, so we're really looking forward to that.
Today in Active Inference Stream 14.0, the goal is to set the context for 14.1 and 14.2, which will be group discussions. The paper under consideration in all three of these livestreams is "The Math is Not the Territory: Navigating the Free Energy Principle" by Mel Andrews, 2020. This video is just an introduction to some of the ideas and some of the context; it's not a review or a final word. A punchline for this paper: all models are wrong, but some are useful, and some are the Free Energy Principle. Which ones are which? That's what we're here to talk about. In the sections of 14.0 today, first we'll go through the aims and claims of the paper, talk about a little background represented by the keywords, go through the abstract, and look over the roadmap. Then we'll pass through a few key topics and key quotations from the paper, and end up with a couple of questions that remain. In 14.1 and 14.2, we're going to be discussing this very same paper, so save your questions, post them as a comment on this video, and definitely get in touch if you want to participate live in this conversation or in a future or different conversation. Okay, here we are in the paper, "The Math is Not the Territory: Navigating the Free Energy Principle" by Mel Andrews, which was posted to the PhilSci-Archive preprint server on October 8, 2020. The aims and the claims of the paper are as follows, quoting Mel: with respect to the demarcation problem, I defend the position that the FEP (Free Energy Principle) can be taken to belong properly to the domain of science. We're going to talk more about the demarcation problem in a few minutes, but that's the question of what is science and what is not science. Mel writes that the FEP is certainly not anything like a theory, a law, a hypothesis, or a research program, as these have been classically understood in the history and philosophy of science.
So this is a claim about what the FEP is, or what it does, how it's enacted in the world: that it is different from some of these classical history and philosophy of science terms like theory, law, hypothesis, or program. Quoting again: in order to see how the literature on scientific models bears on the FEP, however, we will first need to establish a firm sense of what the FEP is and what it is not. This will require dispelling a number of false assumptions that have been made about the framework. I accomplish this first by tracing out the historical buildup to the FEP, illustrating where the formalism has been derived from and what it has come to signify. And continuing on that quote: this serves the purpose primarily of showing how much of the mechanics of the FEP had physical meaning in its initial form but has since come into a strictly formal use. Following this, I will demonstrate that the formalism is empty of any sort of facts or assertions about the state of nature that would allow it to draw taxonomical distinctions, to differentiate classes of natural systems, to explain their behavior in terms of underlying mechanisms, or to bring forth testable hypotheses. Strong claims. And that's what's going to be fun about this paper and this discussion: hopefully it will highlight many different perspectives on what the FEP is or is not. So if you're watching live or afterwards, if you want to participate in some way and you're curious about these questions, these are the kinds of questions that are being actively researched and debated, so you're in the right place. Let's see how we're going to set up the background to address these aims and claims of the paper. The keywords of the paper are as follows: free energy principle, Bayesian brain, life-mind continuity, scientific models, modeling and simulations, scientific theories, epistemic virtues, falsification, and the demarcation problem.
Now, as always, for these keywords, there's really so much to be said about each one. You could study for your whole life going down just one detail of one of these keywords. So it's not a review, it's not a final word. My target with these keyword summaries is the construction of a bi-directional on-ramp. It's not an off-ramp, it's actually an on-ramp for everyone. And the on-ramp is going between this paper and broader communities. So if you're in the active inference community, I hope that this paper is an introduction or a reminder of some of the bigger philosophical and historical trends that the FEP and active inference research is happening inside of. And if you're outside of the active inference research community, then hopefully you're going to find a few points of contact with how you currently view the world, but also start to get excited and curious about this little thing called Free Energy Principle. What is the Free Energy Principle? Well, on this slide, I've put up a couple of terms, and these terms are all going to come together in the Free Energy Principle, and I'll read a couple quotes on that. But the terms up here are multi-scale systems from the left to the right. So that means systems that exist across multiple spatial and temporal scales. So something that exists over milliseconds and seconds and minutes, that's temporal scales, and something that is existing or displaying orderliness over spatial scales that are small to big. So cells that are part of an organized tissue, that are part of an organism, part of an ecosystem. That's multi-scale. Another key aspect of the Free Energy Principle is a focus on ensembles, which is akin to talking about collective behavior. So we're not going to be behavioral reductionists or essentialists. We're not going to go to the smallest possible unit of motion and say that that's the most meaningful, nor are we going to simply go to the largest scale of phenotype and say that that's the most meaningful. 
We're going to be using this multi-scale perspective, thinking about ensembles of collective systems, and using that to move beyond internalism and externalism, to integrate holism and reductionism. We're also going to be taking a strong informational thermodynamic perspective. Some of this history we're going to trace in this video, and it was so excellently traced by Mel in the paper. Info-thermodynamics basically starts with thermodynamics, so with the vibrations of molecules and with heat and entropy, and then considers how, since information is something that has to be embodied or manifest in the real world, for example on a hard drive or passing through a wire, we can actually think about thermodynamics and signal processing within a broader informational capacity. And then this last point here, action policy selection: the first part is saying it's going to be a control theory or cybernetics-focused framework, and the way that those policies are going to be obtained is through a type of variational multi-scale Bayesian inference. So we're not going to be using linear regression, or some other type of machine learning algorithm; we're going to be using derivatives on this variational Bayesian inference approach. How about this bug on the left side of this slide? Why is there a little bug? Well, that comes from the title of an interview that I carried out with my colleague Martin and Karl Friston in 2018, entitled "Of Woodlice and Men: A Bayesian Account of Cognition, Life, and Consciousness," at the URL here. We asked him early in the interview: what is the free energy principle, and how did you get to the place where you are at with it now? And Karl wrote that he was watching woodlice, which are these little bugs, watching them scurry around on the ground.
And he wrote: after half an hour of observation and innocent childlike contemplation, I realized that their scurrying had no purpose or intent. They were simply moving faster in the sun and slower in the shade, which meant that they were spending most of their time in the shade from a population or ensemble perspective, because they would go rapidly through the sun and then dwell in the shade. So even if it was 50-50, you'd end up enriching them in the shade. The simplicity of this explanation for what one could artfully call biotic self-organization appealed to me then and appeals to me now. It is exactly the same principle that underwrites the ensemble density dynamics of the free energy principle and all of its corollaries. So that's a key piece: there are going to be ensemble density dynamics, so it's going to change through time, it's going to be collective behavior, and it's going to be related to not just the free energy principle but some of its corollaries, some things that are adjacent to it, and we'll talk about what those are. He then continued: over the subsequent 20 years, I learned enough mathematics to think about the shapes of woodlice in terms of density dynamics, namely the evolution of probability density distributions over ensembles of states, for example, swarms of woodlice. Happily, people had been using this exact sort of framework both to model the world and to analyze their data. I came to know this as ensemble learning and, in particular, variational Bayes. This is how the free energy principle developed into the current framework. In brief, I was very lucky to meet the right people and work in an era when these ideas were in the air. So as always, humble and insightful Karl. What is the Bayesian brain?
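As a quick aside before turning to the Bayesian brain: the woodlouse dynamic Karl describes, moving fast in the sun and dwelling in the shade so that the population ends up concentrated in the shade, can be sketched as a toy simulation. This is purely an illustration of the ensemble idea; the function name, ring size, and probabilities are all invented for this sketch and come from neither the paper nor the interview.

```python
import random

def simulate_woodlouse(n_steps=100_000, seed=0):
    """Random walk on a 1-D ring where half the sites are 'sun' and half
    are 'shade'.  The walker moves every step in the sun but only
    occasionally in the shade (it dwells), so its time-averaged occupancy
    concentrates in the shade with no goals or intentions required."""
    rng = random.Random(seed)
    size = 20                                  # sites 0-9 sun, 10-19 shade
    pos = 0
    shade_time = 0
    for _ in range(n_steps):
        in_shade = pos >= size // 2
        shade_time += in_shade
        move_prob = 0.2 if in_shade else 1.0   # slower in the shade
        if rng.random() < move_prob:
            pos = (pos + rng.choice([-1, 1])) % size
    return shade_time / n_steps

print(simulate_woodlouse())  # well above 0.5: the ensemble enriches in shade
```

For this lazy random walk, detailed balance gives a stationary occupancy proportional to 1/move_prob per site, so the shade fraction settles near 5/6 even though sun and shade each cover half the ring.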
Let's draw from one of the questions that Martin and I asked in our interview, because we tried to be very clear about what the Bayesian brain was so that we could be very clear in distinguishing it from the free energy principle, and also so that we don't end up compounding our uncertainty as we try to contrast different ideas. It's really important to have a solid understanding of the terms we're going to be talking about, so that we don't accidentally make errors, misattributions, false positives, or false negatives in our understanding. We wrote: the Bayesian brain hypothesis, predictive coding, and the free energy principle are often equated with one another. You have yourself suggested that these three frameworks are variations on the same basic mechanisms. To be clear, what we call the Bayesian brain hypothesis is the idea that the brain performs inference according to Bayes' theorem, integrating new information in light of existing models of the world. A perceptual or cognitive state can be modeled as a posterior probability, P of H given D, where P stands for probability, H for a hypothesis or cause, and D for observed or available data. P of H given D is the probability of the hypothesis being the case given the data. By Bayes' theorem, the posterior probability is proportional to the product of the likelihood, P of D given H, the probability of the data given the hypothesis, and the prior probability, P of H, the probability of the hypothesis before the data arrive, normalized by the probability of the data. In other words, the probability of the model H being true given the observation D is the likelihood of the data D under the model H multiplied by the prior plausibility of the model H relative to other models under consideration. To make these equations a bit more concrete, let us take the following example. The brain is receiving scarce data D from the retina and must form a model H of how the world has caused this pattern on the retina.
In Bayesian terms, the problem to be solved is the following: P of H given D. No, not PhD, P of H given D. The Bayesian brain hypothesis implies that the posterior probability at time 1, T1, becomes the prior probability at time 2, T2. So this was our characterization of the Bayesian brain, and we're going through it because it's a keyword in this paper, and because it's very important to understand what exactly the Bayesian brain is or isn't so we can understand what the FEP is or isn't. So that's the Bayesian brain hypothesis: the idea that we're going to use Bayesian approaches, equations, and formalisms to model the brain as a statistical instrument in the world, yes, an evolved one, but we're going to focus on the process of inference for now. It's going to be generating hypotheses about latent causes in the world, about things in the world given observations, and updating its prior understandings or beliefs about the world. We then asked if Karl agreed with that characterization of the Bayesian brain hypothesis. And he wrote, even though he also improved the question as we had asked it: do I agree with this characterization of the Bayesian brain hypothesis? Yes, I do, with a couple of caveats. I think it is useful to make a fundamental distinction at this point that we can appeal to later. The distinction is between a state and a process theory, i.e. the difference between a normative principle that things may or may not conform to, and a process theory or hypothesis about how that principle is realized. Under this distinction, the free energy principle, being a normative principle, stands in stark distinction to things like predictive coding and the Bayesian brain hypothesis. So, critical point: principles are contrasted with hypotheses. This is because the free energy principle is what it is: a principle. Like Hamilton's principle of stationary action, it cannot be falsified.
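The retina example, and the idea that the posterior at T1 becomes the prior at T2, can be written out in a few lines. This is a generic illustration of Bayes' theorem with made-up hypotheses and numbers, not anything specific from the interview or the paper:

```python
def bayes_update(prior, likelihood):
    """Posterior P(H|D) proportional to P(D|H) * P(H), normalized over
    all hypotheses; the normalizer is P(D), the evidence."""
    unnorm = {h: likelihood[h] * p for h, p in prior.items()}
    z = sum(unnorm.values())            # P(D)
    return {h: v / z for h, v in unnorm.items()}

# Two candidate causes H of a pattern D on the retina (invented numbers).
prior = {"edge": 0.5, "shadow": 0.5}
likelihood_t1 = {"edge": 0.8, "shadow": 0.2}   # P(D | H) at time 1

posterior_t1 = bayes_update(prior, likelihood_t1)
# The posterior at T1 serves as the prior at T2, when new data arrive:
posterior_t2 = bayes_update(posterior_t1, {"edge": 0.9, "shadow": 0.3})

print(posterior_t1)  # edge: 0.8, shadow: 0.2
print(posterior_t2)  # edge rises further, to about 0.92
```

The design point is simply that inference is iterative: each update folds the evidence into the belief state, and that belief state is what the next observation updates.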
Key point, and this has resulted in a lot of discussion: it cannot be disproven. In fact, there's not much you can do with the free energy principle unless you ask whether measurable systems conform to the principle. On the other hand, hypotheses that the brain performs some type of Bayesian inference or predictive coding are what they are: hypotheses. These hypotheses may or may not be supported by empirical evidence. So in other words, those hypotheses are falsifiable. They can be tested and rejected. You can say: if this brain region performs Bayesian inference, we should make this observation; okay, we measured, it's consistent or it's not, and if not, we reject the hypothesis. So it's funny, because even though we're talking about the Bayesian brain, which is a post-falsificationist idea in the statistical sense, we're actually going to use the Popperian idea of falsification from a philosophy of science perspective to ask whether we could falsify these process theories. So what does it look like to have hypotheses that are beyond falsificationism? Does Bayesian statistics, or the Bayesian turn, actually move us away from simple falsification? Because in most situations it's not the case that some experiment slam-dunk disproves a hypothesis. Usually, most evidence is actually ambiguous, depending on how you weight the different possibilities. For example, a tree to an evolutionary biologist is an example of natural selection, but to a creationist, the very same tree is an example of creation. So that's how evidence is related to the generative model that someone has. In this view, the relationship between the free energy principle and predictive coding, or the Bayesian brain, or active inference, is the relationship between a principle and a process theory. Crucially, there are lots of process theories that conform to the free energy principle. Predictive coding is arguably the predominant process theory in cognitive neuroscience.
However, there are other contenders, for example, based upon discrete as opposed to continuous state space models. These would include things like belief propagation and variational message passing. These schemes or processes serve as plausible metaphors for neuronal message passing that may or may not have the look and feel of predictive coding. It is important to note that there have been other process theories that have not fared so well in light of empirical evidence, for example, probabilistic population codes and attempts to understand ensemble dynamics in terms of sampling from the posterior, e.g. Gibbs sampling and particle filtering. So in other words, the free energy principle is beyond, or before, empirical data. However, it lends itself to corollary hypotheses: process theories that make specific testable predictions about how the FEP is getting done. You could predict that little tiny unicorns are doing it, and then you could go and measure and falsify that idea. So again, we're using the Popperian idea of falsification of theories to discuss whether this Bayesian perspective on the brain is adequate. The Bayesian understanding of the brain, if we pull back a step, is related to what is called, and keyworded in this paper as, life-mind continuity. Here is the definition of the enactive life-mind continuity thesis. It's written as: any living system is ipso facto a cognitive system. Or in other words, life is sufficient for mind. So that's the claim that living systems are mindful systems. What makes a system living, and therefore cognitive, is its autonomy. Now this often ties me up, because I think: okay, autonomous systems, does that mean they must be cognitive? Or are we working backwards from what appears to be cognitive? But isn't that hubris, because we're putting our understanding of another system in front of its actual way of being in the world? I don't know.
And the last point here is: understanding cognition requires understanding the principles of biological autonomy. And this is just the simplest phrasing of that thesis. So that's kind of like the claim that living systems are those with mind. Now here is where awareness comes into play, along with free will, agency, autonomy. Great questions. If anyone knows, if anyone has suggestions, let's hear them in live chat, or come on and let's discuss it on a stream. But this was something that I was just learning about when I was preparing for this talk. Another thing that I found that was pretty cool was a paper from 2017 in Progress in Biophysics and Molecular Biology, which sounds like it would be a journal just about molecular biology, but it's actually an awesome journal with a lot of other ideas happening. It's a paper by Hipólito and Martins, and Inês Hipólito is also a free energy principle researcher. So I thought it would be cool to highlight this paper and this perspective, because the enactive life-mind continuity thesis was not written with the FEP in mind, whereas this paper may have been written at least a little bit more directly with the FEP in mind. It's called "Mind-Life Continuity: A Qualitative Study of Conscious Experience," and the authors wrote: according to the mind-life continuity thesis there are three levels. On the weak continuity view, whatever has mind will have life, although not all things that have life have mind. So this is the weakest version of the claim: it isn't the case that just being alive makes you mindful; however, if you have mind, you do have life. On the strong continuity view, life and mind have a common abstract model, or set of basic organizational properties, that leads to them. So in this view the overlap is one-to-one. There's a total Venn diagram overlap: living systems are mindful systems, one-to-one. So you're either both or you're neither.
On the methodological continuity view, the understanding of mind requires further understanding the role that it plays within the whole living system. So that's a little bit more like a meta, or holist, or ecological perspective, because it's like saying we need to understand autonomy and agency, which is the embeddedness of the agent in the niche. We can't just look at the head and ask whether mind is inside of it. It's actually a view that suggests that life and mind are relational, so that's pretty cool. Each of these views within the life-mind continuity thesis shares the perspective that cognition should be studied in the context of the whole organism. So enactivism, holism, ecological psychology: a lot of these terms are coming into play with this life-mind continuity idea. Let's talk about scientific models. Scientific models are a pretty big field; there are a lot of different types of models, and science means a lot of different things. We're going to talk more about what science is or isn't in a couple of minutes, but it's times like these when we can just go to the Simple English Wikipedia page and try to find the truth. On Simple English Wikipedia it is written: a scientific model is a simplified abstract view of a complex reality. Go to the reality keyword if you want to find out what reality is, I guess. Scientific models are used as a basis for scientific work. They can be used to explain, predict, and test, or to develop computer programs or mathematical equations; or, as I always think about it: model, explain, predict, design, perturb, control. A scientific model represents complex objects, events, and physical processes in a logical way. They are an image of an original, which can itself be a model. So this is kind of like simulacra theory, or the work of Umberto Eco. Scientific models only have those details of the object or image they model that are relevant. I take slight issue there.
It's not that they capture only the details of the object or the image that are relevant; it's that they capture what they capture, and whether what is captured is relevant or irrelevant is perspectival. Whether it's useful or not is also up to the eye of the model viewer, let's say. And there is no strict mapping between a model and the original object it models. Models may only be valid for a given time interval, for a given object, or for a given purpose. For example, there isn't necessarily going to be the one true bitcoin price model. There might be different timescales of analysis, or different kinds of trading systems, that lend themselves to different kinds of models being utilized, and a model trained on years one through five might not be as good, or might not even be adequate, for a different time period. So models are situational, and that's also where we get this idea of science as something that's doing relevant and useful work, utilitarian pragmatic work, yet is also a social construct, because it's enacted in a social setting. Here are a couple of flowcharts that show different aspects of the scientific process and the role that models play in it. Basically, starting from a real system, models are made and experiments can be carried out. Models lead to models of the system.
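The earlier point that a model may only be valid for a given time interval can be made concrete with a toy example: fit a trend on one stretch of a made-up "price" series, then evaluate it on a later stretch where the regime has changed. Everything here, the series, the fitting window, the function name, is invented purely for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b                          # intercept, slope

# A made-up 'price' series: a clean upward trend that later flattens out.
series = [t if t < 50 else 50 + 10 * math.sin(t / 5) for t in range(100)]

a, b = fit_line(list(range(50)), series[:50])      # train on years 'one to five'
mae = lambda ts: sum(abs(a + b * t - series[t]) for t in ts) / len(ts)
in_sample  = mae(range(50))       # tiny: the model fits its own interval
out_sample = mae(range(50, 100))  # large: the regime changed
print(in_sample, out_sample)
```

The fitted line is an excellent model of the interval it was trained on and a poor model of the next interval, which is the sense in which models are situational rather than universally valid.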
Now, "model system" gets used a little bit confusingly, for example to refer to fruit flies as a model system, as in a system that people study. But fruit flies are a real system, and a model system would be something like a simulation of a fruit fly. Of course, you then get into questions like: what if I'm really interested in studying the model of a fruit fly? But whether you're doing experiments in a so-called real system or simulations in a so-called model system, you're going to end up with observations. Even more abstract than simulation is doing analytical work, or theory, above the level of the model system, to obtain theoretical predictions, like: for all temperatures, this variable is going to be greater than that variable. Then, by comparing and contrasting the empirical experimental results, the simulation results that we observe, and the theoretical predictions that we might get from a more a priori source, we're able to compare and improve our model and theory overall. So this is not quite the scientific process that's on the wall of the elementary school, but it does capture how scientific modeling is about iterated improvements, and about integrating across different kinds of experiments and different kinds of simulations. And just to give a point on falsification, because falsification in some situations means to confabulate, or maybe fabulate, to make up, to generate data that's false: basically, committing scientific fraud is often called falsification of data. So if someone says those documents were falsified, they don't mean that a hypothesis got tested; usually they mean somebody deep-faked or forged those documents. But here is falsifiability from the Popperian perspective in the philosophy of science: falsifiability, or refutability, is the capacity for a statement, theory, or hypothesis to be contradicted by evidence. For example, the statement "all swans are white" is falsifiable, because one could observe that black
swans exist. Falsifiability was introduced by the philosopher of science Karl Popper in his book from 1934, revised and translated into English in 1959 as The Logic of Scientific Discovery. He proposed this idea of falsification as the cornerstone response to both the problem of induction and the problem of demarcation. Induction is about generalizing from specifics to generals, but what is this problem of demarcation? We know it's one of the keywords; what is it? Another Wikipedia entry coming in to save the day for last-minute presentation makers like myself. In the philosophy of science and epistemology (the philosophical field concerned with knowledge: the root epistēmē means knowledge, not to be confused with STEM as in science, technology, engineering, and mathematics), the demarcation problem is the question of how to distinguish between science and non-science. It examines the lines between science, pseudoscience, and other products of human activity, like art, literature, and beliefs. The debate continues after more than two millennia of dialogue among philosophers of science and scientists in various fields. The debate has consequences for fields such as education and public policy. So at the social level, this is why it's important: it will often be the case that somebody will say, well, we need a science-driven policy, or a science-informed policy, or, that's an unscientific policy, related to public health or education. So who exactly gets to speak for science, and who gets to demarcate, or delineate, or define science? Who gets to say whether science works, and whether people who mess up are just not doing it right? Or is science all bad, and all the good stuff comes from some other kind of activity that humans do? So it's actually quite a big issue, this demarcation problem, and that is resonant with the fact that people have been
debating for a long, long time and have not come to any simple distinctions. Now, with this paper in mind, we can look at the first lines of the introduction, right from the start by Mel, gotta love it: with respect to the demarcation problem, I defend the position that the free energy principle can be taken to belong properly to the domain of science. So that's the big and unique claim number one: the FEP is within the domain of science. Now, it might not be exactly like some of these things that we've seen before, but it's going to be taken to be within the domain of science. So if you're listening to this and you think, well, I disagree, I don't think that the FEP is within the realm of science, or maybe you have a yes-and: yes, I think it's in the domain of science, and I think it's also in the domain of something else. Great. If you have a yes-and, or a disagreement, or a question, put it in the live chat or come onto these group discussions and talk it out, because Mel will probably be really happy to hear about it, and we'll all learn so much by bringing out these different perspectives on science, because we're getting to some of the fundamental questions in philosophy and in science with these papers. Let's talk about one way that this demarcation problem has been resolved or addressed in the past, and that's the NOMA idea, or non-overlapping magisteria, which is the name of a paper by Stephen Jay
Gould, who was an evolutionary biologist, I guess we could call him. Gould wrote that the magisterium (it's kind of like the reign) of science covers the empirical realm: what is the universe made of (fact), and why does it work that way (theory). The magisterium of religion extends over questions of ultimate meaning and moral value. These two magisteria do not overlap, nor do they encompass all inquiry; consider, for example, the magisterium of art and the meaning of beauty. Now, as always when quoting somebody or reading someone's paper, we can really respect the author's view and their perspective; they put it out there how they wanted it to be said, and Stephen Jay Gould chose his words carefully when he wrote this. But we can also ask whether we agree or disagree. So here's non-overlapping magisteria as a Venn diagram; they're so non-overlapping that they're just separate circles. On the science side, we get "what is the universe made of," that's fact in Gould's mind, and "why it works that way," which is how, or theory, or, as a scientist might call it, mechanism. And then, under the NOMA model, religion is what deals with morality and metaphysics. Now, interestingly, when some people think about non-overlapping magisteria, they'll say something like, well, science says what the world is (implicitly they mean how as well, what the world is and its mechanism), and religion says why. So that's a what-and-how versus why distinction for the demarcation. These are questions that everyone can think about, whether you're just learning about these topics or whether you've been practicing in one field or another for a long time. It's an interesting thing to wonder about: what is science, what isn't science, who decides, and why does it matter? Is science one-to-one with technology, or are there technologies that aren't science, or are those two bound together more often than not? When we really explore the maps of meaning that we have for these different fields, we start to get a better
perspective on: oh yeah, so when somebody says that it's a science-informed policy, they want me to think it's good; they don't want me to think that it's faith-based policy, because faith-based, then, dot dot dot. So it's always a cultural context that these terms are understood within, and it's sort of like a battlefield of meaning over who gets to control these different terms. But anyways, I just really like that Mel led so clearly and strongly as a philosopher of science who's so embedded within our communities as well; the paper rides the line between scientific understanding and philosophical depth. So, speaking of philosophical depth and scientific understanding, let's talk about epistemic virtues, with yet another wiki selection, I believe our last one: being an epistemically virtuous person is often equated with being a critical thinker, and focuses on the human agent and the kind of practices that make it possible to arrive at the best accessible approximation of the truth. I wasn't too familiar with this term, epistemic virtues, but it's a little bit like asking: what are the good epistemic attributes, the adjectives that are good? The ones that are good, those are the epistemic virtues. A couple of them are written down: conscientiousness, which is funny in light of our discussion of the big five personality traits in Stream 13, as well as a couple of these other ones that you see here, such as creativity, curiosity, humility. People can go into debates about what each of these terms means, or which ones are or aren't virtues, but the question I would leave us with, at least in this introduction section, is: how do we facilitate epistemic virtue? Given our own understanding, how do we not just develop it in ourselves, but make communities where epistemic virtue can thrive in an accessible fashion? All right, let's get to the abstract of the paper. Mel wrote: the free energy principle, or FEP, has seen extensive philosophical
engagement both from a general philosophy of science perspective and from the perspective of philosophies of specific sciences cognitive science neuroscience and biology so people are talking about it the literature on the FEP has attempted to draw out specific philosophical commitments and entailments of the framework so what are the commitments that's leading up to the ceremony right what's the commitment that you make leading up to the ceremony and then the entailment what follows from the ceremony so what leads one to be committed to the FEP and what is entailed by committing to the FEP but the most fundamental questions from the perspective of philosophy of science in general remain open so this is a list of awesome questions each one of these questions we could probably have a whole group conversation on they're such wonderful questions so I really like how this paper was phrased to what disciplines does the FEP belong does it make falsifiable claims what sort of scientific object is it is it to be taken as a representation of contingent states of affairs in nature does it constitute knowledge what role is it intended to play in relation to empirical research and does the FEP even properly belong to the domain of science but it seems like from the introduction that Mel's answer is yes I'm maybe going to put in my five cents at the end of this talk but it seems like it's good if it's in science but I'm not too attached back to the abstract to the extent that it has engaged with these questions at all the extant literature has begged dodged dismissed and skirted around these questions without ever addressing them head on look at these spatial metaphors and social metaphors begging dodging skirting head on just wonderful these questions must I urge be answered satisfactorily before we can make any headway on the philosophical consequences of the FEP so in other words if we don't understand how we got here and where we're at we won't be able to make headway as we
move forward with the FEP so if we don't know the past and if we don't know the present we're not going to be able to design effective policy sounds like active inference I take preliminary steps towards answering these questions in this paper first by examining closely key formal elements of the framework and the implications they hold for its utility and second by highlighting potential modes of interpreting the FEP in light of an abundant philosophical literature on scientific modeling so quite the prospectus for the paper and let's see how we're going to get from A to Z so here's the roadmap and I had to even reformat the roadmap slide because there were so many sub sections but it really helped because it's a paper that doesn't have figures and appropriately enough for a paper called the math is not the territory there aren't any math equations in the paper but it references a lot of quite technical areas so it's kind of like this fractal overview because you can dive into the thermodynamics or dive into the Bayesian mechanics but this is the overview and it's generalizing so it starts with an abstract and introduction and then begins with the sort of topic du jour which is the free energy principle this being a philosophy and history of science paper it's framing the free energy principle in terms of its history and we're going to walk through the history in a couple of minutes history of the formalism of the free energy principle itself leads one to a consideration of Markov blankets free energy and generative models which are things that specifically come into play when talking about the FEP as it's used and understood today reinterpreting the FEP is sort of what we're here for today to interpret and reinterpret I think and then the second or last part of the paper connects many of the topics that were addressed in the first half of the paper on the left side of the roadmap to more general questions in the scientific modeling
literature so for example the difference between normative and process models the ideas that came out of mathematical psychology related to neuroscience and other fields and then ideas about models that have targets or don't so a lot of different cool generative things that are happening with this paper and it's a lot to cover so let's just jump right in the only hint Friston himself has given us to go on with respect to the meta theory or philosophy of science of the FEP which is what we're here for today is that he labels it a normative theory or normative model contrasting that with what he terms process models and then it continues three fundamental facts about the FEP follow from the fact that the FEP is a normative model so again we have sort of two extreme types of things in the philosophy of science we have the hypotheses those are the theories they're empirically falsifiable and then we have these principles they're the more normative non falsifiable almost axiomatic aspects of science so three things are going to follow from the fact that the FEP is a normative model and not a hypothesis one of them is that the FEP is not falsifiable and does not lend itself to the direct generation of predictions or hypotheses so it's not a hypothesis two the FEP does not cut at the joints of nature which is to say that it does not illuminate the boundaries between natural kinds and three the FEP does not deliver mechanisms so interesting claims the general questions that this brings me to wonder about are how do we categorize these kinds of scientific ideas so for example if someone says well that's a theory gravity is a theory evolution is a theory that means dot dot dot dot dot so how do we classify what type of scientific thing ideas are whether it's a framework hypothesis idea more broadly just leaving it vague or whether it's a principle how do we map out the implications of different kinds of ideas and then very importantly how do our
pre-existing assumptions influence how we judge ideas so we're not the neutral arbiter of truth it's more like we have this culturally prepared field and then the idea is like the stimulus or a seed that flowers or doesn't in a field so how does the status of our field influence how we judge ideas and as far as the cutting at the joints of nature that's something that really fascinated me and we're going to go into the joints of nature in a couple of minutes so free energy principle part three in this paper and remember that these dot zero videos are kind of overviews they're just the notes that I take as I go through reading the paper I try to make that exoteric for those who might appreciate a scaffold but really there's so many fun details and so many nice points inside of the paper so if you're curious or you want to learn more the paper is the place to go section 3.1 the history of the formalism so this is quite different a scientist might not start by saying well the thing you need to know about this theory is here's the history of it they might say the thing you need to know about this theory is this scientific attribute ABC but this is a more historical perspective the questions most frequently and fervently asked about the free energy principle are is it true what is it true of how do we empirically know that it is true these questions I argue rest on a category mistake so wrong kind of thing you know you ask how many miles per gallon does this thing get and someone's like hey this isn't a car that's a pizza they presume that the FEP is the sort of thing that makes assertions about how things are cuts at natural joints and can be empirically verified or falsified so people are interpreting the FEP as if it's a falsifiable hypothesis that makes assertions about how things are and cuts at natural joints and it turns out that is not the case in fact all of those are the opposite why because the FEP is more
like a principle than it is a hypothesis pretty cool so again what are these three points it's the idea that a hypothesis does make assertions about how things are does cut at natural joints and can be empirically verified or falsified and Mel argues that the FEP is not that way let's talk a little bit about nature and its joints specifically carving them so just to quote from the abstract of a 1993 paper by Muhammad Ali Khalidi carving nature at the joints descriptive title it's quite an interesting point which is this paper discusses a philosophical issue in taxonomy at least one philosopher has suggested the taxonomic principle that scientific kinds are disjoint so in other words when we're doing taxonomy classification organizing of scientific kinds are they disjoint or might they be overlapping for example when we're doing a taxonomy of insects there's some groups that are disjoint like ants and cockroaches are disjoint groups but ants and insects are non disjoint they are not disjoint groups because ants are nested inside of insects and an opposing position is defended here marshalling examples of non disjoint categories which belong to different co-existing classification schemes this denial of the disjointedness principle can be recast as the claim that scientific classification is interest relative but why would anyone have held that scientific categories are disjoint in the first place it is argued that this assumption is needed in one attempt to derive essentialism this shows why the essentialist and interest relative approaches to classification are in conflict so in other words if we have an essentialist perspective on scientific kinds then we can probably make them disjoint because if like the red cube is red and the green cube is green then we can separate them at the joint and the red will be red and the green will be green but if the red and the green are like ink molecules that are swirling around in solution then it's more
like it's interest relative we could say in that case there's a non disjointedness between the redness and the greenness which makes it a little bit more difficult to be reductionist or essentialist and demands a relativist point of view and that sounds a little complicated it is a little interesting and in 2011 there was a book called carving nature at its joints they just go with the same title because it's the same ideas and in a review by P.D. Magnus I thought there were a couple of points that were just really crisply brought up so P.D. Magnus wrote to be fair part of the problem with this book I believe is with natural kinds themselves the phrase natural kind is too often used as a term of philosophical jargon with a presumption it has a precise meaning even though there are a number of separate precise meanings it can have first natural kinds may refer to categories which will support inductive inference second natural kinds may refer to whichever categories ought to appear in scientific accounts of the world third natural kinds may refer to metaphysically robust categories which are written in the fundament of being or which structure the space of possible worlds fourth natural kinds may refer to categories which appear as predicates in the laws of nature fifth natural kinds may refer to categories which anchor a special type of reference although these five aspects do overlap somewhat there are difficulties with each and tensions between them essays in this volume document some of these difficulties so when we're trying to understand this topic this area reduce our uncertainty and we want to know about separating natural kinds it's not quite so clear what these natural kinds are or what joints do or don't separate them but that's what we want to pursue and so that's why it's fun how could we go even deeper down this natural kind rabbit hole well we could go to the original text that's what we're going to do this is from Plato's Phaedrus at 265 so
Socrates writes we can think about dividing things by classes where the natural joints are and not trying to break any part after the manner of a bad carver so that's this idea of carving at the joints if there are two things and you're not making a clean separation then you're going to make this violent cut but if it's a perfect separation then it's going to be a clean cut at the joint as our two discourses just now assumed one common principle unreason so in other words is it that reason is a principle and unreason is something disjoint or are reason and unreason on a continuum just like the red and the green from the previous example so in other words natural joint on the right side can we separate class a and class b perfectly or are they going to be a little bit more intercalated just as the body which is one is naturally divisible into two right and left with parts called by the same name so our two discourses conceived of madness as naturally one principle within us and one discourse cutting off the left hand part continued to divide this until it found among its parts a sort of left-handed love which it very justly reviled but the other discourse leading us to the right hand part of madness found a love having the same name as the first quite interesting but divine which it held up to view and praised as the author of our greatest blessings so are the left and the right two different natural kinds that we can just cut down the middle or are they a continuum of one type of thing Phaedrus says very true and then this last little quote from Socrates even though I know we're quite far afield now I myself Phaedrus am a lover of these processes of division and bringing together as aids to speech and thought so in other words sometimes just separating things into natural kinds or continuums is useful for thought and if I think any other man is able to see things that naturally can be collected into one and divided into many him I follow after and walk in his footsteps as if
he were a god and I like this also because it's from Tufts which is the institution of Mel so I just loved grabbing the Phaedrus text from Mel's institution for this extremely interesting historical dive let's go back to the paper in order to understand the FEP however and why it's closer to a statistical technique than it is to a falsifiable theory of biological self-organization it is important to see that there's a clear precedent for leveraging the maths of statistical mechanics as a method for Bayesian inference so it's like whoa some people know about philosophy some about statistical mechanics others about Bayesian inference but we're going to bring them all together which means that we're all going to be learners here and that's why it's great to include everyone in the learning journey because no one is going to read a paragraph like this and just dust off their hands and go yeah I already finished learning about those topics I'm done learning about statistics or I'm done thinking about philosophy those who are awake and want to learn and want to find out other people's thoughts on these questions just come right ahead section 3.1.1 is called the epistemic turn or the knowledge or the informational turn in statistical mechanics so just a couple of key points because this history section really is quite critical so these are just in the order of these sections a couple of memes or key themes that I pulled out so what was happening in the 1900s in statistical physics and statistical mechanics is that information was getting physicalized and physics was getting informational so all of a sudden there was a little bit more of a recognition that physical things could be equated with information for example if you're talking about the surprise of reaching into an urn and pulling out different kinds of balls that could be equated to the surprise that you might have of different temperature air molecules swirling around in a room so if all of the hot molecules went on
one side it would be something that would be quite surprising just like if all the blue balls went to one side of the urn and all the green balls went to the other side of the urn so that would be something that you wouldn't expect and it's through analogies like that that people started to bring equivalency to the informational and the physical side of physics the maximum entropy principle emerges as something like an Occam's razor for infothermodynamics so again if we're thinking about an urn with different colored balls or a room with a bunch of molecules of air this maximum entropy theory is going to basically say well the air molecules are going to be maximally spread out and the distribution of their speeds is going to have such and such characteristics and it's almost like an Occam's razor principle because it's like saying yes it could be another way but this way with the air molecules equally spread out turns out to be the most likely one because there is a vastly larger number of ways of being spread out than of being organized around the room all in this corner all in that corner half in this corner half in that corner so the maximum entropy way is spread out and there's more details in the paper and of course there's more to learn about here but the 3.1.1 is just to say that during the 1900s maximum entropy and this infophysical nexus starts becoming a more important thread 3.1.2, the mean field approximation so we can think of the mean field like this as you get a field of vectors you can start to kind of blur your eyes and look at the mean field approximation and so one way to think about that is like the central limit theorem so if we have a distribution and we're just pulling draws out of this distribution the average of those draws is going to approximate a Gaussian around the mean value of that distribution so that's the central limit theorem just from 30,000 feet and the mean field approximation is kind of like the central limit
theorem because it's basically like saying in the mean limit there's going to be this asymptotic approximation to collective behavior that might be comprehensible so as we get to more and more and more cars we can start to model them like they're a flow of a fluid even though we know that they're discrete and so you can't do a flow model with three cars but maybe with three million cars you could and this is getting us to this point of idealized multi-agent systems so not just draws from a distribution but potentially there could be multi-agent systems like lattices of electrons or chemical solutions that also have these mean field approximations there's then this introduction of the variational methods and the variational free energy it's brought up as a trick but who's tricking who are the mathematicians tricking everyone else? are the scientists in on the trick? are the engineers also tricking everyone? um it's a joke but the variational methods and the variational free energy are definitely a heuristic and an extremely useful tool that help us take something that might be intractable and then perform computation on it in a way that's good enough free energy is a concept that again just like a lot of things in this paper has a physical analog or a physical tether earlier in its history like temperature is related to the distribution of the air molecule speeds in the room but then temperature in a machine learning model is something like how fast it's trained like a high temperature model sort of never settles down it's always just jumping around whereas a low temperature model is going to be trapped in a local region but it might sample at finer intervals within that local region.
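That temperature intuition can be made concrete with a tiny softmax sketch; this is just an illustration in plain Python, and the logits and temperature values here are made-up numbers, not anything taken from the paper:

```python
import math

def softmax(logits, temperature):
    """Convert scores into probabilities; temperature scales how 'jumpy' sampling is."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]      # hypothetical preference scores for three options
hot = softmax(logits, 10.0)   # high temperature: nearly uniform, keeps jumping around
cold = softmax(logits, 0.1)   # low temperature: probability mass collapses onto one option

# high temperature spreads probability out; low temperature concentrates it
assert max(hot) - min(hot) < max(cold) - min(cold)
assert cold[0] > hot[0]
```

So a sampler run at high temperature wanders broadly across options, while at low temperature it settles into (and finely explores) the neighborhood of the most probable one, which is the trade-off described above.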
Mel in the live chat, mathematician Isgober so true and the free energy in machine learning came in when machine learning researchers were asking themselves how are we going to model multi-scale systems at the appropriate gain or bias so we don't want to have an outrageous false positive or false negative rate we also don't have the ability to go and do an atom level simulation even if we wanted to we couldn't do these atom level simulations so how are we going to coarse grain and make models that are useful but don't go into every single possible detail and one question or thought I had is is this variational free energy physical for molecules so when we talk about the Gibbs free energy of a given chemical reaction is the free energy of that reaction a physical thing or is it energetic or is it conceptual and then when we're talking about the free energy of a neural network is it physical or not but in this 3.1.3 we get a sort of grand connection between ensemble densities so the wood lice on the pavement you know the ants in the colony that density of the ensemble is the Bayesian posterior density so the updated prior is the posterior and the posterior is going to be the prior for the next time point and that's going to be a mean field approximation and that's going to be governed by this expected free energy in section 3.1.4 the variational Bayes or VB method is brought up and here's where VB comes into play analytical Bayesian methods are basically intractable and can be implausible to calculate exactly for example if you're doing Bayesian phylogenetics and you have a bunch of insect genomes and you're trying to infer the tree that is the most likely hypothesis about common ancestry amongst these insects to make an exact analytical Bayesian model for every base in the genome you're going to be calculating for a while however we can empirically explore probabilistic graphical models in order to
simplify that situation and so graphical models are just relational models nodes and edges that means a graph that's graphical so a network is a graph it doesn't always mean like an x y axis quadratic line graphical just means relational and we're talking about the relationships between observations and hidden causes or two hidden causes and those are probabilistic relationships whether it's probabilistic emission of an observation given a hidden state or a probabilistic relationship between two different hidden states they're both graphical models and the idea is that we as investigators want to fit our model using Monte Carlo sampling and or metrics of distributional divergence so Monte Carlo sampling it comes from gambling because Monte Carlo is where a lot of gambling occurred I believe and the idea is if you want to ask a question like how likely am I to get a draw from this deck of cards that has such and such characteristic there might be some that are really easy to calculate like okay the chance of getting a two well four out of 52 cards are twos so there's the probability but there might be another type of question you want to ask about that deck that's a little bit more nuanced like what's the chance it's this card or that card but not this one and it's followed by this and it has to do with this what you can do is instead of even trying to specify out the formal or analytical probability what you can do is just simulate thousands or millions of draws from a simulated deck and then you can look and say okay ten out of a million had this characteristic ergo that's my empirically derived probability so that's using Monte Carlo or Metropolis-Hastings coupled sampling to figure out the tractable way to do these Bayesian approaches another way that you can actually tractably deal with these Bayesian approaches is this variational Bayesian approach which we're not going to go into too many details here but basically unlike the sampling based
Monte Carlo Bayesian methods variational Bayes uses the distributional divergence and it optimizes the accuracy minus the complexity just again first pass so that it tries to home in on the tightest distribution that actually covers the density of what you really want to estimate but doesn't get too tight it doesn't overfit and then lastly there's this section on the innovation in Friston's free energy minimization so if you're curious about what Carl has done and the Google Scholar is stretching on a little bit too long let's take a look at this section to check out so let's pull together and just ask where are we at with the free energy principle so what did Friston do what is the free energy principle and how was it discussed in Mel's paper so the first is the Fokker-Planck and Mel writes in the context of the free energy principle the Fokker-Planck equation describes the evolution of the state of a system as such it can be thought of as a trajectory through one abstract state space which is a probabilistic representation of some lower order abstract state space representing what state a given system is in over some definite time window so it's like it's a density evolution function that's a trajectory in an abstract probabilistic space very chill we can supplement the Helmholtz decomposition so let's think about that definite time window from the Fokker-Planck and pick up here a three dimensional vector field that satisfies the appropriate conditions for smoothness and decay so smoothness means that it's not a rough path in other words you can take a ruler and just lay it on the hill and get a derivative and you can take that ruler and move it over the hill and get continuous derivatives everywhere if the appropriate conditions for smoothness and decay are met then this vector field can be broken down into a solenoidal which is a curl and an irrotational which is a divergence component this is known as the Helmholtz decomposition the fact that we can perform
the Helmholtz decomposition is then known as the fundamental theorem of vector calculus so in order to have fields of well behaved vectors we need to have this ability to kind of separate out and take the partial derivatives in a few different ways that's the decomposition so we're going to be thinking about the time evolution of a trajectory in the abstract state space that's Fokker-Planck what is that state space going to be decomposed into well it's going to be decomposed using the Helmholtz decomposition into a few different vectors some of those vectors are going to be solenoidal they're going to be like curling around the mountain at the exact same elevation that's like an isocontour and then there's also going to be an irrotational divergent component one which goes uphill and one which goes downhill now we're going to be doing that on non-equilibrium steady state densities this is related to this idea of the Markov or post Markov blanket I don't even know what to call it Mel and anyone else let's clarify this when we're talking in our group discussion a Friston blanket or a yellow blanket what are we going to call them but when it comes down to it when we want to be maintaining these infothermodynamic non-equilibrium steady states it's required to have a relatively better model because you don't need to have the perfect one you just need to have a relatively better model than other agents like you you need to have a relatively better generative model of what?
of action not necessarily just of accurate state estimation so the bacterium isn't just doing estimation oh boy we're running out of sugar I wonder if there's sugar somewhere else it's actually doing inference on its behavior on policy its actions so that it can maintain that infothermodynamic barrier Mel writes Markov membrane that's my contribution if only we could all make such a great contribution this need for maintenance of an infothermal non-equilibrium steady state and therefore a generative model of hidden causes in the world entailing policy selection paths leads to this imperative to be self-evidencing or an information forager that has these multi-scale cybernetic goal-seeking good regulator requisite diversity characteristics so in other words if you want to be staying alive maintaining that asymmetry where you have more organization or more information that's that infothermal if you want to be maintaining that competitive strategic asymmetry then you're going to be needing to have the ability to engage in effective policy effective policy is only going to be happening if you have an effective generative model of the latent causes in the world and that means that if you're going to be a cybernetic agent that's able to have a generative model about the latent causes of the world and respond appropriately you're going to be needing to be an information forager that has these good regulator and requisite diversity characteristics that is what's subsumed under the free energy principle now what are the corollaries so the corollary hypotheses or theories are those that are going to be making specific testable hypotheses one or multiple and a couple of them that we've brought up today we didn't write the ones that according to the ALIUS interview didn't fare so well like Gibbs particle filtering or Gibbs sampling but we can write up here active inference predictive processing, Bayesian brain and what else so if you have any other ideas for what is a corollary to
the free energy principle let's hear about it that'd be fun and each of these three make specific predictions about kinds of observable things you might observe or the implications of different kinds of perturbations into the system so these are a lot more axiomatic and background more foundational whereas up at the top we see in the corollaries these are more like theories that make specific testable hypotheses for example active inference could be falsified if somebody shows that external states are able to alter internal states without going through sensory states for example so that's the free energy principle in this next section which is 4.1 we will turn to the implementational details of the FEP which is where it will be demonstrated that the formalism does not latch on to any features of the real world systems of interest so we sort of built up how the FEP draws in historical context from info and thermodynamics and then draws in this cybernetic thread to result in a panoply of these testable hypotheses so let's talk about section 4.1 but before we jump in just general questions and this is because it's easier for me to come up with questions than answers what kinds of models do latch on to features of real world systems of interest and what is of interest and how is it related to what is useful for the basic and applied researcher or for the philosopher of science so for example if your friend says hey you might be interested in this well for the applied researcher or for the engineer maybe of interest would be this is useful hey you might want to learn about this software package it might be of interest to you because you might use it effectively whereas to a basic researcher somebody who's working more in the theoretical or the abstract areas of interest might be like you'll find some pattern in here that's very interesting you might have some thought cascades that are exciting and so it's a different interest and for the philosopher of science what is
of interest for them I don't know let's trace a little bit about this Markov blanket question and the Markov blanket we're going to return to it a lot more later on in the year when we are in April on this Emperor's New Markov Blankets paper and in a few other cases we're going to be talking about Markov blankets but again we're always learning about these topics so not too surprising we're going to come back to them multiple times so this is what Mel wrote I love in the live chat people are coming up with all kinds of funny Markov related terms Markov train chains by Markov just yes chains of metaphors metaphors for chains why not we come to the conclusion that within any non-trivial network any node ought to have a multitude of Markov blankets Friston's employment of the Markov blanket construct differs from Pearl's original 1988 construction in the subdivision of the blanket into sensory states nodes whose influence is directed towards the blanketed node X and active states nodes influenced by X okay so here's the 1988 book by Judea Pearl probabilistic reasoning in intelligent systems networks of plausible inference now this idea of Bayesian computation on graphical models or Bayes nets is discussed in this book and elsewhere the Markov property though there's a couple things that have Markov's name attached to them and I wish there was often a clearer way than just invoking some person's name but we'll just use it the Markov property in probability theory and statistics refers to the memoryless property of a stochastic process the term strong Markov property is similar to the Markov property except that the meaning of present is defined in terms of a random variable known as a stopping time and so the term Markov assumption is used to describe a model where the Markov property is assumed to hold such as a hidden Markov model so for Markovian systems we can think about a blanket that is insulating a node X now as Mel pointed out in Pearl's original construction it was in a
graphical, general, abstract context: the node X was insulated from the broader network by a blanket set of nodes. Friston does something a little different, which is that the nodes incoming to the node of interest are the sensory states. Here mu is the internal state, playing the role of the X node from the Bayesian notation; the nodes that influence the node of interest are the sensory states, the incoming states. But again, it's a matter of which perspective is inside or outside: speaking relationally from the point of view of this internal state, the sensory states are coming in and the active states are downstream. And notice that the sensory states actually include states that don't directly impinge upon the internal states of interest, but relate only to active states. That's this diagram we've seen: the Friston-style blanket as a statistical boundary made of the parents, the children, and the parents of the children. It's kind of like a family: for the person of interest, it includes a person to whom they may not be directly related (because not every arrow has to be present), but who is a parent of their child, so it's important. Now (and again, it would be awesome to have people who know better and have thought about it more give their perspective) I tried to lay out a little continuum of Markov blanket ideas. On the left side we have the actual Markov, our colleague from Russia. Maybe I'm straw-personing these arguments; I'm just throwing them up there, trying to understand where we're coming from and where we're going. Markov stated the property using mathematics, pencil and paper, and that side of the spectrum is held down by the abstract and the general, with no need to appeal to computational bases. Pearl is somebody who had a computer (not computers like we have today, but good ones nonetheless), and I believe Pearl's framework is based on wanting to integrate empirical data in a way that pure mathematics isn't. That's one of the distinctions between statistics and mathematics: because statistics deals with real data sets, potentially weaker assumptions might be in play. So Pearl is somewhere near the midpoint, and Friston is way over on the other side of the Overton-Markov window. This is where we get the integration of further ideas: the blanket consists of sense and action states, so it's not just insulating states; there's a temporal nature, where things that happen beforehand are like sense states coming in and things that happen afterwards are like actions heading out, like a Bayesian directed graph with a temporal component. We also bring in attributes like the cybernetic imperative to maintain a non-equilibrium steady state; the idea of a generative model of non-local dependencies, or latent causes in the world; a cybernetic model of the environment (the good regulator and requisite variety ideas); and some threads that might not be downstream of some QED proof from Markov, but are nonetheless important for understanding Markov blankets in the Free Energy Principle, like the enactive, embodied, and culturally embedded perspective. So: can the Markov blanket idea have interpretable parameters? Does any given blanket capture what we think it's capturing, or what should we think it's capturing? Is the implementation tractable? Will it be useful for different kinds of systems? Are there clear philosophical commitments or not? And is the thermo-cybernetic narrative a metaphor or not? I'm just curious about that. A few more quotes from the paper (and thanks, everybody who's watching and posting in the live chat; it's a little hard to respond to everything, but it's fun times). Mel writes, this is how the Markov blanket construct is operationalized; the upshot is that "the power to select system boundaries rests at least in part on the researcher's intuition," which I would also say is true for
regression coefficients, principal component analyses, and t-SNE dimensions. That's just what it's like being a scientist: we didn't sign up for this because we wanted objective, non-personalized recommendations; we wanted to be part of the process. Quoting Mel: if "the Markov blanket formalism were to independently track natural joints, we would need to equip it, at least at the outset, with some threshold value which would determine precisely the degree of conditional independencies that would count, across the board, for the possession of a Markov blanket." That's like when you do principal component analysis on a real matrix of data. Say principal component one explains 40% of the variance, the next one 30%, the next one 10%, and then you get a long tail of hundreds and hundreds of principal components that explain less and less variation. There's no clean line; you can't just say "if it explains less than 1% of the variance, it's not an important principal component." There's no measure or value like that. The threshold is always going to be in the details, in how it's enacted by the scientist. "If such a threshold existed" (ask yourself if it could) "and were baked into the Markov blanket formalism, then we could consider Markov blankets to be in some sense real features of real-world systems... we might discover them, measure them, and count them; we might meaningfully ask whether some existing system does or does not possess a Markov blanket." Again, it's like p-values: you want to say, come on, a p-value of 0.0000001, let's just say they're different. But no, you can't do that, because there's no natural kind with p-values; it's just a continuum. Say there are two tree species, one 10 plus-or-minus 1 feet tall and the other 3 plus-or-minus 1 feet. There's no overlap, you do the t-test, the p-value is ridiculously low, and you say: come on, can't I just say it's a natural kind, that they're naturally different? And the answer is no. There's a similar realization in the context of the empirical detection of Markov blankets in empirical systems: "if we can conceive of some thing as a discrete thing, as a coherent system, then it is possible to formally represent it as possessing a Markov blanket. There are no Markov blankets to be discovered in nature," and the FEP is "not in the business of illuminating natural joints." It's like there are no p-values in nature, no structural equations out there in nature; we're not discovering the structural equation model, we generate it in response to observables. That's why the FEP is not in the business of illuminating natural joints. Finally: "thermodynamic entropy and the Shannon entropy are only equivalent under the generalized Boltzmann distribution, which, it has been argued, applies only at thermal equilibrium. Thus, in general, information entropy and physical entropy are distinct. Living systems are by definition far-from-equilibrium systems; thus information entropy and thermodynamic entropy do not converge in the regimes of interest to us." Just a couple of general questions on this part. What are the formal axioms and the logic or ontology of the Free Energy Principle? That's what Mel has taken such an awesome stab at with this paper: what are the commitments and entailments of the FEP? What did we have to do to get here, and where are we going philosophically? Which of the model assumptions of active inference or the FEP are flexible, in the sense that the model remains performant on data sets that violate some expectation of the model? For example, the standard t-test assumes equal variances between the two groups (there's an unequal-variances t-test, but that's a different kind of test). Yet the variances don't have to be equal to the highest possible degree; it turns out you can fudge that one a little bit, or it might still
give you directionally accurate findings, but with a potentially reduced or inflated p-value. In other words: what assumptions are we bringing to the table, what are their entailments, and which of the things we bring to the table are a little bit flexible? What are living systems doing? This is related to life-mind continuity, to Schrödinger's question "What is Life?" from 1944, to complexity theory, and to so many discussions we're having about agency, autonomy, individuality, personality, and multi-scale systems; there's so much happening here. And what other questions do you have? The questions that all the participants bring to the table are often the most exciting ones. "It might seem, then, that the FEP could perhaps be thrown at real-world systems and the degree of biological or cognitive complexity read off of features of the system's generative model. The FEP, however, does not dictate what we should write down for a given system's generative model, and once we have written down the generative model of the system under investigation, we have switched from the purview of the FEP to that of a process theory." Again, think about it with the t-test. The t-test framework doesn't dictate what you should measure for a given system; it doesn't say "this is what you measure about ant colonies." The t-test says: given what you give it, it's going to give you a p-value. Analogously, that's the kind of thing the FEP provides. As soon as we start asking about specific deployments of the FEP in situ, in an actual case, we're talking about falsifiable process theories. It's almost like how linear regression theory sits in the background, abstract, and then in specific situations you can ask whether you're using a falsifiable model when running a linear regression on a real data set; but that's not at the level of talking about what linear regression is, which might be an in-principle discussion. I thought this was an interesting part as well: "a single hydrogen atom at rest in a vacuum is not meaningfully interpreted as performing approximate Bayesian inference over its environmental states"; a time series of mere abiotic self-organizing systems, say a whirlpool or a candle flame, would show it "throwing off its Markov blanket and establishing a new one"; and critically, "this does not entail the FEP cannot in principle apply to such systems. It means only that we have no reason to apply it to such systems; we have nothing to gain epistemically from applying it to such systems." That will be quite interesting to hear Mel's perspective on, and everyone else's as well. What is different about systems that seem to just dissipate and don't have this autopoietic character, like a candle burning, a landslide, or a lightning strike? They're energy-dissipating events, but they're a little different from living systems. It makes me think of Lavoisier working with oxygen and realizing: wow, humans are taking in oxygen and breathing out CO2; candles are taking in oxygen, so to speak, releasing heat and giving off CO2, kind of the reverse of what plants do. So are they similar, are they different, what's happening there? Just a couple more quotes; this is really fun, though. We found that the framework accommodated certain things very well. This is Mel, in review, in the latter part of the paper: "the Markov blanket does a very nice job of representing systemic boundaries; the temporal depth of the trajectories being solved by the latent generative models postulated under the FEP is a very elegant and informative representation of something like cognitive complexity; and the free energy parameter itself maps onto a system's attunement with its environmental context, its cohesion and internal consistency, among other things. We also found, however, that the FEP does not pick out or discover these aspects of natural systems, even in silico, but only
provides a useful model of them." This is really awesome, because it's such a positive example of the yes-and mindset that we want to foster with active inference and the FEP. We're drawing from all these different areas, areas where we know we'll be learning for a whole life: information theory, cognitive complexity, machine learning. We're drawing different ideas together, leaving the door open for those who can contribute in ways we don't even know or expect, building together, and also just clarifying that the FEP is not doing everything. It's not picking out or discovering aspects of natural systems, even in a simulation; however, it may yet be something quite useful to deploy. Here are a few more points from the modeling section of the paper (yep, it's a Saturday, it's a fun paper, and we're looking forward to 14.1 and 14.2). Mel writes: "this comes in two forms. First, the literature on scientific modeling has increasingly come to acknowledge the status of non-representational and non-target-directed models. Second, even under the presumption that all models represent target systems, the utility of a model is generally understood to stem from idealizing, black-boxing, or coarse-graining away inessential or distracting details of a target system." In other words, deliberate misrepresentations or omissions make a good model, not fidelity. Very interesting: it's a semantic compression, and this question of lossy or lossless compression, in the context of scientific modeling, relates to whether we're talking about representational or non-representational models, target-directed or non-target-directed ones. So, first point: how are models scientific models, or other kinds of models? That's the demarcation problem, related to semantic compression. How can we think about semantic compression as encryption, and could it be encryption that's reversible or not? How is semantic compression related to narrative? How are narratives semantic-compression events, and how are the stories we tell able to compress vast swaths of syntax, or even other semantics? How is coarse-graining, removing the inessential or distracting details of a target system, discarding the irrelevancies as Bucky would say, related to the causal entropy of a system, or maybe even its slow synergetic modes of operation? Good models live on this complexity-accuracy trade-off in philosophy-of-science space. You want your model to be something more than nothing: if it's not reducing people's uncertainty at all, if it's not helping them get more accurate about something, it's nothing. But model complexity can outweigh how much the model assists an agent's accuracy: if you have to carry around a hundred things in your head and they only help you a little bit, maybe you could have carried one other thing in your head that would have reduced your uncertainty in a better way. It's just so cool that we see this complexity-accuracy trade-off manifested in the actual formalisms of the FEP, while at the same time, qualitatively, it's long been understood in the philosophy of science that models have to ride this edge between being too complex and being accurate. Can we separate these boxes at the joints, for use and for truth? I forget which discussion it was on, or who said it (maybe it was in 8, with Blue or with Alec), but when we want transparent machine learning models, this idea that we're going to look inside the black box might be a little misleading, because you're going to look inside and there are going to be millions of parameters. Just looking inside the box may not get somebody, semantically, to where they want to be. However, could we cut that box up into specific structures of smaller boxes, and then look at how variables or information are passed between the different boxes, to get a sense of something almost like accountability or transparency in machine learning models? To give one example: say somebody has a data set with movement data and pictures of people, and a neural network is predicting whether they're going to commit a crime, and somebody says, "hey, I think this is being used in a non-equitable way; I want increased transparency; I want to see inside the model," and someone responds, "sure, here it is, look at all the parameters yourself." But again, they're not interpretable; they're parameters inside some neural network model. Would there be a way to separate the model into specifically related boxes doing message passing, passing information back and forth, in a way where we can see: oh yeah, you told me that socio-economic factors don't play into this part of the model, and now I can actually look and see, yes, these variables are getting included in a summary variable, and that's getting combined with socio-economic status in this way, so it's not playing a role at this stage of the model. It could still be confounded, because it's confounded in the real world, but by breaking up the black box into smaller boxes that are a little more graspable, a little more comprehensible, we actually get a level of understanding that isn't just peering inside the model. Because, again, do we really understand a linear regression? Do we really understand addition? You can really go down the rabbit hole: do we understand a sine or a cosine?
Lots of fun stuff happening today. Alright, we've got a couple more slides. Another quote from Mel, in the latter half of the paper: "the organization and dynamics of the living organism, the functional architecture of the brain, and the structure of human social systems: these are the most complex systems known to exist, and they're nested inside each other. The sciences that study these systems are comparatively very young, and may never reach the maturity of the sciences oriented towards far simpler systems. The life, cognitive, neuro, and formal social sciences are still, in many respects, at the stage of trying to get a methodological foot in the door. Highly idealized models, such as normative or optimality models, assist us in getting traction on otherwise intractable phenomena." We saw that a little with the scripts paper, the variational approach to scripts, because that was a social science methodologically getting its foot in the door by borrowing from the legacy of the kinds of approaches we're discussing here. Luce (1995) introduces four distinctions in the mathematical modeling of cognitive processes (this is the cognitive psychology, mathematical psychology, scientific modeling literature): process models are contrasted with phenomenological models; normative models are differentiated from descriptive models; dynamic models are compared to static models; and a distinction is drawn between noise and structure. So that's another breakdown of kinds of models, of kinds of scientific ideas, not Friston's breakdown but Luce's, and it helps us understand a few more spectrums that exist in the space of possible models. Another quote from the modeling section: under a normative model (which, again, is the kind we suspect the FEP to be) "it is presumed that reasoning should accord with formal logic, induction and belief with, e.g., Bayesian dictates for inference and credence, and decision-making with the results of optimizing a utility function. A descriptive or phenomenological model, in contrast, represents the cognitive process of making choices, reasoning through sub-problems, and drawing inferences as it is observed to happen: messy, sub-optimal, and irrational though it may be." What, then, is the significance of taking the FEP to be a normative model in Luce's sense? Friston and colleagues stress that the FEP, as distinct from a process theory, is not falsifiable (I'm not sure if this is intended): it will not be possible to articulate a version of the FEP that can be held up against some real-world process in such a way as to undermine or legitimate the model. Because the FEP is the principle that specific process theories are drawn from, the FEP will not directly generate predictions, tests, or hypotheses. Well, I would wonder: what if it brings the group together, and then they come up with predictions, tests, and hypotheses? Could you say that the idea is making cause?
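As an aside on the t-test analogy used a few times above (the equal-variances assumption versus its unequal-variances cousin), here is a hedged sketch in plain Python; the tree-height numbers are made up for illustration, and the helper names are mine, not the paper's:

```python
import math
from statistics import mean, variance

def student_t(x, y):
    # Classic two-sample t statistic, which ASSUMES equal variances
    # and pools them into a single estimate.
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))

def welch_t(x, y):
    # Welch's variant drops the equal-variances assumption.
    nx, ny = len(x), len(y)
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / nx + variance(y) / ny)

# Made-up heights for the "10 +/- 1 foot" and "3 +/- 1 foot" species.
tall  = [10.2, 9.1, 11.0, 10.5, 9.7]
short = [3.1, 2.8, 3.3, 2.9, 3.4, 3.0, 2.7, 3.2]

# With unequal group sizes and variances the two statistics diverge,
# though both are enormous here: the groups are obviously different,
# yet no threshold on t (or on p) turns "different" into a discovered
# natural kind.
print(round(student_t(tall, short), 2), round(welch_t(tall, short), 2))
```

The design point mirrors the discussion: the assumption you bring to the table (equal variances or not) changes the number you get out, and where to draw the line on that number is enacted by the scientist, not read off from nature.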
Just a thought. Another idea on this: the normativity here is presumed to be a logical normativity; in other words, reasoning should accord with formal logic, induction, and belief. But what if the normativity were cybernetic? For example, what if the normative claim were not that A can't equal not-A (one of the claims of, let's say, ordinary Boolean logic), but instead something like "might is right"? That would describe cybernetic systems, which might not be following formal logic, or Peano's axioms, or any other inductive strategy consistent with truth; they might normatively be following a power-centric approach. Now, if that diverges too strongly from what's actually there, they're not going to persist in the niche. But still, that doesn't mean the normativity has to be derivative from logical reality; it could be a normativity that arises from natural selection. Also, thanks for that comment, Marco, who wrote: "I want to flag this part. In my opinion the FEP is a unique normative principle that doesn't prematurely assume sound reasoning." It will be great to unpack your understanding there. Just a few more quotes. In the section that follows, "the philosophical literature on adjacent sorts of models will be reviewed in the hopes that this will lend a sense of how it is that a highly abstracted, unfalsifiable formal model such as the FEP can have scientific utility. Ultimately, of course, whether the FEP turns out to be useful in this way will be an empirical matter, in both senses." Great puns here: if it's really true that the FEP is useful, then we'll find out, because it will be useful. And how could a highly abstracted, non-falsifiable formal model be useful? That's what we're finding out; we're learning by doing; that's what we're enacting. "The FEP is thus an umbrella framework out of which predictive processing, predictive coding, and a version of active inference, its process models, fall with decreasing abstraction and increasing granularity." Awesome sentence. "As a normative model, the FEP is intended to aid in the generation of process models and to furnish constraints on viable process models. The FEP itself is, however, not beholden to empirical data; its virtues are not in its verisimilitude." Verisimilitude: it reminds me of the Silmarillion, but it means truthfulness. This is just really awesome. It's kind of like how, by allowing reward learning to evolve into active inference, we saw preference become baked into the model, so we don't separate from our values and preferences; focusing on policy is actually what allows us to be clear and define our values. Similarly, by going to a highly abstracted, unfalsifiable formal model such as the FEP, we're able to clarify where scientific utility enters the picture: in specific situations, in task-specific or setting-specific modeling, which is what develops falsifiable hypotheses. For the FEP itself, there's no data set that comes along and knocks it out; it sits at the level of evolution, rather than at the level of these process theories. Alright, just the last couple, or few (I feel like I've said that a few times). This is the field site where I did my ant research in grad school in Arizona; I like this picture because of what we'll see in a little bit. Weisberg has likewise laid out a useful taxonomy for scientific models. He differentiates concrete, mathematical, and computational models; cutting across this tripartite distinction is a division between what he calls target-directed modeling and modeling without a specific target. In this latter category we find generalized modeling, minimal modeling, hypothetical modeling, and targetless modeling. A target-directed model is one built for the purpose of predicting, controlling, and explaining the behavior of a specific system under specific conditions, for
example. (And just what an example!) "We might have data pertaining to the foraging behavior of a particular species of ant across a number of colonies observed in the Amazon rainforest, and we might construct a model for the purpose of discovering something about the ethology and ecology of this particular ant species. Then again, we might want to think about foraging behavior much more broadly and abstractly, divorced from the minutiae" (the "minute particulars," as William Blake would say) "of populations of particular species in particular ecological settings. Considering the problems posed by foraging in a patchy habitat, as faced by organisms, in abstract terms, we might perceive a resonance between foraging and economic models that deal with optimizing decisions in scenarios of diminishing returns; we might come up with a heavily idealized, organism- and context-generic model of foraging." The latter approach is what Weisberg dubs modeling a generalized target. At this field site, for over 30 years, my PhD advisor, Professor Deborah Gordon, was observing exactly the specifics, the minute particulars: the very specific behavior of ant colonies at this field site. There were other ants, other species, at the field site; there was the same species in the next valley over; but we were really specific about this particular field site, these exact colonies. We'd have stones next to the colonies: here's colony 6, here's colony 10. So that's specific. At the same time, it was through observing the specifics that one is able to abstract and generalize, for example in Professor Gordon's 2014 paper on the ecology of collective behavior, which draws links between algorithms from computer science and behavior across ants, as well as swarm behavior in other systems. By going into the specifics, we can go into the abstraction and make it even clearer. It's kind of like an electrical charge separation: we're polarizing it, because it's clarifying and distilling the hypotheses and the specific ants on the ground, as it were, from these more general ideas about ant foraging. (The live chat goes wild at the foraging behavior of ants.) "Models of this sort can also serve as an abstract or analogical touchstone or entry point into an unexplored domain." This is really exciting: models as exploration. "This will serve as the basis for more targeted, more fine-grained modeling work or empirical investigation later on. Models in this genre also aid in the process of theory production by allowing exploration of the nature and interrelation of conceptual objects, and by inspiring theorists to draw connections and ask questions that would otherwise have gone undrawn or unasked. Such models offer leverage where other scientific methods stop short: systems and dynamics far too complex or too new to be treated under standard approaches." With the ant example: you're out there in the field, watching the ants forage, and you start to abstract and generalize foraging, or think about general foraging processes. But now we're stepping out even a level beyond the idealized process in nature, to these abstract entry points into unexplored domains. For example, the Game of Life, the cellular automaton: it was put out there as a toy, as a game (yes, everyone is losing their mind in the chat; Mel, I hope you can recover by the time we have this chat next Tuesday). The Game of Life just asked: hey, what would happen if we had a grid with a certain rule set? And that ended up going into so much detail: people making gliders, Turing-complete Games of Life, Stephen Wolfram, and all the research downstream of this. So these models are really nicely placed. You have the targeted versus the targetless models. With the targeted models, the ants on the ground in Arizona, we're doing linear regressions, specific falsifiable predictions and hypotheses about this ant species on the ground, and we're using that to abstract more general questions about foraging,
like: hey, I wonder, if there's a patchy resource, whether the ant species will try to recruit its nest mates to take advantage of a patchy resource that might not be around for long; whereas if it's a distributed resource that's not patchy, maybe the ants will forage without pheromone trails. And then we're going a level even above that, to this idea of generators for ideas. Last quote from the paper: "figuring out what the FEP is and what use it holds for scientists is a worthwhile project in and of itself. How we come down on the matter, what route we take in getting there, who we include, and how we communicate about it, all these things and more have important implications for the philosophy of biology, the philosophy of cognitive science, and the philosophy of science in general. The dialogue unfolding in the literature on the FEP raises important questions about the relationship between science and philosophy of science, between theory, model, and data, and about the scientific method and the aims of science, even, perhaps most especially, about what counts as science in the first place." That takes us back to the first sentence of the introduction: with respect to the demarcation problem, Mel is going to defend the claim that the FEP is scientific. Just by reading this paper and thinking through some of these topics, I feel like I learned a lot about science and about scientific modeling. So, closing notes. This is a project in progress that will include many backgrounds and perspectives, so let's have a participatory and accessible conversation about active inference. It sounds like it'd be a ton of fun, because the FEP and active inference are bringing us all to fundamental conversations from within the philosophy of science, but also about how science is related to the philosophy of science. It's also bringing up debates related to other fields of philosophy: enactivism, consciousness, computationalism, semiotics, communication, meaning, evolution, time, space. All of these fun general topics are being brought up through the work that, hopefully, we're all participating in. So this was an awesome paper, and I'm really looking forward to the next two weeks of discussion. We can just throw up our general questions. What would we enable for ourselves or for our teams if we had a good understanding, if we understood what the FEP was, if we understood that, as Marco just put in the chat, "the math is not the territory, but the philosophy is the landmark"? If we come to understand that, what will we enable? What are the unique predictions and implications of the different understandings we could have? What are the next steps for the Free Energy Principle and active inference as research communities? What are the goals of this research? What is the goal of a philosopher of science who might be curious about this topic? And perhaps most importantly, what are you still curious about, if you're listening to this video? Because this is a super fun topic. It's so excellent to have appreciative co-authors; Mel writes, "thanks so much Dan, this was fantastic." Well, the paper was fantastic, and really a learning catalyst for, I know, multiple of us. So we're just starting off the two-week period on this paper. Everyone, thanks for participating. We're going to provide follow-up forms for the live participants; we request feedback, suggestions, and questions, as always; and everyone, stay in touch. It's been a nice stream, and it's a nice conversation we're going to be having. Let's get that calendar up again: January 19 and 26, 2021, we're going to be talking about this paper, 7 to 9 a.m. Pacific, and Mel is going to be joining us for the streams. Check out the rv.gy/kvnpysc link, and yeah, this is going to be an awesome discussion. With any background, any familiarity you have with these topics, you're more than welcome to join our group discussions. I hope to see you all soon, so have a good day, and peace.