Hello and welcome to the Active Inference Lab. Today is April 14th, 2021, and we're here in guest stream number 4.1 with Elliot Murphy. This is going to be a really interesting talk, and Elliot, we appreciate you coming on. I'll pass it to you to introduce yourself, introduce the topic, and then share some slides. Anyone who's watching live, or later in the comments, just ask questions; we'll have time during the presentation and at the end to go over comments. Thanks again, Elliot, and looking forward to hearing this. Awesome, yeah, thank you very much. So I'm a postdoc working at UT Health in Houston, Texas. I work on intracranial recordings of epilepsy patients doing a bunch of language tasks, trying to explore the neural basis of language. Today I'm going to present some slides reviewing a recent preprint, currently in review, on active inference and the free energy principle. So I'm just going to share my screen. Yep, looks good. Okay, so I'm going to give you my introduction to the topic, then talk a bit about linguistics, and then try to wrap up; we'll make this more of an open conversation, and comments, questions, and objections are obviously more than welcome from anybody. So yeah, like I said, this is the preprint that's currently in review with Emma Holmes and Karl Friston at UCL. The argument is that natural language syntax complies, to some degree, with the free energy principle. So, just to outline some really core, basic principles: I want to make this conversation a little more philosophical than normal. Obviously, this is the Active Inference Lab, so I want to introduce the linguistics topics more than the FEP topics, because the audience might be less familiar with syntactic theory. To begin with, natural language syntax yields what linguists call an unbounded array of hierarchically structured expressions.
Unbounded meaning that they can potentially go on forever: there's no upper bound on sentence length, right? You can always make any sentence longer by simply adding "John said that" at the beginning. The only things that stop sentences going on forever are working memory, the age of the average human being, and indeed the age of the universe; sentences can't go on forever, they have to stop at some point. But the generative component could, in principle, keep going, and that's the interesting part of language: the fact that you can in principle generate an unbounded array of expressions. So I argue in this paper that this capacity operates in the service of active inference, which is in turn in accord with the free energy principle. The general goal is to align certain concerns of linguistics with those of the normative model of organic system behavior associated with the FEP. I'm going to be relying on theoretical linguistics, with special emphasis on syntax, which is the system that determines the order words go in and the structural and organizational relations between words and sentences. A lot of these design principles are general to the biological component of language, not just to specific instantiations of it: they're not specific to English or French or Swahili, they're general design principles about how the language system seems to be organized. So like I said, I want to emphasize the linguistics topics more than active inference, for the depth of the exposition, although the preprint itself has all the relevant details if you're interested in reading more. But yeah, the brief historical background here is that since around the mid-1990s, about 25 years ago, linguists have basically been developing theories of linguistic computation which invoke economy principles.
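The "no longest sentence" point can be made concrete with a toy Python sketch (my own illustration, not anything from the preprint): any sentence can be wrapped under "John said that", and the wrapping can reapply to its own output without limit.

```python
# Toy illustration of unboundedness: any sentence can be extended by
# embedding it under "John said that", so there is no longest sentence.

def embed(sentence: str, depth: int) -> str:
    """Wrap a sentence under 'John said that' depth times."""
    for _ in range(depth):
        sentence = "John said that " + sentence
    return sentence

print(embed("it is raining", 2))
# John said that John said that it is raining
```

Only external factors (memory, lifespan) stop the recursion; the generative rule itself imposes no bound.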
So basically the idea is that when you parse and construct a sentence, there are certain computations that enter into that process. It's not just a single chunk that you memorize or produce; you do it in discrete operations. And yet, for some reason, there's currently no means of grounding or motivating these ideas in more general, non-linguistic domains. So for recently proposed principles of economy, such as minimal search or least effort criteria, which I'll explain and define soon, I argue that they adhere to the FEP. And if this can be shown, it permits a greater degree of explanatory power to the FEP with respect to higher language functions, and it also presents language with a first-principles grounding of notions pertaining to computability and so on. In other words, the idea is this: natural language syntax is a system, an organic system, and also a formal system, in that it can be described using the language of the theory of recursive functions and computability imported from computer science, and all the rest of it. But there's a problem: all of these principles of economy and language design in the literature are, unsurprisingly, language-focused, right? They have a very linguistically encoded background. And that's a bit of a paradox, because one of the original goals of this program was actually to see how much of language can arise through the kinds of formal principles that govern the shape of snowflakes or the morphology of lightning bolts, and so on: general, domain-general laws of nature, essentially, that manifest in different ways across different domains. And yet the linguistics literature is still encoding these principles in language-specific terms, rather than relating them to more general principles of brain organization or mental computation. So that's the background.
So many historical insights into syntax, I argue, are consistent with the FEP, which provides a novel perspective under which the principles governing syntax are not limited to language but actually reflect domain-general processes that underpin a variety of cognitive computations. This is also consistent with a strain within theoretical linguistics that explores how syntactic computation may adhere to general principles that may well fall within extra-biological natural law, in particular considerations of minimal computation. So linguistic theories might be engaging with general properties of organic systems that impact language design. And I think that's a beautiful idea: the languages we speak, English, French, German, are not only biologically grounded, but the rules that govern what you can say, what you can't say, and indeed how your brain computes and parses sentences, part of that you get for free if you just assume extra-biological natural law, one instance of which, you can argue, is the free energy principle. So that's the background here. Now, the shaping influence of the FEP on language has recently been argued for at more complex levels: Karl Friston and his colleagues have published a bunch of papers in the last year or two, mainly in 2020, arguing that active inference can relate to narrative comprehension, interpersonal dialogue (when two people talk), cooperative intentional communication, and speech segmentation. But I'm going to argue that all of these things rely on something much lower down, and that's syntax. If you can't syntactically construct a phrase, if you can't put two words together and form a phrase, you're not going to get very far, right? You at least need to do that in order to engage in intentional communication, narrative, storytelling, and speech segmentation. You at least need to be able to form a phrase, right?
If you can't do that, you can't do anything with language. So I'm arguing that all the ways the FEP can be related to these phenomena arise from a fundamental, lower-level consideration to do with the way phrase-level computations are executed. Hey, Elliot, have you changed any slides? We can't see any slides changing. Just... Okay. Yeah, I have changed a few slides. Just maybe unshare and reshare. But what you've said stands alone as well; it will just be good to have the visual too. Let's see. Okay, how about this? So we see your mouse; just advance on the left bar, click on to your next slide. Yep, perfect. Okay, what about now? No. Okay, I see. My bad, my bad. Okay, well, I might have to... Like window in focus? Yeah. Just as a non-linguist, it was really interesting that you appealed to bigger laws than linguistics, like complex systems theory, or snowflakes and lightning, like you said, and that you're bringing linguistic-level rules or patterns into a bigger scope. Okay, cool. So we see it with Chomsky now. Perfect, perfect. Okay, well, I've only skipped about five slides, and everything on them I've already said: I showed the paper, here's the paper, and these points I've already read out. So if you heard what I said, then that's good. Great. Okay, so can you see slide number eight? Yep, cool. Awesome, okay. Okay, so: the process of constructing hierarchically organized sets of linguistic features into words, phrases, and sentences can, I'll argue, be shown to adhere to principles of efficient computation. And this process must also operate within certain fundamental constraints on neural dynamics, such as those implied by the FEP, whereby the homeostatic brain minimizes the dispersion of its interoceptive and exteroceptive states.
If so, the FEP can allow us to understand how natural language complies with the constraints imposed on worldly interactions, deriving certain features of language from first principles. Again, at the moment this is all pretty abstract; I'm going to give you some much more concrete examples, so I apologize if it's a little philosophical right now. So, a number of robust findings from theoretical linguistics can be used to support the image of the brain as a constructive organ, assembling and inferring linguistic representations in the service of surprise minimization and related goals. These linguistic structures are not mind-external entities; they're not the kind of thing a physicist could easily examine. Rather, they're actively inferred by the brain, seemingly with the help of endogenous slow-frequency activity coordinating cross-cortical gamma activity that combines linguistic features, according to a bunch of recent models in neurolinguistics, which I'm going to get back to at the end. In other words, we have a general framework for how syntax is implemented at the abstract level, and maybe also a decent understanding of how it's implemented in the brain. That's the long vision of this presentation: at the end, we'll try to wrap it all up by returning to the brain. But for now, I'll give you a more concrete example. I said that syntactic structures have to be inferred; they're a form of inference, inference generation, not just passive perception. So if you have a sentence like "we watched a movie with Jim Carrey", that can mean two things. It can either mean the movie stars Jim Carrey (the Truman Show, say, so we watched the Truman Show), or that you actually watched the movie whilst sat next to Jim Carrey. It depends on how you parse it, right? It depends on which phrases get merged, and in which order, basically.
So there's a general rough schematic here, and you can find an elaboration of it in the preprint I mentioned. But the basic idea is that structural ambiguity arises from syntax, and therefore the whole process of syntax has to be an inference process. It's not spoon-fed to us; we have to do some homework in order to construct every possible parse. So that poses the problem: how does the brain do that? It is an inferential process, so what are the operations involved? So here are standard tree structures, as we call them in linguistics. There are two different ways of parsing the sentence: you can either shoot an elephant who's wearing your pajamas, which is definitely possible, or you can shoot an elephant whilst wearing your pajamas. And it depends on the order in which you merge the phrases, right? On which hierarchical relationship is exhibited, if that makes sense. So there's a lot of theoretical background in linguistics here that I won't go into, because that's a whole different lecture. But the basic idea is that there's ambiguity in structure generation. Language is not a system of beads on a string; it's a structure. Along with that theme, I'm also assuming a distinction in linguistics between what's called I-language and E-language. And this is a really crucial distinction to get clear, so I want to make sure it makes sense to everybody. An I-language is the actual internal knowledge that an individual human being has in their mind/brain. The "I" stands for individual, internal, and intensional: intensional with an "s", which literally means generating meanings. On the other hand, there's what I'll call the E-language perspective, which is arguably not really a formulable or coherent notion. It's the idea that language is a kind of external system, a mind-external, extra-mental system.
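The point that one word string supports two hierarchies can be sketched with nested binary-branching tuples; this is a toy representation of my own, not the formalism used in the preprint, but it shows how the same surface string carries two distinct structures.

```python
# Two parses of "we watched a movie with Jim Carrey", as nested
# binary-branching tuples. The word strings are identical; the
# hierarchical groupings, and hence the meanings, differ.

# Parse 1: the PP modifies the noun (the movie stars Jim Carrey).
parse_np = ("we", ("watched", (("a", "movie"), ("with", "Jim Carrey"))))

# Parse 2: the PP modifies the verb phrase (we sat next to Jim Carrey).
parse_vp = ("we", (("watched", ("a", "movie")), ("with", "Jim Carrey")))

def leaves(tree):
    """Flatten a nested tuple into its sequence of words."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in leaves(sub)]

# Same surface string, different structures:
assert leaves(parse_np) == leaves(parse_vp)
assert parse_np != parse_vp
```

Recovering one of these structures from the flat string is exactly the inference problem the talk is describing: the hierarchy is not in the signal, so the comprehender has to construct it.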
Like the English language is somehow out there in the world, and when we learn the English language, we approximate some kind of mind-external system, and we all have something in common, i.e. we all approximate to the English language in different ways. But the I-language perspective assumes that when we actually communicate with each other, the reason we can successfully communicate is not because we share an E-language in common; it's because our I-languages sufficiently overlap, to the degree that we can actually communicate with each other. So everybody's I-language is different. It's often said that there are 7,000 languages on the planet, but that's obviously not true. There are actually however many people there are on the planet, seven and a half billion or so, and that's how many languages there are. Every human being has a different language faculty: the parameters are set differently, we all have different idiolects, different understandings, and so on. And that sounds really obvious, right? When you think about it, it's like, obviously that's true. But the implications for the study of language are actually pretty helpful. In other words, when we study linguistic competence, we are studying a mind-internal computational system; we're not studying something outside. So the English language is not something that linguists actually study; that's not a coherent concept. It's like the concept of culture, or community, or whatever: these are not things a physicist could identify. That's not to say we can't say things about them; we can abstract, and the human mind can construct theories of E-languages, I guess. You can talk about the English language changing over the decades, which is definitely a coherent way to talk about language, and you can study it in that way.
But from a naturalistic perspective, from a biological perspective, that's no use. The only way we can study language naturalistically is based on what an individual mind/brain is doing. To give an example here, take a recent film that IGN released a trailer for yesterday, Hitman's Wife's Bodyguard. The defining property of language is said to be this unbounded array of hierarchically organized expressions via recursion. So recursion is the defining property of natural language syntax, but it turns out to be distributed unevenly across the world's languages. Some people have argued that certain languages don't exhibit recursion at all, but this actually tells us nothing about the biological language faculty. In other words, speakers of these languages can easily learn Portuguese, right? So the idea is that recursion can easily and readily be exhibited by speakers of most of the world's languages. Again, when I say "world's languages", I mean it as a convenient description of every individual's fixed language faculty. Now, most Germanic languages only allow a single prenominal genitive, limited to proper nouns. So German doesn't allow recursive embedding of possessives: in German, you can say "John's house", but you can't say "John's sister's friend's house". So the above movie, when it's released in Germany, I assume is going to have to be called something like "The Bodyguard of the Hitman's Wife", which has a slightly different tone to it. So the idea is that there's a fundamental capacity of language to execute recursion, and you can find it all over the place: in phrase embedding, in center-embedded phrases, in the way we put phrases together, or in other ways, like in this example here.
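The English/German contrast turns on whether the possessive rule can reapply to its own output. A minimal sketch of the recursive (English-style) case, my own toy illustration:

```python
# Recursive prenominal genitives: in English the possessive rule can
# reapply to its own output, so embedding depth is unbounded.

def genitive(possessors, noun):
    """Build e.g. "John's sister's friend's house" from a possessor chain."""
    phrase = noun
    for p in reversed(possessors):
        phrase = p + "'s " + phrase
    return phrase

print(genitive(["John"], "house"))
# John's house
print(genitive(["John", "sister", "friend"], "house"))
# John's sister's friend's house
```

A German-style grammar, on this description, would permit only a single proper-noun possessor: the loop would be capped at one iteration rather than applying recursively.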
So relating this back to the FEP: while the FEP is a variational principle of least action, like those that describe systems with conserved quantities, a relatively recent program in linguistics has suggested that natural language syntax adheres to principles of least action and minimal search. Modern theoretical linguistics remains comparatively remote from other fields in cognitive science, but certain postulates from this field resonate with the FEP and its apparatus. So many biological and cognitive principles of efficiency might be special cases of a variational principle of free energy. This assessment should allow researchers from distinct disciplines to re-evaluate their hypotheses and empirical evidence in terms of minimizing free energy, which is the goal I have in mind here; but, as a recent paper stresses, the FEP should be used as a methodological heuristic for research. It's not a theory of everything, it's a framework. And actually I find this very similar to a framework in linguistics called the minimalist program, which is exactly that: a program. It's not a coherent body of doctrine, not yet at least; it has a couple of working theories, but it's an ongoing research program with a particular ideology and framework. And that's exactly what the FEP is, right? It's a formal principle which can contribute to discrete, individual, separate theories depending on what domain you're looking at. So I like the idea that they're both programmatic notions, and you can implement them in different ways. So we'll be working here with the FEP, though the paper in question doesn't work on the foundations of the FEP so much as take an applied approach to it.
So the FEP has been argued to be more of a conceptual-mathematical model for self-organizing systems, and as a recent review by Andrews makes clear, there are a number of ways the FEP has served as an aid to scientific work without constituting falsifiable claims about states of nature. It's a program, a way of usefully describing certain domains of nature without necessarily carving them up definitively. So when we argue that natural language syntax complies with the free energy principle, I'm not necessarily implying that the FEP bears specific, direct predictions about behavior. It's rather a way of motivating the construction of novel conceptual arguments about how some property of organic systems might be seen as realizing the goals of the FEP. So as I said, repeating the I-language/E-language distinction, I'm focusing on knowledge which is internal to the mind of the speaker, exploring their apparent competence rather than what they happen to produce. One example you can think of immediately is the existence of syntactic structures that contribute to a unique form of epistemic foraging: maximizing model evidence and minimizing surprise and variational free energy. The actual paper I'm citing here goes into more detail, but the basic idea is that the highly restricted set of syntactic projections, by which I mean the ways you can categorize a phrase, as a noun phrase, a verb phrase, a complementizer phrase, and so on, achieves that goal. That restricted, finite set of ways of defining a particular phrase serves this end. You can also assume that language users during comprehension select the phrase structure that's least surprising from the perspective of their hierarchical generative model. Again, this goes back to the "we watched a movie with Jim Carrey" example.
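The "least surprising parse" idea can be sketched in a few lines; this is my own toy illustration, and the candidate parses and their probabilities are invented for the example, not taken from the preprint.

```python
import math

# Toy sketch: comprehension as selecting the parse with the lowest
# surprisal, -log p(parse), under a hierarchical generative model.
# The probabilities below are invented for illustration.

def surprisal(p: float) -> float:
    """Shannon surprisal of an outcome with probability p."""
    return -math.log(p)

candidate_parses = {
    "PP attaches to 'movie' (the movie stars Jim Carrey)": 0.7,
    "PP attaches to 'watched' (we sat next to Jim Carrey)": 0.3,
}

# Select the hypothesis that minimizes surprise:
best = min(candidate_parses, key=lambda k: surprisal(candidate_parses[k]))
print(best)
```

The point of the sketch is only the selection rule: among competing structural hypotheses for the same string, the comprehender settles on the one that is least surprising given their model.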
Prediction plays a pretty big role here. So phrase structure building can be cast as an internal action, something an individual does internally, in the sense of the active inference framework: it selects among competing hypotheses, i.e. syntactic structures. You can parse a sentence this way, or you can parse it another way. And the same goes for lexical semantics, not just syntax: you can interpret an individual word one way or another, and I'm going to give some examples later. So pretty much all of language, from individual words up to sentences, is inference generation. And a little side point here, relating more directly to existing work on active inference and communication: I think it's pretty interesting to note that the recursive combinatorial apparatus of syntax has been argued to facilitate recursive theory of mind, right? Your ability to know that someone else knows that she knows that you know something, and so on. It could therefore be seen as deriving from, or piggybacking in some way on, active inference-based properties of higher-order linguistic communication, which in turn serves to unveil the latent or hidden states that are other people's mental states, which is the standard assumption in theory-of-mind active inference research. But I think it can be related much more directly to language once we understand that the extended use of theory of mind in language comes directly from this recursive property, right? Okay, so I'm going to move on to the more concrete examples. Everything I've said so far is pretty philosophical and pretty abstract, but I want to give you some actual concrete examples here, so it makes a bit more sense. So: free energy provides additional constraints on what a computational system can be physically realized as, which is very useful.
So take the first three principles in classical recursive function theory, which allow functions to compose: substitution, primitive recursion, and minimization. These are all designed in a way that you might think of as computationally efficient: they reuse the output of earlier computations. So for instance, substitution replaces the argument of a function with another function. Primitive recursion defines a new function on the basis of a recursive call to itself, or to a previously defined function. And minimization produces the output of a function in the smallest number of possible steps. Now, the notion of minimizing surprise can be used to ground observations from theoretical linguistics pertaining to the grammar's propensity to reduce the search space during syntactic derivations and to permit no tampering with objects during those derivations: it restricts the number of resources able to be combined at a given moment. In other words, and this is still a fairly abstract example, but I'm going to give more concrete ones soon: during any particular parse of a sentence, you don't just search the lexicon and extract a new unit just because you can. You only do so if you need to. If the lexical items you've already retrieved from the lexicon suffice to generate a given interpretation, then you don't need to search for more items; what you have is sufficient, right? In other words, don't do more search when less search will do. And likewise for the grammar's propensity to limit the range of representational resources able to be called upon during any given stage of comprehension. So, examining some core principles of computation, natural language clearly exhibits minimization, right? And binary branching of structures limits redundant computation. The binary branching calls back to that slide I showed a few minutes ago, where you have these binary-branching tree structures.
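The three function-building schemes just mentioned can be written out as plain higher-order functions; this is my own sketch of the textbook definitions, not code from the preprint.

```python
# Sketches of the three classical schemes from recursive function
# theory: substitution/composition, primitive recursion, minimization.

def compose(f, g):
    """Substitution: feed the output of g in as the argument of f."""
    return lambda x: f(g(x))

def primitive_recursion(base, step):
    """Define h(n) from a base case plus a call to the previous value."""
    def h(n):
        acc = base
        for i in range(n):
            acc = step(i, acc)  # reuses the earlier computation
        return acc
    return h

def minimize(predicate):
    """Mu-operator: return the least n satisfying the predicate."""
    n = 0
    while not predicate(n):
        n += 1
    return n

add3 = compose(lambda x: x + 1, lambda x: x + 2)
factorial = primitive_recursion(1, lambda i, acc: (i + 1) * acc)
assert add3(4) == 7
assert factorial(5) == 120
assert minimize(lambda n: n * n >= 10) == 4
```

Each scheme reuses prior results rather than recomputing them, which is the sense of "computationally efficient" intended above: primitive recursion builds each value from the previous one, and minimization stops at the first witness.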
So "I shot an elephant in my pajamas", and the ambiguity that goes with it, will be derived through successive applications of a binary branching operation that simply puts two things together and gives the result a single identity. Natural language syntax also exhibits discrete units, obviously, in individual words, which leads to a discreteness/continuity dichotomy. And syntax is driven by discrete combination: objects X and Y form a distinct object {X, Y}. These objects are also bounded: there's a fixed inventory. There are nouns, verbs, adjectives, adverbs, prepositions, complementizers, tense elements, little n's and little v's, aspectual elements, and so on, all these particular functional and content-related notions. Their hierarchical ordering is based on a specific functional sequence as well, which imposes direct restrictions on combinatorics. These objects can be combined in cycles, so recursive embedding, which can be extended to form non-local dependencies. An example of that is number (plurality) marking: you can say "the keys to the old wooden cabinet are on the table", not "the keys to the old wooden cabinet is on the table", because it is the keys that are on the table, and the verb has to agree with them in the number feature, right? So that's an example of a non-local dependency between two elements. You have to hold one in memory even as they cross phrase boundaries; syntax keeps on generating new structure, but you have to associate one element with another further down the road. So these properties are in turn governed by principles of minimal search and the like, as I'll show in a minute, fulfilling the goal of active inference to construct meaningful representations as efficiently as possible. And I think that's really what it comes down to. That's the core, crucial message here.
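The binary combination and the non-local agreement dependency can both be sketched in a few lines; this is a toy illustration of my own, not the preprint's formalism.

```python
# Merge as binary combination: two syntactic objects combine into a new
# object, and the operation can reapply to its own output. The feature
# check at the end sketches the non-local number agreement in
# "the keys to the old wooden cabinet are on the table".

def merge(x, y):
    """Binary Merge: combine exactly two objects into a new one."""
    return (x, y)

# Build "the keys to the old wooden cabinet" bottom-up:
np = merge("the", "keys")
pp = merge("to", merge("the", merge("old", merge("wooden", "cabinet"))))
subject = merge(np, pp)

# Number agreement: the verb must agree with the head noun "keys",
# not with the linearly closer noun "cabinet".
head_number = "plural"      # feature of the head noun "keys"
verb = "are" if head_number == "plural" else "is"
assert verb == "are"
```

Note that every node in `subject` has exactly two daughters: that is the binary-branching restriction, and the agreement check has to reach across the intervening prepositional phrase, which is the non-local dependency.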
If it can be shown that the language system constructs meaningful representations as efficiently as possible, then it can be argued to accord with the FEP. And again, that contributes to surprise minimization, amongst other goals. So from the perspective of the FEP, the range of possible structures available to comprehenders provides alternative hypotheses that generalize, and as such preclude overfitting sensory data. If the complexity of linguistic stimuli can be efficiently mapped to a small set of regular, compressed (in the minimal description length sense) syntactic formats, this contributes to the brain's more general goal of restricting itself to a limited number of characteristic states. In other words, only change your beliefs about things if you have to, right? Now, syntactic structures get mapped to language-external conceptual interfaces, and that's a key term. "Language-external interfaces" means this: you have a narrow component of language, what's called the narrow faculty of language, but you can call it whatever you like; it's just the capacity to construct phrase structures. But then you have language-external mental modules, memory systems, attentional control systems and what have you, in the mind, located across different cortical surfaces. It's the standard modular framework, I guess. So by mapping the structures generated by language to these language-external interfaces, in a manner adhering to principles of economy, language can be seen as engaging in a series of questions and answers with sensory data. Now, other recent work in theoretical semantics assumes that there are flexible concepts that language can interface with, and there are also non-flexible concepts. In other words, there are properties of human thought that the language system seems pretty keen on, that it likes to use and exploit a lot.
But there are other modules of cognition that, for some reason, can't be linguistically encoded as easily. That's kind of a weird property, right? Why should we be able to linguistically encode and communicate certain thoughts but not others? So one example: many of the world's languages (again, when I say "world's languages", I mean that in quotation marks; what I really mean is many individual human language faculties) make use of quantification or numerosity, typically seen as relying on a frontoparietal quantification network, which is interestingly closely linked to major language sites. So that might be one of the reasons why language makes use of it. But they make little use of color, despite color featuring just as prominently in ordinary experience, right? Our sensorium is filled with colors. And this might be due to the remoteness of occipital visual regions in the brain. Maybe, maybe not; that's one explanation. Maybe it's wrong, but it's one possible reason. So for instance, one might imagine some functional morpheme encoding brightness or shade of coloration in language. You can imagine it: if you were to invent a new language, you could encode color features somehow as some kind of inflectional morpheme. But that doesn't seem to be the case anywhere, even though, like I said, color is a pretty important part of waking life. At the same time, language seems to make considerable use of certain contentful concepts, like eventuality, but not others, like worry: no morphemes mark how concerned somebody is about something (I'm very concerned, I'm a little bit concerned, I'm not very concerned). So language seems to be making uniquely efficient use of specific, easily accessible representational resources, rather than less easily accessible cognitive modules, to map complex meanings onto natural language expressions.
So it's further been observed that language acts as an artificial context, which helps constrain what representations are recruited and what impact they have on reasoning and inference. Words themselves are highly flexible and metabolically cheap sources of priors throughout the neural hierarchy. This is a really cool idea that I'm going to expand on in a few minutes' time. To give you some more examples, again relating this back to active inference: take "the second blue ball to the left of the large box". That's a pretty simple spatial direction, but it can only be encoded via natural language syntax. You need that kind of hierarchically organized phrase structure to generate that particular thought. It's a very simple thought, a very simple direction, but you can only generate inferences about it through the recursive combinatorial operations of language. Similarly, you have structures like "the young, happy, eager student either going to Oxford or to Cambridge". This involves unbounded, unstructured coordination, involving disjunction too. Disjunction, "X or Y", is a highly complex structure to compute; it's a highly complex conceptual structure, and yet language users can easily and readily infer it. But it also opens the door for a whole new species of inferences to be generated: new thoughts about the world, new possible hidden states, new things that you can think about. One of the examples that Jerry Fodor, the philosopher, used to give is this: if you take a pen and a piece of paper, you can easily draw a man. You can easily draw a zebra. But it's kind of difficult to draw the thought that there is not a zebra next to you. Try to conceptualize the thought "there is not a zebra next to me". How can you draw that? How can you depict the fact that there is no zebra next to me? That's kind of weird; it's kind of difficult to do. I suppose you could draw a man, then a zebra, and a line through the zebra.
Well, that's kind of weird. That still presupposes that there's a zebra next to him that you then reject. So language generates a whole new format for thought, basically. It generates a new format for thought which seems to be unique, in other words, not readily translatable into other domains like visual representation and so on. So these rapid inferences about properties and states, hidden states, can be generated relatively effortlessly by language. And like I said, no other computational system in human cognition can achieve this. Well, that's the idea anyway. So a number of economy principles have been proposed in theoretical linguistics. These are all very technical, syntax-related notions, so I'm not gonna explain too many of them, but I will give some examples. And these have all been framed, like I said at the beginning, exclusively within a linguistic context, invoking highly domain-specific notions, despite a core part of the intended project of modern theoretical linguistics being to embed linguistic theory within principles general to cognition, right? So for example, the inclusiveness condition maintains that no new elements can be introduced in the course of a particular syntactic derivation. So once you're parsing a particular sentence and deriving a particular representation of it, you're not just gonna randomly introduce a new lexical item or a new hidden element just for the sake of it. You're only gonna do it if it contributes to immediate interpretation. And so only existing elements can be rearranged, restricting available resources. But it's also unclear to what extent this computational principle finds analogous examples in non-linguistic domains, right?
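To make the inclusiveness condition concrete, here's a toy sketch in Python. This is purely my own illustration, not anything from the paper: it models a derivation as successive merges over a fixed lexical array (a "numeration"), and checks that no element from outside the numeration sneaks into the derivation.

```python
# Toy model: a derivation is a list of merge steps, each combining two
# objects. The inclusiveness condition says every atom used must come
# from the initial numeration; only previously built phrases (tuples)
# or numeration items may be rearranged.
def check_inclusiveness(numeration, derivation_steps):
    """Return True if no new element is introduced mid-derivation."""
    available = set(numeration)
    for left, right in derivation_steps:
        for item in (left, right):
            if isinstance(item, tuple):   # a previously built phrase
                continue
            if item not in available:
                return False              # a new element was smuggled in
        available.add((left, right))      # the merged phrase is now available
    return True

steps = [("the", "man"), ("saw", ("the", "man"))]
check_inclusiveness({"the", "man", "saw"}, steps)   # True: all atoms licensed
check_inclusiveness({"the", "man"}, steps)          # False: "saw" is new
```

The point of the sketch is just that the condition is checkable resource bookkeeping: the derivation can only rearrange what it started with.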
So motivating these language-specific generalizations by making direct reference to the FEP will not only foster, I think, more fruitful relationships between theories of language and neurobiology, but will also broaden the explanatory scope for the existence and prevalence of particular syntactic phenomena. But what's interesting to note is that linguists readily admit the lack of a specific theory of computational efficiency for language. Like I said at the beginning, it's kind of a program, right? It's a programmatic notion. So in a recent paper, Gallego and Chomsky point out: "To be sure, we do not have a general theory of computational efficiency, but we do have some observations that are pretty obvious and should be part of that theory." That's basically the state of the field, I feel. Linguists have very well-honed theories of language-specific efficiency criteria, but translating those into more domain-general terms has not been so successful. But we can at least suppose that whatever definition will be forthcoming will be related to more generic notions of economy, like Hamiltonian notions of minimizing energy expenditure during language processing, shortened description length or minimal description length, reducing Kolmogorov complexity, and reducing the degree of necessitated belief updating: the less belief updating required, the better. So one of these so-called minimal computational procedures is what's called relativized minimality. This is the principle that states that, given a particular configuration X ... Z ... Y, a local relation cannot connect X and Y if Z intervenes and Z fully matches the specification of X and Y in terms of the relevant features. I'll give an example of that.
So in other words, if X and Y attempt to establish a syntactic relation, recall the long-distance dependency thing I mentioned, but some element intervenes which also provides a superset of X's particular features, i.e., X's features plus some other features, this blocks the relation. That sounds extremely abstract, so I do apologize for that, but here is a more concrete example. So in the sentence you have in one, "which game" provides a superset of the features hosted by "how", which results in unacceptability. But the equivalent does not obtain in two, and so a relationship between both copies of "which game" can obtain, licensing interpretation. The strike-through denotes the originally merged position after movement has taken place. So it's originally merged down here, and then you move it to the beginning of the sentence in order to form a question. Question formation often involves just moving a wh-item from the middle or the end of the sentence to the beginning. So if you say, "How do you wonder which game to play?", that sounds pretty ungrammatical. But if you say, "Which game do you wonder how to play?", which has approximately the same interpretation, that's okay. And the reason why is that "how" hosts only a Q-feature, a question feature. And as it searches down the structure, it encounters "which game". Now "which game" bears a Q-feature because of "which", but also has an N-feature because of "game". And so therefore "how" can't reach its final destination, the final destination being its originally merged position after "play", which only has a Q-feature. On the other hand, if you move "which game" to above "do you wonder" and you leave "how" in situ, then "which game" carries these two features here, and it skips over "how", because "how" does not satisfy the full featural specification of "which game", right? So it needs to search further.
And only when it searches back down and reaches its originally merged position does it get interpreted. So what this means is that when you say, "Which game do you wonder how to play?", you're interpreting "which game" at the position of "play", not at the position of "wonder". You're not asking which game you wonder; you're asking which game you play. The question is about the playing. And this has been argued to emerge directly from minimal search, allowing this higher-level representational principle to emerge directly from properties of efficient computation. This is one example where you have a kind of coarse, higher-order principle in linguistics being reduced to a lower-level and simpler kind of element. So translating that into minimal search: when you search the structure for matching features in two, the minimal search procedure simply skips "how" and finds the original copy of "which game", because it's searching across the full structure. So another example of economy can be found in the principle of full interpretation, which simply states that there are no superfluous symbols allowed at the two linguistic interfaces. These interfaces are assumed to be the conceptual-intentional and sensorimotor systems. So in other words, there are two things you can do with a linguistic structure: you can externalize it, you can produce it, you can say it, you can sign it, and so on; or you can simply think it, interpret it. So this ensures that the system need not compute symbols that ultimately subvert the goals of either interpretation or externalization. So for instance, in three, there's an argument that does not have a semantic role, and therefore it's unacceptable. You can imagine the sentence "Walt gave Jesse a gun": that's fine, all the semantic roles are filled.
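Here's a toy computational rendering of relativized minimality as minimal search, again my own illustration rather than the paper's formalism (the feature labels and representation are assumptions): a probe searches top-down for a matching goal, and stops at the first element whose features contain the probe's features, whether or not that element is the intended target.

```python
# Toy minimal search: a probe's features are a set like {"Q"}; the path
# lists the featural specifications of elements from top to bottom.
# Search halts at the FIRST element whose features are a superset of
# the probe's -- so a richer intervener blocks access to a lower copy.
def minimal_search(probe_features, path):
    """Return the index of the first full feature match, or None."""
    for i, features in enumerate(path):
        if probe_features <= features:   # subset test: full match found
            return i
    return None

# "How do you wonder which game to play?"
# probe 'how' = {Q}; 'which game' = {Q, N} matches {Q} first, so the
# search halts at the intervener and never reaches how's base position.
minimal_search({"Q"}, [{"Q", "N"}, {"Q"}])        # -> 0 (blocked)

# "Which game do you wonder how to play?"
# probe 'which game' = {Q, N}; 'how' = {Q} is NOT a superset of {Q, N},
# so the search skips it and finds the original copy lower down.
minimal_search({"Q", "N"}, [{"Q"}, {"Q", "N"}])   # -> 1 (skips 'how')
```

The asymmetry between the two sentences falls out of one greedy search rule plus the feature sets, which is the flavor of the "higher-order principle reduced to efficient computation" point.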
There's an agent, there's a patient, and there's an instrument involved, a particular tool. But then if you add "to Saul", there's an additional kind of locational preposition being marked there, but it doesn't have a semantic role. So it can't be interpreted. And that sounds kind of trivial, right? Most people, when you give them these examples, say, well, yeah, obviously, that's not a very deep thing. But actually it's a pretty puzzling phenomenon. It demands explanation; it has to be explained somehow. So one of the operations that's been invoked in this minimal computational procedure is called merge. The operation merge simply takes X and Y and forms the set {X, Y}. It just puts two things together. And this constructs the binary branching structures that I mentioned, but merge itself can also derive some core set-theoretic properties of linguistic relations, such as membership, dominance, term-of and so on, these different hierarchical relations between nodes in a tree, one branch of a tree being higher up or more deeply embedded than another. As well as other relations like c-command, which we can put aside. So in brief, much of the complexity of syntactic relations can be derived from successive instances of this simple merge computation, reducing complex visible phenomena to simple invisible primitives. So for instance, in the example I gave here with "which game", you can imagine the set {A, B} being constructed, and then if you take the same element B and merge it with the set again, you get {B, {A, B}}, and then you can delete the lower appearance of B, so the linear order is simply B, A, rather than the set being pronounced as B A B. And this seems to be what's happening here, where you have a B over here, "which game", then A, the full structure, and then B again, the originally merged copy.
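Merge as binary set formation is easy to show directly. Here's a minimal sketch of my own: merge builds two-element sets, and hierarchical relations like dominance fall out of recursive membership, with no extra machinery.

```python
# Merge as binary set formation: merge(X, Y) = {X, Y}. frozensets are
# hashable, so merged phrases can themselves be merged again.
def merge(x, y):
    return frozenset({x, y})

def contains(tree, item):
    """Dominance as recursive membership: does `tree` contain `item`
    at any depth?"""
    if tree == item:
        return True
    if isinstance(tree, frozenset):
        return any(contains(part, item) for part in tree)
    return False

# Build [read [the book]] by two successive merges.
vp = merge("read", merge("the", "book"))
contains(vp, "book")   # True: 'book' is dominated by the phrase
contains(vp, "John")   # False: 'John' was never merged in
```

Note that nothing here is ordered: the sets encode pure hierarchy, which matches the later point that linear order is imposed only at externalization.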
So all of these superficially complex linguistic phenomena, which honestly can seem very complicated and very elaborate, can be boiled down to a very simple operation, which is just to take two things and put them together. And what's interesting is that related phenomena like adjunction do not involve modification of the semantic content of the structure. The adjunct is concatenative. So an adjunct is something like a prepositional phrase, like "in the park" or "to the beach". For instance, if you take a structure like "John and Mary talk in the park", the fact that they talk in the park doesn't change the fact that they talk, right? So the original interpretation, the integrity of the syntactic structure and the meaning of "John and Mary talk", is not changed by adding an adjunct. You can add the adjunct "in the park", but that doesn't change the fact that John and Mary still talk, right? Again, that sounds like a very trivial property. It's obvious, right? But again, it demands explanation. There has to be a reason for it. And the reason turns out to be that when you add an adjunct, you simply concatenate it; you linearly add it to the end. You don't change the actual identity of the phrase itself. There's no labeling, unlike in ordinary merge, so you don't change the identity of the phrase. So, the FEP has been equated with the principle of least effort, and active inference is its process theory. Strictly speaking, the FEP is basically a computational principle, right? The probabilistic beliefs it's concerned with are directed at hidden states, namely states external to a self-organizing system. And in a similar way to how the FEP is a research program, so too is much of recent theoretical linguistics guided by programmatic concerns, right? Like I said, it's a program, an ongoing kind of project.
On the other hand, linguists have developed theories of syntactic least effort, like I said, but the process theory is a little bit less clear, right? How it's actually implemented is slightly less clear, but I would argue it may become clearer if it can be accommodated within the existing frameworks and knowledge that the FEP can bring with it, to help solve the puzzle of how language is implemented in the brain. So here's another example for you: "Routinely, poems that rhyme evaporate." In this instance, "routinely" exclusively modifies "evaporate". So "routinely" goes with "evaporate"; that's how they're interpreted. It cannot modify "rhyme", even though this word is closer in terms of linear distance to "routinely", right? "Rhyme" comes one, two, three words after, while "evaporate" comes one, two, three, four words after in terms of linear distance. So they're more linearly remote, and yet in terms of interpretation they go together. And the reason is that the matrix predicate "evaporate" is closer in terms of structural distance to "routinely", because "rhyme" is embedded within the phrase headed by "poems". So it exists on a different hierarchical plane, if you like; it's lower down in the hierarchy than "evaporate". So language computes over structural distance and not linear distance. Sentences are not simply beads on a string; they're not linear objects. They have to become linear objects in terms of sensorimotor externalization, because we live in the universe we live in. We can't speak in parallel, we can only speak via linearization. Although there is some evidence that sign languages can communicate to some degree in parallel: you can sign one element with one hand and another element with the other hand. So there's some evidence that sign language might be able to defy the laws of physics here, but it turns out that's probably exaggerated to an extent. It's still a form of linearization, just a kind of co-linearization of different channels.
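The structural-versus-linear-distance contrast can be computed directly. Here's a toy sketch of my own (the bracketing of the example is my rough simplification, not the paper's tree): embedding depth in a nested tuple stands in for structural distance, and it dissociates from word-count distance exactly as described.

```python
# Structural distance as embedding depth in a nested-tuple tree, for
# "Routinely, poems that rhyme evaporate".
def depth_of(tree, word, depth=0):
    """Return the embedding depth of `word` in the tree, or None."""
    if tree == word:
        return depth
    if isinstance(tree, tuple):
        for part in tree:
            d = depth_of(part, word, depth + 1)
            if d is not None:
                return d
    return None

# Rough bracketing: [routinely [[poems [that rhyme]] evaporate]]
sentence = ("routinely", (("poems", ("that", "rhyme")), "evaporate"))

depth_of(sentence, "evaporate")   # -> 2: structurally close to 'routinely'
depth_of(sentence, "rhyme")       # -> 4: buried inside the subject phrase
```

Linearly, "rhyme" is three words from "routinely" and "evaporate" is four; structurally the ranking flips, and it's the structural measure that predicts the interpretation.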
So language prioritizes the demands of the syntax-semantics interface over other systems like morphophonology. So while two structures might exhibit different linear orders, they may exhibit the same underlying hierarchical order. Here's a really good example from English and Basque: the linear orders of the verb and direct object are the opposite, yet the interpretation served is the same. "John has read the book": you have John, then the auxiliary, then the verb, then the object. And in Basque you have a different order: you have John, then the book, then the verb, and then the auxiliary. And yet they mean the same thing; they have the same interpretation, right? So this suggests that a kind of more fundamental operation is going on here, namely that syntax encodes the verb and direct object as an abstract phrase which excludes the subject. So in the syntax, in both English and Basque, you have the same underlying structure: you merge the verb and direct object first, and then you accommodate the subject. And then when you linearize that, when you communicate it externally, you do it in different ways. In English you do it one way, in Basque you do it another way. But the basic idea is the same: you have the same underlying interpretation. That also accounts for something pretty obvious, namely the fact that you can translate one sentence into another language, right? That's a fairly obvious thing you can do with language, and so there have to be some kinds of commonalities somehow. But the commonalities might be much deeper down than most people appreciate. So through the various stages of language development as well, interestingly, children don't typically produce expressions that deviate from general grammatical principles pertaining to the structure dependence of rules, even when they produce so-called mistakes. And there's been a lot of research on child language development along these lines.
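The "one hierarchy, two linear orders" point can be sketched as a toy head-direction parameter. This is my own illustration under simplifying assumptions (the Basque output below is an English word-by-word gloss, not the actual Basque sentence): the same nested structure is externalized head-first for English and head-last for the Basque-style order.

```python
# One underlying hierarchy, two linearizations. A phrase is a
# (head, complement) pair; head_final flips the order at every level.
def linearize(node, head_final=False):
    if isinstance(node, tuple):
        head, comp = node
        parts = [linearize(head, head_final), linearize(comp, head_final)]
        if head_final:
            parts.reverse()
        return " ".join(parts)
    return node

# Shared hierarchy: the verb and direct object merge first,
# then the auxiliary takes that VP as its complement.
aux_phrase = ("has", ("read", "the book"))

english = "John " + linearize(aux_phrase, head_final=False)
basque_gloss = "Jon " + linearize(aux_phrase, head_final=True)

english        # -> "John has read the book"
basque_gloss   # -> "Jon the book read has"  (word-by-word gloss)
```

Opposite surface orders, identical nesting: translation works because the hierarchy, not the string, carries the interpretation.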
So in other words, when children do make mistakes, they seem to make mistakes which accord with the grammatical rules of their language. This suggests that sensitivity to structure dependence forms a core part of language design. Corpus studies of infant language exposure reveal that there are actually very few bigrams, let alone trigrams. So statistical procedures can help, but there seems to be some more innate sensitivity to structure dependence, which seems necessary. And as a recent paper also reviews, human learners prefer to induce hypotheses that have a shorter description length, in line with simplicity preferences possibly being a governing principle of cognitive systems, falling in line with what the FEP would predict. So simplicity-based preferences turn up in a range of formal language models too, relating to the notion of minimal description length. And you might also invoke principles of minimal redundancy and so on. So this is a really important idea: minimal computation and efficiency seems to be a really general cognitive goal in the brain. And there are a couple of recent papers, one of them in TiCS, I think, that came out maybe last year, or maybe this year, I can't remember, I think it was called "Memory as a computational resource", which showed that across a bunch of domains, human memory in its various guises also exhibits an adherence to principles of efficiency. So I think it's not too surprising when linguists come along and say language also adheres to principles of efficient computation. All of these ideas seem to be out there at the moment. Everyone's coming to more or less the same conclusions, just using different language, different background assumptions. But the general ideas, I think, all mesh well together.
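A minimal-description-length preference can be shown with a deliberately tiny toy of my own (string length stands in for description length here, a crude proxy for Kolmogorov-style complexity, which is uncomputable in general): among hypotheses that fit the data equally well, pick the shortest.

```python
# Toy MDL-style hypothesis choice: among rules consistent with the
# data, prefer the one with the shortest description.
def mdl_choose(hypotheses, data):
    """hypotheses: list of (rule_string, predicate). Return the
    shortest-described rule whose predicate accepts every datum."""
    fitting = [(rule, f) for rule, f in hypotheses if all(f(d) for d in data)]
    if not fitting:
        return None
    return min(fitting, key=lambda pair: len(pair[0]))[0]

data = [2, 4, 6, 8]
hypotheses = [
    ("even",            lambda n: n % 2 == 0),
    ("even and < 100",  lambda n: n % 2 == 0 and n < 100),
]
mdl_choose(hypotheses, data)   # -> "even": same fit, shorter description
```

Both rules cover the data; the simplicity preference breaks the tie, which is the shape of the learner behavior the paper under review connects to the FEP.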
So as I said, linguistic computation seems to be optimized for the generation of interpretable structures rather than for the generation of maximally communicative messages to conspecifics. In other words, whenever there's a conflict between principles of computational efficiency on the one hand and principles of communicative clarity on the other, the former typically wins. Now, this is not to say that when we do communicate with each other, it's not done efficiently; I'll come back to that. When we do communicate, we do it in an efficient way. But that's a separate question from whether the language system is designed in a way as to maximize that communication. The normal functioning of syntax seems to lead to instances which reduce communicative efficiency and prioritize inference generation. So the goal of the language system is to generate particular inferences and representations about the environment in an efficient way. Here's a pretty clear example of this, right? If you take the sentence "You persuaded Saul to sell his car", the individual and the object can both be questioned, but questioning the more deeply embedded object in terms of the hierarchical structure forces the speaker to produce a more complex syntactic execution, right? So you can say, "Who did you persuade to sell what?", but you can't say, "What did you persuade who to sell?", even though they mean the same thing, right? Same words, same interpretation. All it means is, you know, who is the individual and what's the object? That's it. Just tell me who the individual is and what the object is. Well, you can only say it if you construct it in the most computationally efficient way, i.e., you search for the first possible element to question, right? If you search for the more deeply embedded object, you can't do it. So the structures in 11 involve the same words, same interpretations, yet the more computationally costly process can't be licensed.
So this is a pretty good example, and there are plenty of examples like this, by the way. I've even written a paper about it in Glossa. Plenty of examples in which there is a clear conflict between the syntactic priority of just generating a meaningful structure and generating possible structures that would actually aid communicative efficiency and communicative flexibility. That's not a priority of language. So other examples show that the acceptability of sentences can be impacted based on the extent to which the construction makes a novel, non-redundant contribution to one's own mental models and beliefs, rather than those of conspecifics. Again, this is really directly at the core of what active inference would predict, reinforcing the role of syntactic processing in inference generation rather than communication. By the way, the reason why these are numbered 12 and 13 is because that's the numbering in the paper. So the reason why 12B seems degraded relative to 12A seems to stem from the fact that speakers are unlikely to be ignorant of the relevant content, right? It's a kind of pragmatic reason. So "Kim knows where Saul was born", that sounds okay, while "Kim knows where I was born" sounds kind of weird, even though it's a technically grammatical sentence. It sounds weird because you would never say it: it doesn't contribute meaningfully to revising or adding to one's mental models or beliefs about the world. So the language system doesn't like it, which again brings language design into closer contact with the FEP. And cases such as 13 reveal how even processes such as contraction are sensitive to hierarchical structure and can't be executed over any random word boundary. So you can say "Saul's taller than Kim is", but you can't say "Saul's taller than Kim's". And the reason is that there's an invisible phrase boundary between those two elements.
Other examples are in 14 and 15. You can say, "What do you want to do?", and you can contract it and say, "What do you wanna do?" But if you say, "Who do you want to read the book?", you can't contract that to generate "Who do you wanna read the book?" That sounds a bit weird. You technically can say it: if you said that to me, I would know what you mean straight away, there's no problem. But it sounds a bit more awkward. Again, the idea is that there's an invisible phrase boundary there that stops contraction occurring. Efficient computation, or at least structure dependence, I should say, is also exhibited in more classical examples in the literature. So if you say, "The man is happy", you can question that structure by moving the auxiliary to the front and saying, "Is the man happy?" And so you might be tempted to conclude that to form a question, you simply search the structure linearly and take the first possible element. But that turns out not to be sufficient. You can say, "The man who is tall is happy", but you can't say, "Is the man who tall is happy?" That's because "who is tall", similar to "poems that rhyme" earlier, is embedded more deeply, inside the phrase headed by "the man", while the "is" you're questioning, the one in "is happy", is higher up the hierarchy and easier to search for. So you have to say, "Is the man who is tall happy?", because that "is" is actually closer to the element you're questioning in terms of structural distance than the "is" in "who is tall". So again, syntax cares about structural proximity and not linear proximity. And there are also constraints like this on coordination. You can say, "John ate chicken and bread for lunch", and you can question the whole phrasal conjunct "chicken and bread": you can say, "What did John eat for lunch?" But if you want to efficiently question just one element, let's say you already knew that John ate chicken but you're not sure what else he ate, you can't say, "What did John eat chicken and for lunch?"
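The auxiliary-fronting contrast can be made concrete with a toy pair of rules, my own illustration: a "wrong" linear rule that fronts the first auxiliary in the string, versus a "right" structural rule that treats the whole subject phrase as one unbreakable unit.

```python
# Structure dependence in question formation, as two competing rules.
def question_linear(sentence_words):
    """Wrong rule: front the linearly FIRST auxiliary 'is'."""
    words = sentence_words[:]
    i = words.index("is")
    return " ".join([words.pop(i)] + words) + "?"

def question_structural(subject, aux, predicate):
    """Right rule: front the MAIN-clause auxiliary, keeping the whole
    subject phrase intact (even if it contains its own 'is')."""
    return " ".join([aux] + subject + predicate) + "?"

words = "the man who is tall is happy".split()
question_linear(words)
# -> "is the man who tall is happy?"   (ungrammatical)
question_structural("the man who is tall".split(), "is", ["happy"])
# -> "is the man who is tall happy?"   (grammatical)
```

The linear rule is simpler to state over strings, yet no child ever applies it; the rule actually acquired is the one stated over hierarchical structure.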
There's no obvious reason why you can't do that, right? It's a perfectly fine thought. You already know that John ate chicken; someone's just told you he ate chicken and something else. But you can't say, "What did John eat chicken and for lunch?", because the syntax respects the integrity of the phrase "chicken and bread". It has a phrasal identity that syntax respects, and you can't just violate the phrase boundary and interrogate only one part. Other examples involve relations across phrase boundaries and conjunct boundaries, not just within them. You can say, "Sam gave a guitar to me and loaned a trumpet to you", and you can question both elements: you can say, "What did Sam give to me and loan to you?" But you can't say, "What did Sam give to me and loaned a trumpet to you?", even though, again, in terms of communicative efficiency, that's a pretty simple structure to generate: you already know that he loaned a trumpet to you, but you want to figure out what Sam gave to me, right? And again, you can find these relations of hierarchy all over the place. Pronoun reference is a pretty good example. You can say, "Mary said that he has a lot of talent and that Peter should go far", in which case the pronoun "he" can be connected with Peter, so you have the pronoun "he" coming before the actual element "Peter". But when you simply take that conjunction and state it on its own, it's no longer acceptable. You can't say, "He has a lot of talent and Peter should go far"; that sounds a bit strange. "He" should refer to someone else, like John. And the reason why is that when you embed the structure in a larger structure, it changes the actual identity of the conjuncts. The conjuncts headed by "that" are complementizer phrases, CPs, whereas the conjuncts headed by "he" and "Peter" are simply tense phrases, TPs. And other puzzles exist here as well.
So you can say, "John organized a meeting in his house", in which case "his" goes with John. But it sounds weird to say, "In John's house, he organized a meeting", when "he" refers to John. Again, you can kind of parse it, you can force it, but it sounds a bit more awkward. It's more natural for "he" to refer to someone else, like Peter: in John's house, Peter organized a meeting. And the reason why is that co-reference with the pronoun is barred, since syntax preserves interpretation across movement. The structure "In John's house, he organized a meeting" is generated from a more basic structure, "He organized a meeting in John's house", in which case you have "he" coming before "John" in the same kind of tense-phrase structure that I mentioned earlier. So in other words, syntax seems to win over linear precedence. Although quirky examples exist too. You can say, "I gave her the book that Sarah always wanted", with "her" referring to Sarah. But if you say, "I gave her the book that Sarah wanted", again, that sounds slightly strange. Changing the syntax by adding the adverbial element changes the actual content of the phrase itself, which allows easier co-reference. So, stepping back a little bit, this whole framework of merging and generating hierarchical structures has been argued in the literature to follow from a more domain-general, lower-level computational procedure. Some people have called it the universal generative faculty, which is just the ability to construct hierarchical structures and map them to different interfaces. So the idea is that our system of moral judgment formation, which involves agents, patients, events and so on, still requires some kind of combinatorial apparatus to generate those judgments. Same with music: it's been known since the 70s that musical structures have a kind of hierarchical relation to them.
And the same with numerosity, with numbers. It's been hypothesized by Chomsky that if you restrict the merge operation to a single element and simply reapply it, you can generate the natural numbers, right? So you form the empty set, then merge it with itself, then merge it with the resulting object, and so on. And you can call the first one zero, call the next one one, and call the one after that two. That gives you the natural numbers. But the general idea is that you have an underlying generative faculty that can interface with different subsystems. When it interfaces with the sound system, you get music. When it interfaces with whatever morality is, the system of moral judgment formation or whatever, you get moral judgments. When it interfaces with the system of quantification and numerosity, you get the natural numbers. And then when it interfaces with the lexicon, whatever that is, even more mysteriously, you get language, which is actually what I'm showing right here: merge plus sound gives music, and so on. But interestingly, only language seems to attribute an independent identity to these merged elements. With quantification, music, and morality, you simply have the generation of a chunk, some kind of chunking that's happening. But with natural language, you seem to get an additional operation: you don't just chunk things, you chunk them and then give the chunk an additional identity, right? The sum is greater than the parts, I guess you could say. What's interesting is that you can take that merged structure and then merge it again, treating it as an independent object, independent of its constituent parts, whereas you don't really seem to be able to do that in musical or other non-linguistic domains.
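That successor construction is short enough to write out. Here's a sketch of my own of the idea just described: zero is the empty set, and merge restricted to a single element acts as a successor function, with the number recoverable as nesting depth.

```python
# Merge restricted to one element: merge(X) = {X}, i.e. a successor.
def merge_single(x):
    return frozenset({x})

def number(n):
    """Build the n-th object: zero = {}, one = {{}}, two = {{{}}}, ..."""
    obj = frozenset()                # zero as the empty set
    for _ in range(n):
        obj = merge_single(obj)      # successor: wrap in one more set
    return obj

def depth(obj):
    """Recover the natural number as nesting depth."""
    return 0 if not obj else 1 + depth(next(iter(obj)))

depth(number(3))   # -> 3
```

So one generative operation, fed only the empty set, already yields an unbounded discrete series, which is the sense in which arithmetic might piggyback on the same faculty.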
There's also a recent paper by Stanislas Dehaene, who argues that this capacity to generate nested tree structures is a human-specific, kind of species property, right? And he gives a bunch of different possible neural instantiations of this. But the basic idea is that only humans have language. Alongside that, you get a unique level of tool complexity, possibly due to this linguistic capacity. For instance, spears are constructed from a rock and a shaft, but a spear is not just a rock plus a shaft, right? A spear is a rock plus a shaft with an additional abstraction, the functional use of it, the utility. I'm gonna come back to that in a couple of minutes, but language seems to be uniquely concerned with functional abstractions, like use, not just form. And on top of that, it seems that only humans have what Chomsky has loosely called the science-forming faculty. So only humans are scientists, not surprisingly. And we can do things like Peircean abduction, or non-deductive complex inference generation: we get some weird event, E occurs, and then we posit that if A were true, then E would simply follow naturally, right? So we assume A. And then of course, on top of that, we have the other example I mentioned, theory of mind, also aided by hierarchical language: I know that you know that I know, and so on. So there are all these kinds of weird, human-specific cognitive traits, all of which can potentially be boiled down to, or connected to, some linguistic capacity. So here's a good example that Dehaene gives. These are five different types of sequences that you can generate. Only human beings seem capable of generating nested tree structures, as I've argued here and in the paper. And we generate tree structures not just randomly; we don't just do it because we feel like it. We do it efficiently, and for the explicit purpose of active inference generation.
So transition and timing knowledge, chunking, ordinal knowledge, algebraic patterns: these are all capacities non-human animals have too. Bonobos, macaques, birdsong, they all exhibit these kinds of things. It's only humans that can do option number five, the nested tree structure business. So again, this is the idea that when you merge "car" and "factory", you don't just merge two nouns, right? "Car" and "factory": you actually create a noun phrase, a structure that's bigger than the two parts. So in other words, "factory" on its own is not that noun phrase, and "car" is not that noun phrase; you need both of them to create the noun phrase. And then you can use that noun phrase as an independent unit with its own kind of computational identity. So this leads to a more important question, I think, which is: what is language? We're all human beings, we all have language, and we all have very strong opinions about what language is. Well, consider the fact that geometry was originally the study of land measurement, right, back in the day, but developed a sufficiently rich body of knowledge to abstract away from its original object of inquiry, and departed also from common sense intuition. So our common sense intuitions about what language is actually have no place in science. Ditto for common sense notions of mass and energy in physics. So MIT professor Ev Fedorenko recently conducted a Mechanical Turk survey asking ordinary people what they thought language's primary function was. Most of them said communication, in line with common sense. And she used this data to criticize the idea, from a certain part of linguistics, that language is basically an instrument of thought, its primary purpose being to contribute to conceptualization. But a physicist obviously wouldn't conduct a Mechanical Turk survey randomly asking people what they thought the nature of light is. And a biologist wouldn't concern themselves with people's intuitions about how the heart works.
And so natural language syntax, I think, should be investigated using the same standards of scientific inquiry as any other object in the organic world, right? There's no reason why people's intuitions about language should be heeded. In fact, if that were the case, there'd be no need for linguistics departments. Let's just ask random people on the street; that's all we'd need to do. So on top of syntactic phrase generation, we can also frame this as contributing to policies used in particular free energy minimizing actions, and not just generating linguistic objects. So the rapid and reflexive identification of objects, states and events in the external world through simple linguistic means can yield complex, flexible interpretations for some of the most common nominals — that's just a fancy word for a noun — aiding in the successful generation of internal models of the environment using a limited number of resources. So objects in the world have to be identified, and they have to be identified immediately, right? In order to be successful in navigating the world, you have to understand things straight away and rapidly identify things. That's kind of obvious. But this poses a puzzle, though. So for instance, complex forms of what's called polysemy. Polysemy just means a word having multiple senses. And polysemy turns out to be much more widespread than most people think it is. Almost half the words in the OED are polysemous — I think it was 46% or thereabouts. Complex forms of polysemy generated via multi-word constructions allow for a more precise and exact localization in conceptual space than discrete symbols, signs and gestures, right? With natural language syntax allowing the generation of a more accurate unveiling of hidden states in the world. So natural language syntax allows us to more accurately position ourselves in conceptual state space, right? Again, I gave the example of the second blue box to the left. 
Well, here's an example for you. You can say: the poorly written newspaper that I held this morning has been sued by the government. That's a perfectly fine sentence, but it's referring to three different senses of a simple word like newspaper. So a newspaper can simultaneously be a piece of information, it can be a physical object, it can be an abstract organization. And we can call upon all of these senses at once. And yet notice that this sentence cannot possibly refer to anything in the world, right? This is not the kind of thing that a physicist could explore. Something that's poorly written, something that you hold, something that can be sued — that's not a coherent entity. And yet language allows this simple, polysemous word, one single lexical item, to generate a very rich range of perspectives to interpret experience, which is exactly what you would expect from the active inference framework. And so, since there can't possibly be any object in the external world that a complex, polysemous word like newspaper can index in a one-to-one mapping, under the framework we are developing here, lexical items could partially be seen as hypotheses about the structure of likely co-occurring sensory input, right? Or hypotheses about ontological and mereological relations between objects and states in the world. So in other words, a word is not simply something that has a conceptual meaning. A word does not simply fetch a concept. A word is basically a hypothesis about what the world is, about what we can interpret experience to be in any given moment. And we basically test the hypothesis. So we use the word newspaper as a hypothesis about what's going on outside. And maybe it fails, maybe it succeeds. It depends on the context, it depends on our state of mind, and also on our interests and our concerns. I'll give some more examples in a second just to illustrate that. 
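One way to make the "word as hypothesis" idea concrete is a minimal Bayesian sketch. This is my own illustration, not a model from the preprint: the sense inventory, contextual cues, and all probabilities below are invented. The point is that contextual evidence updates a posterior over the candidate senses of "newspaper", and the hypothesis that best explains the input wins.

```python
# Minimal Bayesian sketch of a lexical item as a hypothesis space:
# contextual cues update beliefs over the senses of "newspaper".
# Senses, cues, and all probabilities are invented for illustration.

prior = {"information": 1/3, "physical_object": 1/3, "organization": 1/3}

# P(cue | sense): how well each sense explains each contextual cue
likelihood = {
    "poorly written": {"information": 0.8, "physical_object": 0.1, "organization": 0.1},
    "held":           {"information": 0.1, "physical_object": 0.8, "organization": 0.1},
    "sued":           {"information": 0.1, "physical_object": 0.1, "organization": 0.8},
}

def posterior(prior, cues):
    """Bayes rule: P(sense | cues) is proportional to P(sense) * product of P(cue | sense)."""
    post = dict(prior)
    for cue in cues:
        post = {s: post[s] * likelihood[cue][s] for s in post}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

# A single cue pulls the interpretation toward one sense...
p = posterior(prior, ["sued"])
best = max(p, key=p.get)

# ...but with all three cues present, no single sense dominates — echoing
# the point that the sentence calls on all of the senses at once.
p_all = posterior(prior, ["poorly written", "held", "sued"])
```

Under this toy picture, "maybe the hypothesis fails, maybe it succeeds" just means the posterior concentrates on a sense that does or does not cohere with subsequent input.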
So a recent paper out of Karl Friston's lab, headed by Demekas — Demekas et al. 2020, a paper in Frontiers — notes that from the perspective of active inference, things only exist as a label or hypothesis or inference about hidden states. So the contention that I'm presenting here is that forms of complex meaning derived from natural language semantics form a core component of this labeling mechanism in active inference. So linguists like to talk about lexical items: book, table, walk and so on. These are basically just hypotheses composed from distinct core knowledge systems in the mind — our sense of geometry, our sense of place, our sense of social relations — which can elucidate environmental regularities essential to active inference. So here's some more examples, and again, a nice little quote from Demekas et al. 2020. Let's take the notion of vagueness. So I was once an infant, but I'm no longer an infant, but I'm still me, right? And the boundaries between infancy and childhood, and childhood and adolescence — there are legal terms for those, but that's just an arbitrary choice. The actual concept of infancy has an arbitrary boundary. Some philosophers do actually think that there is a specific nanosecond which transitions you from infancy to childhood, but I think that's unlikely. I think applying fixed, quantified notions like that to an intentionally and inherently vague notion like infancy is kind of a paradox. It's meaningless. Infancy is just infancy. It's not meant to be a precise boundary, unless you're a legal scholar, in which case that's fine — defining legal boundaries between things — but that's kind of irrelevant for cognitive science. So consider something like infancy. You also have things like pile. So we say there's a pile of sand, and you keep taking grains of sand off. At what point — how many grains of sand are sufficient to make a pile, right? That's called the Sorites Paradox. The vague notion of pile is great for active inference. 
It's great for generating rapid inferences and assessments, but if it's interrogated sufficiently, the system becomes exposed — the system's flaws become exposed once it's actually subjected to that much scrutiny. And the same goes for things like a book. Imagine you go into a library and there's, let's say, a thousand physical books, but there's only 800 actual abstract books, in the sense that every library has multiple copies of different books. So the library will have 10 copies of the Bible, 10 copies of the Quran and so on. Let's say John goes into the library and he reads every book in the library, and then leaves the library, and he's fed up — all the books are rubbish — so he decides to burn them. In that case, you can say John burned every book in the library, or John read and burned every book in the library. In which case he burned more books than he read, right? He actually burns a thousand books, but he only reads 800 books. So the phrase every book does not pick out a fixed quantity. There's nothing in the external world that actually exhibits a one-to-one relation in terms of quantification. It could be 800, it could be a thousand. It depends on our perspective. And that's the crucial thing about even simple words like book: they generate these very rich, polysemous perspectives that you can use to interpret experience, but they have no necessary component to them. Another example is something like a city. So you can say London burned down and was rebuilt 50 miles up the Thames. London can still be London, even though it's physically completely changed. It's in a different location, all the Londoners are dead, and so on. London is a very complex, polysemous sense that you can decompose into an organization sense, a location sense, a population sense, a government institution sense, and so on. But the single word London does not refer to anything in the world. So in other words, there's no such thing as London. That's just a kind of convenience. 
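The "every book" example is really a type/token distinction, and it can be rendered as a toy count. The numbers (1000 copies, 800 titles) are from the talk; the data layout is my own invented sketch. Reading individuates books by abstract title (types), while burning individuates them by physical copy (tokens), so the same phrase picks out different quantities.

```python
# Toy illustration of the "every book" ambiguity: one library, counted
# by abstract title (types) vs physical copy (tokens). The 200-extra-copy
# layout is an invented way to realize the talk's 1000-copy / 800-title case.

copies_per_title = {f"title_{i}": (2 if i < 200 else 1) for i in range(800)}

# "John read every book": reading individuates by title (type)
books_read = len(copies_per_title)

# "John burned every book": burning individuates by copy (token)
books_burned = sum(copies_per_title.values())

# So "John read and burned every book in the library" is true even though
# he burned more books (1000) than he read (800) — the phrase "every book"
# does not pick out a fixed quantity.
```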
It's a convenient abstraction that we use to interpret experience. But there's nothing in the world that the word London coherently refers to, right? You can use the word London to refer, but that's an action. It's an act of human will to actively do that, to choose to do that. There's a choice, and it's an action. Again, wholly in accord with active inference, you can choose to voluntarily and willfully use one specific component of your representation of a city to refer to something in the external world. But that's a context-by-context case. The idea that London invariantly refers to a particular structure is just not true. So by permitting a more refined, accurate positioning in conceptual space, natural language syntax aids agents in the formation of novel policies to navigate and make inferences about the environment. So cognition is an ongoing process of dynamic interaction between an organism and its environmental niche. Yet notions like event are also not predefined, external entities, but are actively generated and parcellated by the language system. So again, events are not things in the world. Events are constructions of the mind. So consider also that simple lexical items like city have properties that go way beyond the semantic complexity of other atomic representations. So you can say: the large school with large windows next to the river starts at 9 a.m. and has a strict headmaster and unruly students. There's nothing in the extra-mental world that could possibly simultaneously be a location, an artifact, an event, a social group, right? Surely not — that's not a coherent ontology. And using these kinds of sentences surely doesn't commit us to the belief that there are such things in the world. It's a convenient abstraction. It's a fiction, right? It's basically a useful fiction that is used in the service of active inference. And it does a very good job of it. It's very successful. 
And the fact that it's so successful is evidenced by the fact that philosophers have only just begun to really investigate this phenomenon. This phenomenon is called polysemy. And it's taken centuries of inquiry to actually realize that some of our most basic common nominals do not refer to things in the world. They're just convenient fictions. So take a sentence like: the average man is concerned about wage cuts because he needs to afford insurance. Does language commit us to the belief that the world is made of things like average men and wage cuts and relations of concern, right? That's surely not something that we're committed to. London can be, as I said, fun and polluted and burned down and rebuilt 10 miles up the river, and still be good old London. So these nested tree structures that I mentioned earlier are widely considered to be abstract — well, even simple words exhibit considerable abstraction. So in other words, the paper I've written with Friston and Emma Holmes focuses on nested tree structures, but it's also worth pointing out that even simple words themselves exhibit a considerable degree of abstraction. And perhaps just as relevant here is Bertrand Russell's invitation for us to consider a blind physicist who knows all the physics, right, in some kind of hypothetical physics-complete scenario. So what is it that a sighted person knows that the blind physicist doesn't know, if this physicist knows everything? A certain experiential content, right? What it's like to see the color red — that's not part of the blind physicist's knowledge. So therefore physics can only capture the causal skeleton of the world. We can at least conclude from this that my experience of seeing the color red simply is a property of the world, but one that we can't provide any naturalistic account for. And the reason why I mention that is because I think that may also be the case for words like London and city and book. 
Abstractions are considerably more intricate and complex than we usually give them credit for. Our minds have managed to achieve an analysis of the concept of number — number theory is a very rich, very serious field in mathematics — but there isn't really much of an analog in linguistics. So lexical semantics is nowhere near as developed, nowhere near as detailed and nowhere near as sophisticated as number theory. Lexical semantics is pretty much: the meaning of water is the set of all things that are water. If you pick up a semantics textbook, semantics textbooks are pretty much just that. Like, the meaning of it is raining just means that it's raining, right? That's it, it's just a re-description. So in other words, linguists are very far from actually having a serious naturalistic account of even some of the simplest words, which, again, might simply be because of our cognitive limitations, right? We can't actually construct theories for these objects. So one of the ways to exhibit this rich polysemy is by looking to the philosophy literature. So in the philosophy literature there's something called externalism, which is the position that I've just been critiquing — the idea that words have a kind of one-to-one reference to things in the external world. And there was a survey conducted not too long ago which showed that the majority of philosophers are externalists; they do believe that, you know, the word water refers to H2O or whatever. So consider this famous thought experiment, which I think contributes to our understanding of active inference, as I'll show in a few slides. In some parallel universe, it's said that water is not made of H2O, but rather some other substance, right — XYZ. So this parallel universe, Twin Earth, is exactly the same as Earth, except water is not made of H2O, it's XYZ. 
So the question is, can the inhabitants of this Twin Earth use water to refer to the substance? Externalists say no — externalists say that the meaning of water can't be applied to that substance. On the other hand, in contrast to the externalists, there's what's called the internalist position, which says that the meaning of words is simply a conceptual structure, and that's it. There's nothing in the external world that these things refer to. So the internalists obviously say yes, of course they can use it, it's just a concept. So the term water seems to be polysemous between a more common, function-based sense and a more concrete, technical sense. So you can imagine — one of the examples that Noam Chomsky has given is, imagine that there's a tea factory that explodes and some of the tea leaves in the factory get into the local water system. And so what comes out of someone's tap is chemically identical to the cup of tea that they're making in the kitchen. And yet one of the substances is water and one of the substances is tea, even though they're chemically identical, right? And the reason why is because one of them satisfies the function-based criteria and one of them violates them. Indeed, you can imagine another parallel universe. So Paul Pietroski offers what he calls Paternal Earth, where doppelgangers of our scientists discover that what they've all been loosely calling mud in fact has a deep uniform structure. So obviously on planet Earth, there's no uniform structure to mud, right? Mud can be anything. But it turns out that in this parallel universe, all of their samples of mud actually exhibit a uniform structure, XYZ. And so the argument is that they can use the concept of mud to refer successfully to all physical instances of mud. And that's good for them, right? 
They could successfully use the word mud to refer to XYZ. But does it follow from this that the inhabitants of Paternal Earth could not then travel to our universe and use the word mud to refer to our chemically diverse samples, right, if they came through a black hole? And I think the answer is no. The externalist would say yes, right? The externalist would say, well, their meaning of mud simply refers to XYZ, and since we don't have XYZ in our universe, when these people jump through a black hole and come to us, when they talk of mud, they have to be speaking of something else. But that surely violates the actual meaning of the term mud. It's a conceptual representation, it's not a physical structure. So the idea that their natural-kind-denoting use of mud could not readily be extended to a polysemous sense doesn't really seem to be well supported. And it's just not a good description of what language actually is or what it cares about. Language doesn't care about the external world as it actually is. That's what science tries to do. Science tries to achieve reference to things in the world. The language system just cares about active inference. It just cares about making sense of the world. That's it. It doesn't actually care if water's made of H2O, right? That's not relevant. So we can use simple words like water to access multiple concepts and then use those concepts in the service of active inference. In fact, Pietroski goes further: using U.S. government statistics, he notes how Diet Coke has a higher percentage of H2O than stuff from a well. In fact, I'm drinking a bottle of Dr. Pepper — I'm not sure if it actually has that information on it, but Dr. Pepper almost definitely has a greater content of H2O than the stuff from the well in your backyard. 
And in fact, Diet Sprite contains even more H2O, and yet these aren't deemed water, for reasons purely to do with intended purposes, right? So I think a cup of tea is like 99.5 or 99.7% H2O, and yet it's called a cup of tea, right? It's not water, it's tea. So moving even further away from this, consider that even scattered entities — forget about water, let's talk about scattered entities — scattered entities can be taken to be a single physical object under some conditions. So imagine a picket fence with breaks, or a Calder mobile, right? The latter is a thing — a Calder mobile is called a thing — whereas a collection of leaves on a tree is not a thing, unless of course those leaves are placed there for the purposes of decoration, as an art installation. And the reason seems to be that the mobile is created by an act of human will. Again, the functional notion is important here. So here's the question: how are these human-specific notions of function and intention coded into the lexicon? And how are they coded as part of a generative model under active inference? That's a really tricky question, right? And indeed, going beyond this, there's Bertrand Russell's famous claim that objecthood is based on spatiotemporal contiguity, but that also seems to be insufficient. The four legs of a dog can be seen as a single object under many conceivable contexts, such as if they were cut off, tied together and used as a doorstop, but they could still be understood by their user as parts of a dog. And abstract objects do not bear causal relationships, and they're also not spatiotemporally located. So an object is usually understood to be a concrete thing, hence the confusion when something denies spatiotemporal relations. So an object is an object if we deem it so. And in addition, I think it's important to note that a psycholinguistic lens is needed too. 
So when philosophers talk about externalism and internalism, they often just talk about language without actually knowing anything about linguistics, which is kind of like a philosopher of physics trying to do philosophy of physics without knowing anything about physics. That's kind of strange, right? If you want to do philosophy of language, you should really know about linguistics. So here's one particular paradox. In the philosophy literature, the following contrast has been called a paradox, like a problem. You can say Batman fights more mobsters than Bruce Wayne, and we all know that Bruce Wayne just is Batman, right? So therefore we should be able to say Batman fights more mobsters than Batman. But we can't say that, because it sounds weird. And the reason why is due to linguistics, not philosophy. So there's a constraint on discourse interpretation in language through which, whenever there are two referential expressions in a clause, they're by default interpreted as non-identical, okay? And this avoids redundant computation, which again feeds back into efficient computation. And as such, reference is obviative. So having two instances of Batman is a problem: the second sentence forces us to posit two different referents, even though we know it's the same referent, right? So this paradox in philosophy of language is not due to mind-world relations. It's just due to linguistics. It's just a psycholinguistic phenomenon. It's pretty simple. And again, one of the problems here is that a lot of these quirks of language escape interrogation — they're not immediately obvious. Language is very good at constructing an illusion that we become susceptible to. We like to think that the things we talk about are really existing in the world. But in fact, you know, language use, I like to think, is kind of a fairy tale. 
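The obviation constraint just described can be sketched as a tiny discourse-interpretation routine. This is my own invented representation, not a formalism from the talk or the preprint: each referential expression in a clause introduces a fresh discourse referent unless an explicit coreference link overrides the default, so "Batman fights more mobsters than Batman" comes out with two distinct referents even though the names are identical.

```python
# Toy sketch of the default non-coreference ("obviation") constraint:
# referential expressions in one clause get distinct discourse referents
# by default. Representation and link format are invented for illustration.

def interpret_clause(expressions, coreference_links=()):
    """Assign discourse referents; default within a clause is non-identity.

    coreference_links: pairs (i, j) stating expression i corefers with j.
    """
    referents = {}
    next_id = 0
    for i, _expr in enumerate(expressions):
        linked = next((j for i2, j in coreference_links if i2 == i), None)
        if linked is not None:
            referents[i] = referents[linked]   # explicit override
        else:
            referents[i] = next_id             # fresh referent by default
            next_id += 1
    return referents

# "Batman fights more mobsters than Batman": identical names, yet the
# default interpretation forces two referents — hence the oddness.
refs = interpret_clause(["Batman", "Batman"])
distinct = refs[0] != refs[1]
```

On this picture the "paradox" dissolves: the weirdness comes from a cheap default in the parser (avoiding redundant computation), not from any metaphysical fact about Bruce Wayne.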
You know, using language is more akin to constructing a fairy tale than it is to science, because we're just constructing concepts and using them in all kinds of loose language games. So here's an example: what are called Escher sentences. So there's the famous staircase painting by Escher — the endless staircase that goes round and round and round. The visual system doesn't care about that. The visual system just sees what it sees. If that turns out to be a physically impossible construct, that's not relevant to the visual system. It just sees whatever it sees. And it's the same with language too. So language also has analogous things, what are called Escher sentences. So if you say more people have been to Russia than I have, or in Michigan and Minnesota, more people found Mr. Bush's ads negative than they did Mr. Kerry's, that's actually a meaningless sentence. It doesn't mean anything. More people have been to Russia than I have is a meaningless sentence. It kind of sounds like it makes sense, but it's completely meaningless, right? Because you're trying to compare a fixed finite quantity, like 50, to a simple binary yes or no, right? Have you ever been to Russia, yes or no — and then more people means five or six people, right? So it's syntactically legal, but it's semantically incoherent. And there's plenty of sentences like that, where the language system is very good at generating the illusion of meaning by generating the illusion of structure, when in fact, if you interrogate it, there's no meaning there. And again, this feeds into the idea of efficient computation and inference generation. You know, the language system doesn't want to interrogate too deeply. It just wants to generate inferences, right — what is this person trying to say here, okay? On the other hand, it also feeds into the idea of anti-reference. 
So this sentence sounds meaningful, but there's nothing in the external world it can refer to. There's no real comparison being made here between more people and me having been to Russia, yes or no. So in conclusion, it seems that any object is much more than its material constitution or its function. We can also use its origin. So Thomas Hobbes talked about rivers — his famous example was that a river can maybe be defined by its origin, and rivers can split and diverge and take different paths and maybe converge again. But there's also a sense of continuity. So John Locke's theory of personhood was that, you know, a person is defined by a sense of continuous identity, not physical constitution. So when a child watches a cartoon of the handsome prince getting cursed, he turns into a frog, and he turns into a human again once he's kissed, or whatever — there's a curse and then something happens. The child knows that it's the same person, right? The child watching the TV knows that it's the same entity, the same person, with the prince being turned from a human into a frog. And yet, again, that's got nothing to do with reference. There's nothing that could coherently exist in that sense. So all these representations seem to conspire. And in addition, we also have a kind of a fifth element. So we can call this visual-linguistic biases shaping objecthood. That pertains to one example — there are a lot of examples, but one example is default marking of object surfaces. So if you say John painted the house brown, this implies that he painted the external surface brown, not the internal surface, because we seem to have a sense of objects as being concave, and the same goes for scenes: scenes and events are kind of, you know — we're inside scenes or outside scenes. So we seem to have a kind of visually imposed sense of what a house is. 
So if John and Mary are both standing five meters from the surface of the house, but Mary's inside the house and John's outside it, John is near the house, but Mary's not near the house — Mary's inside the house — even though they're both equidistant from the actual physical structure of the house. So the house, again, is a functional notion. It's not just a physical object; it's also a function-based interpretation. So in other words, at least these five components all contribute to active inference. They all contribute to generating structures, and at least these five components are somehow encoded in language. And I consider that to be one of the biggest mysteries in this research, right? How are these things encoded in the lexicon? And how are these networks across the brain interpreted and activated during language comprehension? That's an extremely problematic issue. Because, like I said, when we talk about schools having strict headmasters and being large and near the river and so on, we're using pretty much all of these concepts at once. And we're doing it effortlessly. And yet how they're actually implemented is kind of a mystery. But there is, interestingly, precedent in the recent active inference literature for these suggestions. So a recent paper argues that the FEP is most compatible with an instrumentalist theory of mental representations, through which representations are useful fictions for explanatory goals, right? Which is exactly what I've just been saying about linguistics. And this is also compatible with certain models in philosophy of language, the internalist perspective, which assume that lexical items have no one-to-one direct reference to the external world, but are basically useful fictions — composites of distinct representational domains that are used for successful, efficient interpretation and, ultimately, agent survival, right? 
And it's also compatible with internalist models of Markov blankets, which have been argued to form a kind of neo-Kantian, Helmholtzian account of cognition, whereby the boundaries of cognition are delimited by the skull, emphasizing the interactive, constructive nature of higher cognition in generating interpretable, actionable concepts, right? Again, the crucial concept is the actionable concept — it's a useful concept, something that you can use in some meaningful way. So the long-term storage of frequently generated lexical items, and the combinatorial rules underlying their creative deployment in language production and comprehension — all of this seems to allow speakers to categorize novel sensory data into a discrete set of objecthood and eventhood representations. So there are few events that cannot actually be parsed through the simple and unique schemes provided by language, which increases the likelihood of speakers avoiding surprising states. The more efficiently and readily you can parse a particular situation as an event, the more that leads to surprise minimization, right? So another recent paper notes that the active inference model of the brain assumes an imperative to find the most accurate explanation for sensory observations that is minimally complex, which echoes Barlow's exploration of minimum redundancy, and which seems to accord with how the language system provides the most computationally efficient format for solving the problem of mapping linear sensory input to hierarchical interpretations. So from the perspective of active inference, individuals need to minimize the effort involved in meaning-making. So we propose that there is increasing evidence from theoretical linguistics that natural language syntax exhibits design principles in keeping with least effort. 
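The "most accurate explanation that is minimally complex" imperative has a standard decomposition in the active inference literature — variational free energy = complexity (the KL divergence of the posterior from the prior) minus accuracy (the expected log-likelihood of the observations). Here is a toy computation of that decomposition for a discrete hidden state; all the distributions are invented for illustration.

```python
import math

# Toy computation of variational free energy for one discrete hidden state:
#   F = complexity - accuracy
#     = KL(q(s) || p(s)) - E_q[log p(o | s)]
# All distributions below are invented for illustration.

p_prior    = [0.5, 0.5]   # p(s): prior over two hidden states
likelihood = [0.9, 0.2]   # p(o | s): how well each state explains the datum o
q          = [0.8, 0.2]   # q(s): approximate posterior after seeing o

# Complexity: how far the posterior strays from the prior (always >= 0)
complexity = sum(qs * math.log(qs / ps) for qs, ps in zip(q, p_prior))

# Accuracy: expected log-probability of the observation under the posterior
accuracy = sum(qs * math.log(ls) for qs, ls in zip(q, likelihood))

free_energy = complexity - accuracy

# A posterior that explains the data well while staying close to the prior
# keeps F low — the "accurate but minimally complex" imperative in miniature.
```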
So another recent paper from Friston's group proposes that the goals of speech segmentation involve sampling data in a way that requires the minimal degree of belief updating, in accord with Barlow's principle. So we basically extended these claims to the domain of natural language syntax. Indeed, active inference has only one underlying imperative — to minimize generalized free energy, or uncertainty — and much work in psycholinguistics, so things like eye tracking, tracking people's eye movements during the reading of sentences, during filler-gap dependencies, long-distance dependencies in sentences where elements have to relate or agree in features, shows that phrase structures are generated predictively, in anticipation of upcoming stimuli. In fact, it turns out that even something as simple as adjective-noun phrases are constructed predictively. So things like red boat, simple two-word phrases, involve rich prediction. So we're going to just stop there for a second. This has been an awesome learning experience as a non-linguist. There's a few questions from the chat and there's also just a few other things I wrote down, but a lot of the questions have to do with how things happen in the brain. So maybe it's worthwhile for you to just share however much more you'd like to share and then hang out as long as you'd like to answer some questions, and you're always welcome back. So no worries, just whatever's comfortable today — let's talk through it and then we'll continue the discussion. Cool, all right, well, yeah. This is the final section, just to round off all of the other stuff. Only a few slides more, yeah. The question of how all this relates to the brain is absolutely essential, and so I'll try to explain that. So just to begin with that framework, yeah. 
So far I've kind of just outlined the basic philosophy of how language is implemented at the level of computational theory — how it seems to be involved in efficient computation and so on, how it might, on a cognitive level, contribute to inference generation. But what about actual neural implementation? That's the next frontier. So there's a recent paper by van Rooij and Baggio arguing that what makes a good theory is not just generating testable predictions, it's invoking plausible mechanisms — mechanisms that are plausibly realized in nature, either in neurobiology or genetics or physics, right? Which is kind of a nice framework. The idea is that theories aren't there just because they can generate true, testable predictions; the point is to generate things that are plausible. I think that's a very important point. So much of the current neurobiology of language involves, quite reasonably, testing some hypothesis and then generating post hoc explanations for the results. So you'll read a results section which shows, let's say, hippocampal theta power increases during semantically coherent sentences, just to choose a random example. And the explanation is just a re-description of the results in the discussion section. So it'll be like, you know, we found hippocampal theta increases in XYZ, therefore semantically coherent sentences are indexed by hippocampal theta. It's a re-description. But what we need is a pre-existing mechanistic understanding of the possible computational properties of hippocampal theta. What is hippocampal theta in the first place, right? Otherwise, what's the point in looking at it? There's no point in looking at some neural response if you don't understand its computational capacities. What is that property? What is that lower-level mechanism? 
And indeed there might be, there usually are, multiple possible mechanisms for realizing a particular signature that you get in brain data. And if your entire output is to simply do the experiment and then re-describe the results using different rhetoric, then that's not contributing to conceptual progress. So a lot of these ideas are outlined in a recent book of mine, The Oscillatory Nature of Language. I'm only going to touch on them here; if you're interested in more of the details, feel free to contact me and I'll send you a copy. So one can derive some elementary properties of linguistic computation through a direct line of reasoning from the FEP through endogenous oscillatory activity to linguistic behavior, which comes out as the output. Under the FEP, endogenous oscillations are the type of brain dynamics that neurons would expect to encounter, since they have genetically encoded beliefs that the causes of their excitatory postsynaptic potentials follow the same pattern. Active inference can synthesize various in silico neurophysiological responses via a gradient descent on free energy, such as the mismatch negativity, phase precession, theta sequences, place-cell selectivity, theta-gamma phase-amplitude coupling, and so on. And the reason this is important is that all of these mechanisms have been implicated in language. So moving forward with these insights: neuronal dynamics and plasticity appear to minimize variational free energy under a simple generative model which entails prior beliefs that pre-synaptic inputs are generated by an external state with a quasi-periodic orbit. Recent papers show this. The implication is that ensembles of neurons make inferences about each other while individual neurons minimize their own free energy. So generalized synchrony kind of comes for free; it's an emergent property of free energy minimization.
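As an editorial aside, the "gradient descent on free energy" idea above can be sketched in a few lines of code. This is a minimal toy illustration, not the model from the paper under discussion; the function names, precisions and numbers are all invented for the example.

```python
# Toy sketch (my own illustration, not the paper's model): a single
# neuron-like unit infers a hidden cause mu from one observation y
# under the generative model  y = mu + noise,  mu ~ N(eta, 1/prec_p).

def free_energy(mu, y, eta, prec_y=1.0, prec_p=1.0):
    # Variational free energy up to additive constants:
    # precision-weighted prediction errors for data and prior.
    return 0.5 * (prec_y * (y - mu) ** 2 + prec_p * (mu - eta) ** 2)

def infer(y, eta, prec_y=1.0, prec_p=1.0, lr=0.1, steps=200):
    # Gradient flow mu_dot = -dF/dmu: belief updating driven by
    # two prediction errors (sensory and prior).
    mu = eta  # start at the prior expectation
    for _ in range(steps):
        grad = -prec_y * (y - mu) + prec_p * (mu - eta)
        mu -= lr * grad
    return mu

# The fixed point is the precision-weighted average of data and prior,
# i.e. the exact posterior mean for this conjugate Gaussian model.
print(infer(y=2.0, eta=0.0))  # approaches (2.0 + 0.0) / 2 = 1.0
```

With equal precisions the belief settles halfway between prior and data; raising `prec_y` pulls the estimate towards the observation, which is the usual precision-weighting story.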
Desynchronization is induced by exogenous input, explaining event-related desynchronization, and structural learning emerges in response to causal structure in exogenous input, explaining the functional segregation of neural clusters. So a neuron-external state with a quasi-periodic orbit is assumed to generate the pre-synaptic inputs of a given neuron. And what's interesting for me is that low-frequency phase synchronization emerges directly from this assumption, together with the coupled assumption that neural dynamics minimize variational free energy. The reason that's important for language is that one implication is that models of syntactic computation grounded in these dynamics, the model in my book, but also some other recent oscillation-based models, can be said to comply with foundational principles of the FEP. So for instance, endogenous low-frequency delta phase tracking of syntactic nodes could be seen as emerging as a direct function of hierarchical generative belief updating, supplementing the association of delta oscillations with the cortical computations responsible for creating hierarchical linguistic structures. What that means is that whenever you get a particular phrasal node, a phrase boundary, you seem to get some kind of unique low-frequency response. The details can be found in the actual papers, but there's basically some kind of unique low-frequency signature which seems to code: okay, this is a phrase, here's another phrase, here's another phrase. And it goes beyond the level of abstraction contributed by syllables and words; it's a phrase-specific signature. And so the active inference framework provides clear predictions about the neurodynamics of language and can help bring together research programs that are presently pursued independently.
So, exploring the possible neurobiological basis of a core feature of language: a recent paper argues that theta-gamma phase-amplitude coupling in language, which codes syllable recognition, and predictive coding (theta-gamma coupling has been associated with predictive coding for a while now) can be brought together. This theta-gamma coupling has been applied to syllable parsing, and to modeling dialogue as well, where it appears to form part of the belief updating of active inference, whereby beliefs are simultaneously updated at fast, higher levels and at slower, lower levels. Theta-gamma coupling has also been assumed to act as a constraint on working memory. So for instance, the idea is that the number of items you can hold in working memory is based on the number of gamma cycles you can embed in a given theta phase. And there's a kind of trade-off with fidelity: if you want to hold more items in memory, you can indeed increase the number of gamma cycles within the theta cycle to about seven or eight, but that lowers the actual resolution. In fact, it's been shown that there's a relationship between the number of items held in memory and physical, causal manipulation of theta-gamma coupling through people's skulls, using things like tACS or TMS: their actual working memory performance changes. So theta-gamma coupling does, in fact, seem to be causally implicated in constructing sets in working memory, which I think is kind of a cool idea. So in this particular model, phrase-level inferences generate words contained within the phrase before lower levels reset for the next phrase, manifested as theta phase alignment. So each transition at the higher level is accompanied by a resetting at the lower level.
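To make the arithmetic of that capacity-fidelity trade-off concrete, here is a tiny illustration (my own, following the classic gamma-cycles-per-theta-cycle proposal usually attributed to Lisman and Idiart; the exact frequencies are made up):

```python
# Toy arithmetic (my illustration, not from the talk): working-memory
# capacity as the number of whole gamma cycles nested in one theta cycle.

def wm_capacity(theta_hz, gamma_hz):
    # One theta cycle lasts 1/theta_hz seconds; each gamma cycle
    # (one "item slot") lasts 1/gamma_hz seconds.
    return int(gamma_hz // theta_hz)

# Slowing theta lengthens the cycle so more item slots fit inside it;
# the trade-off is that each slot stays a brief gamma-length window.
print(wm_capacity(theta_hz=7.0, gamma_hz=40.0))  # -> 5 items
print(wm_capacity(theta_hz=5.0, gamma_hz=40.0))  # -> 8 items
```

This is why causally shifting the theta/gamma frequency ratio (e.g. with tACS, as mentioned above) is predicted to change the number of items held, at the cost of per-item resolution.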
So this is in line with the suggestion that low-frequency phase can coordinate the bundling of lexical features indexed by fast gamma cycles within a given structure, via forms of cross-frequency coupling, namely phase-amplitude coupling and also phase-phase coupling, ensuring serial readout of features alongside the transfer of syntactic identities to language-external systems. So there's this kind of top-down information flow between different types of phase-amplitude coupling; this is elaborated in my book. The basic idea is that low-frequency phases coordinate and set the identity of whatever representations are being fetched, linearized, combined and chunked via these faster gamma cycles, cross-cortically. Cross-cortical just means whatever part of the brain happens to be responsible for storing the representations that you care about; again, language is very good at fetching concepts from different domains. So that will yield predictions for which cortical sites are being accessed, and kind of spoken to, by this low-frequency phase coordination operation. Theta-gamma coupling has also formed part of recent models of scheduling and updating the list of syntactic and semantic features associated with a given chunk of linguistic stimuli, with gamma cycles indexing distinct data structures coordinated by theta phase. Data structures here just means linguistic features, so semantic or syntactic features, right, like first person, number, gender features, what have you. So these proposals are potentially analogous from a neurocomputational perspective. You have the same lower-level generic neurocomputational algorithm simply fetching discrete domain-specific representations, which also feeds back into the idea I mentioned earlier, where you have these different systems, morality, music, mathematics, language, where the computation seems to be analogous but the representations are different.
So, the same computation operating over different representational domains. Now, through specifying a process theory that explains neuronal responses during perception and action, neuronal dynamics have previously been cast as a gradient flow on free energy. That's to say, any neural process can be formulated as a minimization of the same quantity used in approximate Bayesian inference. So the brain seeks to minimize free energy, which is mathematically equivalent to maximizing model evidence. And this view of neuronal responses can be conceived with respect to Hamilton's principle of least action; all these ideas kind of weave together. In fact, recently a deep temporal model for communication was developed based on a simulated conversation between two synthetic subjects, showing that certain behavioral and neurophysiological correlates of communication arise under variational message passing, in particular theta-gamma coupling. So theta-gamma coupling arose from this particular synthetic dialogue. Now, the model of syntax assumed in this particular paper by Friston treats syntax as sequences of states, or words, with a terminal node at the end of every sentence; we can phrase that as something like the wrap-up effect, a consolidation period. And each form of syntactic structure was limited to the questions and answers in a game of 20 questions in this particular paper. But the conclusion of the paper is pretty much in keeping with a core assumption in linguistics concerning the inherently constructive nature of language. So elementary syntactic units, which are highly robust and conserved across speakers of the same language, provide specific belief structures that are used to reduce uncertainty about the world through rapid and reflexive categorization of events, states and objects and their relations, again in compliance with the FEP.
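The claim that minimizing free energy is mathematically equivalent to maximizing model evidence can be checked numerically in a toy discrete example. This is my own sketch with arbitrary numbers, not anything from the papers discussed:

```python
import math

# Toy check (my own sketch; arbitrary numbers): variational free energy
#   F(q) = KL[q || p(s)] - E_q[log p(o|s)]   (complexity - accuracy)
# upper-bounds -log p(o), with equality at the exact posterior, so
# minimising F is the same as maximising model evidence.

prior = [0.5, 0.5]   # p(s) over two hidden states
lik   = [0.9, 0.2]   # p(o|s) for the observed outcome o

def F(q):
    complexity = sum(q[s] * math.log(q[s] / prior[s]) for s in range(2))
    accuracy   = sum(q[s] * math.log(lik[s]) for s in range(2))
    return complexity - accuracy

evidence  = sum(prior[s] * lik[s] for s in range(2))          # p(o)
posterior = [prior[s] * lik[s] / evidence for s in range(2)]  # p(s|o)

print(F(posterior))          # equals -log p(o): the bound is tight
print(-math.log(evidence))
print(F([0.5, 0.5]))         # any other belief has higher free energy
```

The exact posterior attains the bound exactly, and leaving the belief at the prior yields a strictly larger free energy, which is the sense in which belief updating "pays for" accuracy with complexity.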
Sentential representations can be thought of as structures designed partially to consolidate and appropriately frame experiences, and to contextualize and anticipate future experiences. So the range of possible syntactic structures available to comprehenders provides alternative hypotheses that afford parsimonious explanations for sensory data, and as such they preclude overfitting. If the complexity of the linguistic stimuli can be efficiently mapped to a small series of regular syntactic formats, this contributes to the brain's goal of restricting itself to a limited number of states. And again, by mapping syntactic structures to conceptual systems in a manner adhering to principles of economy, language can be seen as engaging in a series of questions and answers with sensory data itself, but also with non-linguistic mental states. And only through natural language can we generate the full complexity of wh-questions, the questions I mentioned before, cross-serial dependencies, long-distance dependencies where different elements across different structures depend on each other, filler-gap dependencies and so on, which permit an expansion of what kinds of querying the brain can execute over assembled data. In other words, only with natural language syntax can the brain execute these particular types of queries and generate responses. So all the ways that language-centered aspects of human cognition can be motivated as conforming to the FEP and active inference, things like communication and narratives, can be further derived from a more elementary focus on syntactic computational complexity. And in fact, there was a paper published a couple of days ago, again by Friston's lab, which showed that neural dynamics under active inference are metabolically efficient.
It suggests that neural representations in biological agents may evolve by approximating steepest descent in information space towards the point of optimal inference. And that's not a bad idea to pursue in connection with neurolinguistics, in terms of the optimal inferences afforded by not just syntactic structures but also, again, the lexical semantics of individual words that I mentioned before. So, moving to a couple of conclusions from the slides. I've tried to show that natural language syntax renders meaning-making and higher-order inference a computationally efficient process. And that seems to make it ripe for what Da Costa and colleagues call a key question for future research in active inference, which is how biological organisms effectively search large policy spaces when planning into the future. So regardless of whether you take issue with particular economy conditions X, Y or Z, my motivation has been more general: it's been to consider how the FEP can in principle provide a novel explanation for the prevalence of efficiently encoded linguistic rules. And indeed, other linguists might disagree with me and disagree with the actual framing; maybe other linguistic frameworks, like Ray Jackendoff's parallel architecture, might be more appropriate, depending on your background assumptions. So it's specifically natural language syntax, and its capacity for potential complexity, that allows language users to expand the scope of their predictions about their future position in a state space, to think of more possible future scenarios and plan accordingly. So we've arrived at a number of suggested explanations for the way language implements the construction of hierarchical syntactic objects: namely, to minimize uncertainty about the causes of sensory data, to make available a species-unique format for internal mental states, and to adhere to a least-action principle. And in fact, in some cases this involves externalization, but not always.
Exploring this proposal empirically may demand a more mature science of computational complexity in the brain. So I basically argue that all of the ways language can be motivated via the FEP can be grounded through syntax. In other words, narratives are strong candidates for constructs adhering to active inference, but the generation of syntactic structure is a necessary feature of any narrative, right? You need to at least construct a phrase to generate a narrative. So I've reviewed how the FEP that underwrites active inference is an expression of the principle of least action, which is additionally a principle implemented in the model of syntax. So ultimately, both the FEP and syntactic theory are empirically and conceptually well-supported constructs, and as I've argued, they share a number of deep commonalities. The FEP has produced formal, simulation-supported models of many complex cognitive mechanisms, like action, perception, learning, attention, and also communication. Models of syntax, on the other hand, have explained grammaticality intuitions, poverty-of-the-stimulus issues, i.e. how kids acquire language, and the pervasive organizational role that hierarchy has in language; hierarchy seems to pretty much organize and determine almost all linguistic features. Further, language is not the only domain which exhibits economy, suggesting a deeper grounding of this bias in natural law. Other domains include concept learning, causal reasoning, sensorimotor learning, and also, as I said before, memory. Importantly, active inference has been used to account for creative functions too, functions that have to do with exploration and novelty. And the reason that's important is that linguists have also long argued that the hallmark of natural language is precisely its creative aspect.
The ability to freely construct an unbounded array of hierarchically organized expressions with novel interpretations: you can say sentences that have never been said before. Linguistic creativity can be framed in terms of adherence to physical, thermodynamic principles, insofar as it minimizes uncertainty about available states within an individual's model of their environment. In other words, the more efficiently a language user can internally construct meaningful, hierarchically organized structures, the more readily they can use these structures to contribute to the planning and organization of action, consolidation, exogenous and endogenous monitoring, and adaptive engagement with the environment. I think it's worth recalling Gregor Mendel's application of combinatorial algebra to botany. At the time, this was deemed by many people to be kind of weird, almost surreal. But the same may be true of novel conceptual directions for natural language syntax and semantics: unconventional approaches often turn out to be the new normal. Only time can tell how far these directions can actually be pursued, though. All right, thank you for listening; I hope that was useful. Awesome. You can unshare, and then I hope we can ask a few of the questions. Yep. Yeah. Do you want to unshare your screen? Oh, great. Okay. Awesome. All right. I'll just jump right in with the first question. Okay. Again, as a non-linguist I was looking up a bunch of words and learning a lot, so this was really fascinating. The first question, and I think it was related to when you said that language is not just about communication, despite that being a common conception, so potentially it's about the structure of thought or the structure of thinking. The question was: it would be great if Elliot could define what is his definition of thought, and what is potentially the contribution of the intracranial language research towards answering the hard problem. Wow. So you started with the easy question.
Okay. Yeah, that's a really good question. So I like to think of "thought" as I do many other words: it's kind of like a metaphor. In some languages, when a human being does a long jump, they only jump; in English, we just say they jump. In Japanese, you're actually allowed to say they fly. So if a Japanese linguist is at the Olympics and they see someone do the high jump or the long jump, they could technically say they're flying. But it's just a metaphor. I think it's the same with thought. Thought is just a metaphor; it's not a well-defined scientific natural kind, it's kind of a useful convenience. That said, the way I see it is that natural language syntax allows us to fetch particular domain-specific representations from all sorts of different domains and then construct them into novel interpretations. The important thing is that all of this interpretation is outside language; the interpretation process is an extra-linguistic process. The only language-specific process is just combining a phrase, combining items, putting them together and then shipping them off to an external interface for interpretation. And so that's the thought process. I don't have any particularly deep reflections on that. I think the best book about this is Paul Pietroski's Conjoining Meanings, which came out in 2018. The idea there is that what language contributes to thought is kind of what I said here today, a uniquely encoded kind of functional abstraction. Whereas different cognitive subdomains, like the visual system, or the olfactory system, the sense of smell, or geometric reasoning, all contribute distinct sub-representations of a given scene, language seems to uniquely care about function, abstract function.
So I guess I would say that if linguistic thought can be defined at all, it's almost definitely going to closely approximate human-specific interests and concerns and needs and things like that. Which is kind of surprising, because a lot of people just think of language as communicating conceptual structures in an unbiased way, as if language were just a readout of the thought system. But I don't really see it that way. I see it as: language does seem to bring with it some unique conceptual contributions, namely it seems to encode these human-specific functions. Which is why I said, if you look at "water", water cannot simply be defined as H2O; that's way too simple. The actual meaning of "water" goes way beyond that. And then with respect to intracranial research, I'm not too sure. Intracranial research is fantastic at examining actual neural responses in real time, really getting down into the details. It depends on the type of electrode that you have, its resolution, its listening radius, how much of cortex you can actually sample. There's also something called the sparse sampling problem in intracranial research, where you have very good coverage of specific cortical loci, but of course each patient will have different electrodes in different parts of the brain. There's no patient that has electrodes everywhere; they only have electrodes wherever their epilepsy is supposed to be targeted. For research purposes, that obviously brings limitations. So I don't think we'll ever be able to have a coherent, global, whole-brain, intra-patient understanding. What we do is combine across all these different patients and generate a more average brain response, if that makes sense. In that sense, it can contribute just the same way any MEG or extracranial EEG study can; it depends on the paradigm, it depends on the analysis.
I think the real crucial point here is conceptual innovation. We have tons of data from natural language experiments, loads of data, but we don't have all that much conceptual novelty. Linguists are very good at coming up with very smart paradigms for carefully controlled experiments, but when it comes to actual novel conceptual frameworks for how these things map onto brain processes, I think we need much more of that. So I think intracranial research will not be as useful as conceptual, theoretical changes; that's much more important, actually. Yeah. One area of utility that you didn't even go into at all would be machine learning and the long-time challenge of natural language parsing and generation. The recent approach has been to throw a big neural network at it, with GPT and large-scale text modeling. And that reminded me of what you said about multi-scale models, how we don't overfit the semantics even when we hear a ton of syllables: "I really, really, really, really, really am hungry." A computer might just spit out a p-value. It's kind of like oversampling, but we don't want to oversample semantically. So there's going to be a really interesting space for active inference models. Okay, here's the... Absolutely, absolutely. Okay, I'll go to the next question. Can the free energy principle and this syntactic theory framework help explain how and why the brain computes inner speech the way that it does, and provide the possibility to predict what's about to be computed in the future? So how do we think about externalized speech, like vocalization, versus our inner experience of speech? Are those structured? Are those in our voice? What is happening? I think that's a really cool question. Well, I think there's also a lot of misconceptions about this. So I actually don't see...
So most people think that external speech is kind of crude and has nothing to do with thought, it's kind of out there, but internal speech is some kind of angelic, platonic space, like Plato's cave. Actually, I think that's the opposite. I think they're both the same. Inner speech is basically internalized externalized speech. When you listen to yourself speaking in a monologue, you're doing it in the format of externalized speech, not in a different format. It's not as if inner speech has one structure and encoding and external speech is something different. When I wake up in the morning and think to myself, I still think to myself in English, in linear sentences; that's how the thought is externalized. So this is what I said about the interface between language and conceptual systems. What's happening in inner speech is that you're externalizing in your own head. Inner speech is basically a form of externalization. Not all forms of externalization involve me literally saying things; you're still externalizing it, you're just externalizing it to yourself. In other words, actual linguistic thought is pre-conscious, it's subconscious. And that's why we need linguistics departments, to figure out what these subconscious operations are, right? We have a bunch of subconscious Merge operations happening, but we don't have direct conscious access to those. And why should we? There's no reason why we should. But we do have direct access to the externalized output, in our kind of self-generated auditory encoding of that structure. So, I can't remember what the question was, sorry. But yeah. That reminds me of Thinking Through Other Minds, a recent paper in our active inference community. And then also, yeah, there's just so many interesting aspects to how you really pointed to these domain-general attributes of language.
And yes, I'm rethinking language and some of the ways that we even communicate on the stream, because you'll see people say, "I'm just not sure how to say it." And it's like: but are you sure how to think it? Oh, no. Because actually we think through it, and then we have feedback while writing, and it's like stigmergy: we're making these meaning marks and then we're reinterpreting them. So it's going to be quite interesting there. Here's a third question. Thanks, great talk. Given that the sequences are generated in nested hierarchical structures, where would linear externalization fit in here? Can we say that they're bound by linear externalizations? And then, maybe, if that's a linguistics term, what does that mean? Yeah, it is. Okay, great. Where does linear externalization fit in here? That's a really good question, though I'm not quite sure exactly what it's trying to ask; I kind of understand what they're saying. So, it's typically assumed that linear externalization is the output of the syntactic system. And it turns out, upon better analysis, that the way the world's languages differ is based mostly on morphophonological differences. In other words, the ways that morphemes and sounds are produced differ, while the actual semantics and syntax is fairly uniform. So all the universal things about language occur high up, early in the derivation, and all the differences among the world's languages come in very late, late in the interpretation process. When you generate a structure that you want to say, you at least need to have the syntax first, because that's the most uniform structure, and once you've got the syntax in place, you then go about filling in all the sound details. But the sound details are kind of irrelevant for language; that's just an annoyance, basically. So I gave you the example of English and Basque.
So English and Basque have the same underlying phrase structure, but they linearize it in different ways. They just have to linearize it based on whatever arbitrary conventions happen to have arisen in those different cultures, but the actual interpretation is exactly the same. So I would say that linearization, whenever it happens, is late-stage, and it happens as an inconvenience to the language system. Interesting. And sometimes people will point to different languages, or a word that appears unique to even a cultural experience, but the other side of that coin is the truly 99.9% that's structured functionally, that can be translated. So it is the exception that proves the rule, just like when we violate syntax sometimes to make a point, like repeating a word, or in our internal monologue maybe even singing or doing alternate characters. It's a dramatic externalization, but it's the exception that proves the rule. Or it's kind of like the grandmaster chess player who violates a principle; that is mastery over the syntax, and we can't let those exceptions that prove the rule lead us to throw out the baby with the bathwater, or however they say it. But by highlighting language as functional and as the structure of thinking, rather than rhetorical only in its deployment, just connecting two nodes, you're really opening it up to think about what's inside the cranium as well, and taking measurements from there, because it's what's happening with our thought that's being revealed in a linear string. So. So one of my favorite examples in modern literature is magical realism. And magical realism is basically getting a blender, putting lots of different random concepts in there, blending them all together, and seeing what comes up. So a lot of literary novelty, in novels and poetry, is novel precisely because it's kind of experimental.
It's saying: this is what happens when you put these two random different concepts together; let's see if they can generate a meaningful interpretation that is either emotionally resonant or conceptually intriguing. And in fact, some of the most important poetry written in the English language has violated rules of English grammar precisely to generate these novel interpretations. It's an intentional violation of some kind of linguistic rule, which acts as a signal to the reader that there's a reason for this; it sends a message, it reflects something, right? But yeah, absolutely, these exceptions to the rule are extremely important to think about. My final question is: how might digital discourse and multimedia be influencing the structure of language, the structure of thought? Like, are we synchronizing on the functional aspects with memes, or are we diverging in our narratives? How is that playing out now that things are more multimedia, visual, video than ever? Less of it is spoken and read, which is the linearized, sort of classical, language, and now there are unconventional languages. So how's that going to work? Yeah, so, I mean, I know that the prevalence of long-form digestion of, you know, long Sunday Times articles is kind of declining, and the propensity for bite-sized chunks of information that are easily digestible, that involve less critical thinking, where judgments, ideological or otherwise, can be made rapidly, inferences can be made and so on, is definitely on the rise. However, there was a linguistics book that came out, I believe around 2010, I can't remember who it was by, that was basically analyzing this idea. So around the turn of the century, obviously, mobile phones came about, and everyone started texting using all these abbreviations: instead of typing "please" you would type "PLZ", instead of "mate" you would type "M8", or whatever.
And at the time, a lot of people in England were very concerned about this, because they said, well, all our kids are going to grow up stupid because they use all these slang deviations of English. It turns out there's almost no evidence that this affects intelligence, people's ability to use language and make grammaticality judgments, because in fact, when you think about it, if you open a WhatsApp chat and you text with somebody using all these emojis and abbreviations for words, it presupposes that you actually know the correct version, because in order to violate the rules, you have to understand them. So flouting the rules of grammar and flouting the rules of spelling actually exhibits comprehension of the genuine rules of English, whatever you want to call them, right? And that was the argument in this book on texting; again, I can't remember which one it was. So I think that's a pretty good example of how, when there's this moral panic in society over, you know, texting, the answer is: I don't know. I have no idea. But then... On that sort of moral panic, I guess, the thought is: then at some point maybe they even forget how to spell it, P-L-E-A-S-E, how to spell "please". And then it's kind of like our word roots, you know: "oh, I can't believe you don't speak enough Greek to know the roots of this word." It's become modularized. So units that were novelties at first, and exceptions, become reified within a cultural context, a shared niche and shared narrative, so that, like, the crying emoji does mean this, functionally. And then someone could say, "I can't believe that you don't know that it used to mean this in a different language." It speaks to so many of these excellent themes. So, yeah, yeah. Elliot, awesome guest stream.
Thanks for your first appearance, and we're always looking forward to what you might want to share with us in the future. So thanks again. Yeah, totally. Well, this is an ongoing project, so I'm sure I'll have more to share with you. And thank you very much for having me. Thank you for the conversation, I've really enjoyed it, and thank you for the questions too. So yeah, totally. Thank you, Elliot, and to the whole audience. See you later.