Hello and welcome. This is Active Inference guest stream #39.1. It's March 16th, 2023. We are here with Xu Ji, Eric Elmoznino and Guillaume Dumas. We're going to have a presentation followed by a discussion section. So thank you all for joining, and off to you for the presentation. All right, thanks, Daniel. Yeah, I'm Eric. I'm a PhD student in Yoshua Bengio's lab. I'm Xu. I'm a postdoc with Yoshua. Yeah, and we're really excited to be here. Thanks for the invite. We're going to be talking about why we can't describe conscious experiences. So this is going to be our take on a really longstanding problem in the philosophy of mind, but we're going to be looking at it through the lens of computational neuroscience and information theory. So yeah, hopefully it'll be fun for everyone. It mixes a bunch of disciplines. So I think the most salient way to illustrate the problem that we're going to be addressing is to ask you to try and think about how you would describe the experience of seeing the color red. Probably the kind of stuff that's going through your head is things like, well, it's a bright, aggressive color, it symbolizes love, that sort of stuff. And in a sense, this is a description. It's effective: I would be able to guess which color you are describing. But in another sense, it's really inadequate, right? If I were blind, for instance, your description would be totally useless. I would be no closer to understanding what red looks like. So there's this real sense in which conscious experiences are ineffable, such that we can't describe them. And this applies to concepts like red, but it also applies to experiences more broadly. The experience of having a thought is just so ineffable, so hard to describe. And really importantly, this doesn't happen with most of our knowledge. I can describe most of what I know except for experiences. They have this special place.
So it's a big topic in the philosophy of mind because it relates a lot to this thing called the hard problem of consciousness. Basically, the hard problem is the problem of how and why physical processes give rise to conscious experiences in the first place. This is probably the oldest and most debated problem in the philosophy of mind, and it's even led many to the conclusion that actually we should give up, that consciousness can't be explained with physical theories at all, that it can't simply emerge from neuroscience, computation or known physical laws. So that's a bold statement, and I won't get into all the details about what makes the hard problem so salient, but here are a few things. First of all, we can logically conceive of what are called philosophical zombies. Philosophical zombies are very different from Hollywood zombies. They're basically beings that are physically identical to us. They behave exactly like us. In fact, they even have the exact same neural activity that we do, but they just aren't conscious. Similarly, instead of these alternate agents being unconscious, we can also imagine them as just experiencing different things when they're in the same states. For instance, it's conceivable, at least, that in an alternate universe the same exact brain mechanisms that produce an experience of red in this world would instead produce an experience of green in the other, and that all experiences would kind of be flipped in this way. Intuitively, we think that these two thought experiments aren't actually possible in our universe, but even just their logical conceivability suggests some kind of explanatory gap, where consciousness doesn't seem to neatly emerge from or be determined by physical state or function. Another really prominent thought experiment is the knowledge argument, and this is the main problem that our work is going to address.
We're not going to address all facets of the hard problem of consciousness, but this knowledge argument is one of the biggest arguments against physicalism. The knowledge argument is another thought experiment. It asks us to imagine Mary, who has grown up in a black and white room her whole life, and despite this poverty of stimulus, she actually knows a lot. In fact, we're going to assume that she knows all the physical facts about how color perception works: all the relevant physics, all the relevant neuroscience, etc. And now we're going to ask, well, does she learn something new when she steps out of the room for the first time and actually experiences color? And if we say that she does learn something new, which I think is the intuitive answer, there seems to be a problem. We assumed that she knew all the relevant physical facts about color perception, so doesn't this mean that what she learns had to be something that was non-physical? If there was nothing more to color perception than physically embodied information and neural activity, then the experience is something describable that she presumably could have read about, understood, and would have already known. So it feels like this thought experiment really pushes us away from physicalism, towards this idea that maybe conscious experience has something on top of just information content. Now, of course, we don't want to reject physicalism. It's been very fruitful for us in the history of science. So we really do want to say that experience just consists of information that can be described, which means there has to be something wrong in this knowledge argument. And what we're going to do is bite the bullet and acknowledge that experiences actually are ineffable. So maybe it is the case that experiences can be described in principle, since they consist of nothing more than information.
But maybe they can't be described in practice, or at least they can't be fully described in practice using something like language. So then the challenge becomes explaining why experiences are ineffable under a physicalist framework, and that's what our work is really going to be about. Okay, so the structure of the talk is going to be like this. I'm going to summarize briefly why characterizing richness and ineffability is an interesting and important question. Then Eric is going to talk about how there's a natural correspondence between ineffability and information loss, which is going to let us link biologically plausible attractor models of working memory to ineffability, essentially because information loss is inherent in attractor dynamics. Then I will talk about how, if you consider the ineffability of conscious experience relative to verbal reports as just a special case of information loss between two specific points in the communication pipeline, you can actually generalize the notion of ineffability more broadly by considering, for example, loss from sensory processing to conscious experience, and even interpersonal information loss. We're going to prove, using Kolmogorov mutual information, that your conscious experience being ineffable to another person implies high cognitive dissimilarity between the two of you under our model. And finally, we'll talk about whether an exact definition of conscious experience is required, at least for characterizing the nature of ineffability and richness, and also about some open questions. So yeah, why are richness and ineffability an important question? There are several reasons. First, as Eric said, understanding consciousness is not just of general interest.
It's a longstanding central problem in philosophy of mind because it's not easy to reason about, which has led some dualists to believe that consciousness must be at least partly non-physical in nature, because they can't see how it could be explained by physical processes. But it's also a topic that is of great interest in machine learning, because if you assume that consciousness is essential to human cognition, then it follows that we won't be able to engineer machines that think like humans without incorporating consciousness. And so we're actually going to argue in this talk that the confusing aspects of human consciousness, such as its ineffability, can be understood with information theory, which, as you all know, was born from computer science. We're going to use some primitives from information theory to shed light on why consciousness is ineffable, and hopefully to dispel some of the mystery surrounding it. All right, so one of the main moves over here is to talk about attractor dynamics in the brain and in conscious experience, and how that results in ineffability. Hopefully this sort of perspective is familiar to most people, but we can talk about neural dynamics by saying the brain has a state at any particular time, and we can denote this state using a vector. For instance, in this plot over here, each dimension of the state corresponds to an individual neuron, and maybe we denote the activity of each neuron using its average firing rate. So the state at a particular time would be the firing rates of all the neurons across the brain. Now, there's nothing special about neurons or firing rates. If you think that things like the synapse strengths or the dendritic potentials or the astrocytes or whatever else are also important to the representation, you could include those in the state. They're just additional axes.
The details aren't so important, as long as we have some state that describes what the brain is doing at any particular time. Now, of course, the brain is a dynamical system. These states are changing. So you can think of it as carving out trajectories through state space as a function of time, and the way that the state evolves will be determined by the recurrent connections within the brain as well as inputs coming from elsewhere. If we're just talking about the dynamics within a given brain region, the inputs would be, you know, the other brain regions talking to it, as well as any sensory stimulus coming in. Right, so this dynamical systems perspective on the brain has been really instrumental in understanding the computations it performs, and one of the primary methods for characterizing these computations is to look at what are called the state attractors of the system. And for us as well, state attractors are going to be a really important part of our ineffability framework in a bit. Basically, an attractor state is a state where, once the system reaches it, it's going to remain there, at least until inputs come in and change the dynamics, or noise nudges the state out of the attractor. You can see this clearly within the figure. The X and Y axes over here are the state of the brain again, maybe something like the average firing rates of the neurons. These arrows are illustrating the dynamics of the brain, so how the state will evolve given the brain's connectivity, given the synapses. And you can see that there are some regions where all the arrows, all the dynamics, kind of contract to a single point. That's called a fixed point, and it's one kind of state attractor. Because all the arrows point towards it, once you finally reach that state, you're not going to move out again until the dynamics change because inputs came in, or until noise nudges you out.
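To make the fixed-point idea concrete, here is a minimal sketch (our own toy example, not from the talk or the paper) of a one-dimensional dynamical system with two stable fixed points. Euler integration from any nonzero starting state settles into one of them, the way a neural trajectory settles into an attractor state.

```python
# Toy dynamical system: dx/dt = x - x**3 has stable fixed points at
# x = -1 and x = +1, separated by an unstable fixed point at x = 0.

def simulate(x0, dt=0.01, steps=2000):
    """Euler-integrate dx/dt = x - x**3 from initial state x0."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x

# States starting on either side of the unstable point x = 0
# flow to the corresponding stable attractor.
print(round(simulate(0.3), 3))   # -> 1.0
print(round(simulate(-2.5), 3))  # -> -1.0
```

Once the state is at an attractor, it stays there until the dynamics are perturbed, which is exactly the stability property that the talk appeals to.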
And you can see that each state attractor is also associated with this sort of attractive region, an attractor basin it's called, where once a trajectory is somewhere in that region, it will eventually converge to the attractor, again in the absence of noise. These attractors are really common in the brain, and also in artificial recurrent neural networks trained on tasks, and a big computational benefit is that they provide a form of transient memory: once you're in the attractor, you remain there, and that's where the memory comes in. Because of this and other benefits, they're implicated in a wide variety of neural processes like working memory, long-term memory, decision-making and higher-level cognition, and, we'll argue, consciousness in general. So they're all over the place in the brain. An additional aspect of them that's interesting, and that's going to be relevant for our talk, is that you can think of attractors as having a sort of dual, discrete and continuous nature. What I mean by that is that the discrete aspect of an attractor is given by the fact that there's a finite number of them, and that attractors are mutually exclusive: you're in one or you're in another. This means that we could use discrete symbols to denote which attractor the system is in; there's a finite number of them, you're either in one or another, and this basically lets us identify or label the states using a discrete variable. There's also a sense, though, in which an attractor is just a continuous high-dimensional vector in this really high-dimensional state space of the brain. Every attractor basically is just a vector in state space, and that means that they're not just arbitrary discrete states; they have continuous representations. We can say, for instance, that one attractor is more or less similar to another one depending on how close they are in state space, and that gives a representation to the attractors. That's the high-dimensional, continuous part.
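Here is a hypothetical two-dimensional extension of the same toy system (again our own illustration, not the paper's model) that shows this dual discrete/continuous nature: many initial states collapse onto a few attractors (the discrete labels), while each attractor is also a point in state space whose distances to the other attractors give a continuous notion of similarity.

```python
import math

def settle(state, dt=0.01, steps=2000):
    """Euler-integrate dx/dt = x - x**3 independently per coordinate;
    the system has four stable attractors at (+-1, +-1)."""
    x, y = state
    for _ in range(steps):
        x += dt * (x - x**3)
        y += dt * (y - y**3)
    return (round(x), round(y))  # the attractor the trajectory settled into

# Many-to-one: eight different initial states land on only four attractors,
# so the exact starting point within each basin is lost.
attractors = {}  # continuous attractor vector -> discrete label
for x0 in (-0.7, -0.2, 0.4, 0.9):
    for y0 in (-0.5, 0.6):
        attractors.setdefault(settle((x0, y0)), len(attractors))

print(len(attractors))  # -> 4

# Continuous similarity: attractors sharing a coordinate are closer
# in state space than attractors differing in both coordinates.
print(math.dist((1, 1), (1, -1)) < math.dist((1, 1), (-1, -1)))  # -> True
```

The dictionary plays the role of the discrete labeling, while the tuples themselves are the continuous representations that support similarity comparisons.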
So just to recap: the discrete part is that you can identify which attractor you're in within the finite set of all possible attractors, but you can also talk about an attractor through this continuous high-dimensional vector that describes where the attractor is in the high-dimensional state space of the brain. To wrap your head around this, there's a good analogy to word embeddings in NLP. In NLP, each word has a discrete ID, that's like the symbolic, discrete part, but each word is also associated with a high-dimensional vector that gives it a representation, sort of the meaning of the word, where we could say that words are more or less similar to each other. So we're going to use attractor dynamics quite a bit to explain a significant source of ineffability in conscious experience later, but for that to be relevant, I first need to spend a bit of time convincing you that the neural dynamics that produce conscious experience actually do have attractors. And for this there's already a lot of work in the literature connecting attractors to conscious experience, so this part is not something new from our work. For instance, one of the leading theories of consciousness, which is global workspace theory, actually implicitly implies that attractor dynamics are kind of fundamental.
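The word-embedding analogy can be sketched in a few lines (toy, made-up vectors, purely illustrative): each word gets a discrete ID, like the attractor's label, and a continuous vector, like the attractor's position in state space, and similarity between words is just similarity between their vectors.

```python
import math

word_ids = {"red": 0, "crimson": 1, "sky": 2}   # discrete, symbolic part

embeddings = {                                   # continuous part (made up)
    "red":     [0.9, 0.1, 0.0],
    "crimson": [0.8, 0.2, 0.1],
    "sky":     [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# The discrete IDs alone say nothing about similarity, but the vectors do:
# "red" is much closer to "crimson" than to "sky" in embedding space.
print(cosine(embeddings["red"], embeddings["crimson"]) >
      cosine(embeddings["red"], embeddings["sky"]))  # -> True
```

The same two-sided structure, a finite label plus a point in a continuous space, is what the talk attributes to attractors.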
So what global workspace theory says is basically that across the brain, in all the different regions, you have sub-process workers, and these different distributed mechanisms can't speak to each other very easily; they mainly communicate through this global workspace, which is kind of a bottleneck. Think of it like a blackboard that all the different processes across the brain can write to; you could also think of it as like the RAM of the brain. Basically, it's collecting information from the different workers. And one key aspect of the global workspace that's posited in the theory is that for content in the workspace to then be broadcast to the other sub-processes, for instance for us to be able to talk about what's in the workspace, for it to be written to speech regions, the content of the workspace has to be amplified and sustained for a certain amount of time. It has to be stable, basically. And this is clearly where the attractors come in: attractors, by definition, are the states that are stable, the states that are sustained. So basically, what global workspace theory says is that the contents of the global workspace that can be used by the rest of the brain are the attractor states. The relevance to consciousness in this theory is that it posits that the contents of this global workspace are the things that we're aware of, the things that we're able to report, the things, basically, that we're conscious of. And then, just terminologically, another term for this workspace is working memory. The idea there is that it's a sort of really short-term memory, in the sense that you can quickly act on the things that are in this global workspace, or report on them; they're accessible, basically. Okay, so that's one very powerful argument for why attractors are relevant to consciousness. There are a bunch of other related arguments. For instance, just introspectively, you could think of what experience is like: a sequence going from one thought to the next to the next. We kind of have this sequence of discrete events, and that clearly looks like a dynamical system jumping between attractor states. And then, experimentally, there's also been work showing in psychology experiments that when you present a stimulus to a subject and you vary the amount of time that the stimulus is presented, to bring it below or above the conscious threshold, well, in the conditions where subjects report being aware of the stimulus, the main thing that's different is that the representation is really stable and robust, and those are also clearly properties of attractors.

Okay, so again, we'll use those attractors to identify a source of ineffability in a little bit, but first I need to also introduce how we're going to formalize the very notion of ineffability, and for this we're going to use information theory, in particular Shannon information theory. I'll do that now, and then she will talk a bit about Kolmogorov complexity as another formalism. Hopefully most people are familiar with this, but one thing that we're going to be using is the entropy of a state. For instance, you might have the entropy of some random variable X; to make things concrete, maybe that random variable is conscious experience. The entropy of conscious experience would be high if the distribution over conscious experiences was diffuse and flat, with equal probability for all possible experiences, and entropy would be low if the distribution over conscious experiences was peaky. So the green distribution in this image would have low entropy, and the red distribution, which is flatter and more uncertain, would have higher entropy. Basically, what entropy is going to quantify in our framework is the richness of a variable, in particular the richness of conscious experience. Why this is a good measure of richness is because if conscious experiences can take on many possible values with equal probability, they're very diverse in that sense, and then it's something that's rich.

Some other notions that are going to be important are mutual information, denoted here by I(X;Y), which denotes how much shared information there is between two random variables. To make things concrete, X we could take to be a conscious experience, whereas Y, maybe that's a description of the experience, maybe a verbal description. In this case, the mutual information will be really high if the verbal description uniquely determines the experience. Another way of saying that is: knowing the verbal description, we have no more uncertainty about what experience is going on; they perfectly determine each other, and that would maximize the mutual information. In contrast, if the message told you nothing about the conscious experience, and you were just as unsure as when you didn't have the message, then the mutual information would be zero. This is useful because we can use it to define a notion of ineffability. This conditional entropy that we've written, H(X|Y), is equal to the richness of X, so the entropy of conscious experience, minus the mutual information between conscious experiences and other variables like their descriptions. What this captures, basically, is how much information is lost when going from an experience X to a description Y. If all the information was lost, if Y doesn't tell you anything about X, the mutual information is zero, and the conditional entropy will be equal to the richness of X. If Y perfectly determined X, then H(X) and the mutual information cancel, and the conditional entropy would be zero. So what conditional entropy describes, basically, is how much information is lost when moving from X, a conscious experience, to Y, a description of it, and that's a perfect description of what ineffability is
intuitively, right? It's how much information is lost when I try and describe something like an experience. Okay, so I think we have most of the background now, and we're ready to describe one main source of ineffability that arises due to attractor dynamics. Very quickly, to describe some variables over here that we're going to be using throughout the presentation: if you look at that top-right diagram, we're going to have X, which is basically the trajectory of neural activity that's relevant for a conscious experience. For instance, in global workspace theory, X would be the trajectory of the working memory state. We're going to assume that this trajectory of neural activity produces a conscious experience; we're going to be calling that conscious experience S. But also, importantly, the trajectory is going to follow attractor dynamics, so it converges to attractor states such as A. How does this result in ineffability? Well, basically, the idea is that whenever you have attractor dynamics, there's a many-to-one mapping from trajectories to attractors, and this induces information loss. Just knowing the attractor, we're not able to go the other way around and infer what the original trajectory was. So this conditional entropy, this ineffability of X given A, is going to be high, basically. All right, why does this matter? Who cares if there's information lost between the trajectory of working memory and the attractors that it converges to? Well, the important thing here, again, is that global workspace theory postulates that for working memory contents to be accessible across the brain, they need to be amplified and maintained over a sufficient duration, thought to be around 100 milliseconds. This means that working memory is going through these trajectories X, but only A, only the attractor, can be reported, and only the attractor can broadly affect behavior; only it can be written to memory, only it can be broadcast to these other processes.

So that means that the contents of working memory, these transient contents in the trajectory X, are inherently fleeting, right? There's rich information that was encoded in the moments of the trajectory, but it can't later be recalled or reported, because only the attractors can be recalled or reported. And if we zoom out and go back to our philosophical notions of ineffability, this also explains why it's difficult to catch yourself in a thought, right? Working memory encodes these rich and subtle thoughts through the trajectory, but we can never quite pinpoint or report in words what those thoughts consisted of. Were they verbal, were they pre-verbal, did they consist of visual imagery? It's really hard to say exactly, and the reasoning would be that there's just a lot of information that's lost going from these trajectories to the attractors, which, again, are what we can report on, or recall, or that can broadly affect behavior. So that's one huge source of ineffability arising from attractor dynamics in working memory. Now, these attractor dynamics also give a lower bound for ineffability during verbal report. So now, at the top right, I've added another variable M, which is basically some verbal message that you could be using to describe your experience, and M is a function of the attractor: according to global workspace theory, only the attractors can be reported on or used across the brain. This means that, basically, the ineffability of an experience given a verbal message has to be at least as great as the ineffability given an attractor, because M is a function of the attractor, and you can't gain information when applying a function, you can only destroy it. But we're actually going to argue that the ineffability is not just lower bounded by the ineffability given the attractor, it's actually in practice quite a bit greater than that. And the reason is that M, this message, is a discrete variable, right? It's language. Whereas A, the attractor, is this really high-dimensional snapshot of some cortical state.
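The chain from trajectory to attractor to message can be mimicked with a tiny discrete toy model (our own illustration with made-up numbers, not the paper's): eight equally likely "trajectories" collapse many-to-one onto four "attractors", which collapse further onto two coarse "messages", and the residual conditional entropy, the ineffability, grows at each stage, consistent with the lower bound H(X|M) >= H(X|A).

```python
import math
from collections import Counter

xs = range(8)                      # trajectories, uniform: H(X) = 3 bits

def f(x):                          # trajectory -> attractor (2-to-1)
    return x // 2

def g(a):                          # attractor -> message (2-to-1)
    return a // 2

def cond_entropy(pairs):
    """H(X|Y) in bits, for a uniform distribution over distinct (x, y) pairs."""
    n = len(pairs)
    y_counts = Counter(y for _, y in pairs)
    # p(x, y) = 1/n for each listed pair, so p(x|y) = 1 / count(y)
    return sum((1 / n) * math.log2(y_counts[y]) for _, y in pairs)

h_x_given_a = cond_entropy([(x, f(x)) for x in xs])
h_x_given_m = cond_entropy([(x, g(f(x))) for x in xs])

print(h_x_given_a)  # -> 1.0 (bits lost going trajectory -> attractor)
print(h_x_given_m)  # -> 2.0 (more is lost by the message stage)
```

Because M is a function of A, conditioning on the coarser variable can never leave less residual uncertainty, which is the data-processing argument in the talk.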
So because of the asymmetry between the richness of these variables, there has to be information loss. Intuitively, what's going on here is that you can think of there as being a many-to-one mapping from attractors to messages as well. For instance, if I say that I saw a fat cat, that paints a picture, but it's also leaving out significant details about the original attractor, which presumably encoded things that you were aware of, like the cat's color, its size, its pose, the background, all these things that you were aware of and that were encoded in the attractor, but that you can't really put into a message without it being prohibitively large. So again, the idea here is that the ineffability of experiences given messages is higher than given attractors, because the message divides the space of attractors more coarsely; it's a simpler variable, and it adds additional information loss. Now, one problem is: if language is so coarse and simple, why can it describe experiences at all? The solution is that the attractor and the message both share this discrete part. The message, because language is discrete and compositional, can be used to identify attractors; there's a comparable richness, basically, between the message and the discrete part of the attractor. This means that in practice, the richness of the experience, H(S), is typically much greater than the ineffability of the experience given the message, and that explains why we still can communicate somewhat with language, even though it's this really low-dimensional, simple thing. To recap quickly, the argument over here says that attractor dynamics are empirically ubiquitous in neural activity across brain regions, they've been proposed as a computational model for working memory, and prominent models contend that conscious experience is a projection of working memory states. One of our key contributions is to connect these theories to argue that attractor models of working memory provide an account of the ineffability of conscious experience, because attractor dynamics induce significant information loss.

That's the general argument. There's one quick thing to add over here: I've been talking about working memory a lot, and less about conscious experience so far, so let's link the two. There are many ways to link conscious experience to X, to working memory basically, and there are two main options. One is that we could say that the instantaneous state of working memory is always conscious, so all the states in between attractors are also consciously experienced, and how attractors would result in ineffability here is that, again, what you're able to recall, what you're able to report, is just the attractor, not the actual experiences in between attractors that you were having. But there's another possibility, which is to say, well, maybe only the attractors themselves are experienced. However, importantly, ineffability exists regardless in this case: if only A is conscious, A can still encode the fact that there's information loss during processing in working memory, during these trajectories. Information loss is something that is computable; it could be encoded along some dimensions in the attractor, for instance, which explains how we could be consciously aware that there's information loss during working memory processing. So now I'm going to talk about generalizing ineffability. Once you view ineffability as information loss from a source variable to a destination variable, then there is a myriad of different pathways that you can characterize, which we're going to split into two groups: pathways confined within an individual, or intrapersonal communication, and pathways that extend between individuals, or interpersonal communication. In the intrapersonal case, you've already seen conscious experience, the working memory trajectory, the attractor state and the message, but you can also consider D, which is the input data to the system, and V, which we use to denote the state of processes
considered outside the delimitation of working memory. And you can also consider the cognitive parameters of the individual being communicated to. So we're going to work with Alice and Bob, and we're going to assume that Bob has a brain which is structurally identical to Alice's but has different parameters, phi tilde instead of phi, and we're going to denote Bob's cognitive state using the tilde. Empirical evidence from neuroscience suggests that the brain is hierarchical in nature, and there are many levels of organization, many instances of attractor dynamics across organizational levels and cortical regions. One example: the inferior temporal cortex is a sensory processing area that responds discriminatively to novel sensory stimuli, whereas the prefrontal cortex, or PFC, appears to be implicated in maintaining the attention-modulated representations of working memory. And to a first approximation, in this simple model, it's easy to obtain the result that the richness of one process constrains the richness of another. Here we have that the richness of working memory trajectories, H(X), is upper bounded by the richness of its inputs, and subsequently we also have that the richness of conscious experience, H(S), is upper bounded by the richness of the working memory trajectory and the richness of the sub-process states. Now we're going to zoom out and consider the interpersonal communication case. Shannon entropy has some drawbacks when it comes to characterizing interpersonal ineffability. A major one is that Shannon entropy assumes you have access to phi, Alice's cognitive parameters. So note that H_phi(S|M), for example, is the description complexity of S given not just M but also phi. And since we can't assume in the interpersonal case that Bob has access to Alice's brain parameters, we're going to rely on the Kolmogorov framework for information theory. If you're unfamiliar with Kolmogorov complexity, the first thing to note is that Kolmogorov complexity, or K(x), is defined on instances x, hence the lowercase, rather than on variables, which are denoted by capitals. K(x) is roughly defined as the length in bits, L(z), of the shortest binary program z that prints x and halts. So there's no explicit dependency on the probability distribution over X, unlike in Shannon information, but we can introduce P(x) by taking an expectation over the Kolmogorov complexity of states, to obtain the Kolmogorov version of Shannon entropy. And likewise for ineffability: we characterized it before using conditional entropy, and in the Kolmogorov framework, expected K(x|y) is sort of the analog of Shannon conditional entropy. It represents the length of the shortest program that prints x given that you know y, or that y is accessible to your program, and it's approximately equivalent, up to logarithmic terms, to K(x) minus I(x : y), where I(x : y) is Kolmogorov mutual information, which can be interpreted as the difference in program lengths for printing x depending on whether you know y or not. Roughly speaking, there are a lot of similarities between Shannon information and Kolmogorov information. For example, mutual information in both cases is maximized when X is equal to Y: in the Shannon case because that means Y uniquely determines X, and in the Kolmogorov case because it means that given y, you don't require any extra bits to print x. In fact, if you assume knowledge of the probability distribution P, then expected Kolmogorov complexity and Shannon entropy are equivalent up to a constant factor, because it turns out that the most efficient way to describe a state x on average, given that you know the distribution, is to encode x with a description of length minus log P(x) bits. But if you don't know the distribution, which is the case we want to make use of in the interpersonal setting, because the parameters of the speaker's brain are not given, then the higher the descriptive complexity of that unknown distribution, the higher the ceiling on the difference between expected Kolmogorov complexity and Shannon entropy. And in our case
we'd expect that to apply because the space of cognitive states is enormous the probability distribution over those states is very complex and the states themselves are in general complex to reconstruct being very high dimensional vectors the second advantage of Kolmogorov complexity in addition to allowing us to explicitly avoid conditioning on the probability distribution is that Shannon entropy is a measure of statistical determinability of states as opposed to differences between unique states so in Shannon information entropy is fully determined by the probability distribution on states and it's unrelated to the meaning or structure or content of the states whereas Kolmogorov complexity is concerned with the difficulty of reconstructing states i.e. absolute difference which corresponds more closely to the lay definition of ineffability here are a number of results that we include in the paper the first one is quite simple you can't increase Kolmogorov complexity by conditioning on more states because the program can simply choose to ignore the input if it doesn't help to shorten the length of the program so we have trivially that K S given M the complexity of conscious experience given the message is at least as great as K S given M and P Phi but you would expect this gap to be quite significant because of the complexity of P Phi so because in our case and generally in non toy cases the probability distribution itself provides a lot of information so it significantly reduces the descriptive complexity of S if you're able to condition on it this first result illustrates essentially the difference in ineffability from the tabula rasa case on the left where you don't condition on Alice's brain parameters Phi to the case where you can assume access to Alice's brain parameters which is analogous to the Shannon entropy characterization that Eric was talking about but the quantity that we're more interested in is this expression in the green box so K S given M and P tilde Phi
which is the expected ineffability of Alice's experience S given the message and Bob's brain parameters tilde Phi you can show that this quantity is upper bounded by an expression that scales with K P Phi given P tilde Phi which is a measure of the cognitive dissimilarity or the descriptive complexity of Alice's brain given Bob's brain so in other words if your experiences are ineffable to someone i.e. the left hand term in the green box is high that implies that your brains are highly behaviorally dissimilar under our model so we can use this result to provide an account for what Mary learned when she stepped out of her black and white room and essentially what it's saying is that neurotypical Alice's experience of color is ineffable to Mary which implies that they are cognitively dissimilar and cognitive dissimilarity is not the same as knowledge inadequacy because knowing how the brain should respond to a particular stimulus is not the same as being able to execute that response when you're actually exposed to the stimulus so the cognitive dissimilarity principle is something that we generally believe intuitively but it's also been studied in neuroscience so the ability for the neural activity of two brains to synchronize which is to say to behave in a mutually determined manner appears to facilitate communication between individuals there's also a connection to theory of mind which is the skill of being able to infer the thoughts of others so if Bob's cognitive functions FX and FS which produce his working memory trajectory and his conscious experience are optimized for decoding Alice's message M into her conscious experience S then ineffability is reduced if Bob's conscious experience tilde S is conditioned on compared to M because it implies that part of the computation of reconstructing S is executed during inference of tilde S meaning that the smallest program from tilde S Bob's conscious experience and tilde Phi Bob's
cognitive parameters to S Alice's conscious experience would make use of tilde S to reduce the residual information that needs to be supplied in order to determine S which would shorten the descriptive length of that program so in other words making progress at inferring the conscious experience of others in your own cognitive processing could literally be interpreted as reducing the ineffability of their conscious experience quickly we're going to touch on the grounding problem so as Eric mentioned before two individuals will generally understand the same word or sentence in different ways and our measure of ineffability does capture this dissonance for two reasons first measures of ineffability such as we saw in the previous slide are bounded by a ceiling that scales with cognitive dissimilarity or the Kolmogorov complexity of P Phi given P tilde Phi and Phi includes all the parameters in Alice's computational graph including those that parameterize functions defined on the input data D and likewise for Bob so that means that Alice's parameters Phi are grounded in a representation that is at least partly shared with Bob's tilde Phi so if Bob's parameters implement a function that operates differently on input data compared to Alice then they do not inform on each other to a great extent and the ceiling on ineffability is increased via this K P Phi given P tilde Phi term and secondly conscious experience S is a function of Phi it's computed using a function that depends on Phi and likewise for Bob and Phi contains Alice's long-term knowledge therefore S is capable of containing information about the associations that Alice makes in the process of generating her conscious experience and that's included in the reconstruction target of K S which is to say the descriptional complexity of her conscious experience so our model offers an interpretation for the observations that Sperling made in 1960 in
this very famous experiment where he showed subjects a grid of characters briefly and then asked them to recall a specific row he observed that subjects were generally able to report the prompted row accurately but not all the characters in the grid despite reporting that they had consciously apprehended all characters in the grid and our model offers an account for this phenomenon in the following way so upon being exposed to the grid of characters and being prompted to report the characters in a specific row working memory contents represented by the attractor state A presumably contain the identities of those four characters in the prompted row as well as a summary over the grid for example the approximate number of characters or their arrangement and an estimate of the information lost by that summary whilst information sufficient to discriminate all characters would exist at some point in the computational pipeline but primarily in the upstream sensory state V from which the working memory trajectory X and working memory output A are computed subsequently since the attractor state A is directly accessible to verbal reporting processes the characters of the prompted row the grid details at summary level and the presence of information loss would be directly reportable whereas full grid details wouldn't and note that this explanation would hold irrespective of where the distinction between conscious and unconscious is drawn so whether X the working memory trajectory which may contain sufficient information to discriminate all characters is considered conscious or not so as this suggests one of the points that we make is that at least when it comes to characterizing richness and ineffability the exact definition of what constitutes conscious experience so the definition of FS Phi in the computational pipeline is not that important we utilize this minimal computational model without
relying on the implementation details in order to allow us to make general statements about richness and ineffability and as Eric hinted before you can account for the report of ineffability assuming this computational graph irrespective of how you define FS so if X is considered to be conscious then the attractor dynamics bottleneck the amount of information accessible to working memory output and verbal report but if X is not conscious then information loss during processing can still be approximately computed and reported on leading us to report on ineffability so finally some future directions for this work one of them is to link it to neuroscience so for example to actually get some empirical estimates of the amount of information loss using neural correlates of working memory state for example and also I think for a deeper understanding of consciousness you have to consider why it exists and not just how it manifests so this is something that we don't touch on in the paper but the use of information bottlenecks in machine learning would suggest that information loss actually has a purpose and specifically the purpose is generalization it improves the robustness of functions learned from data to noise and it improves the ability of functions to perform optimally on inputs that were not seen during training which is what generalization is defined by and this is important because if we want to incorporate observations about consciousness into our artificial models what we really want to capture is the benefit that consciousness affords biological models and we might not need to transfer the actual form that it takes in human cognition and that's it awesome thank you for the presentation Guillaume it would be awesome if you could give some opening remarks and then we can have a discussion and hear any questions from the live chat sure well thanks again for the invitation to mind jazz on these
beautiful topics so as opening questions and discourse I would say a strong message here is that through the formalization that was just presented we have an account that allows us to make the bridge between access consciousness and phenomenal consciousness something that was described very well in the paper that was cited by Naccache in Philosophical Transactions B how you don't necessarily need something more than access consciousness to account for these phenomenal aspects but there are also a lot of open questions like typically we have seen there is an information loss within brains and between brains but we can also complexify the within-brain message passing for instance adding metacognition so we are also working on adding cognitive architecture on top of the global neuronal workspace with typically attention schema theory from Michael Graziano and so there is again in these meta-representational steps a loss of information as well and I think from an empirical point of view at both behavioral and neurophysiological levels there is also a lot to unpack there on how our metacognitive representation of ourselves is also an impoverished representation of the real underlying states and that connects well with the social dimension of consciousness and here we tapped a bit on that with ineffability at the interpersonal level but in the case of the attention schema there is a very interesting reversal from an evolutionary standpoint of why the brain comes to represent others first and then we are recycling the neural mechanisms that predict others' behavior onto ourselves and there is a kind of flipping of the traditional narrative in cognitive neuroscience with Graziano because the others come first in a way and self-consciousness becomes a sort of side effect of having all the mechanisms to deal with others and here we don't talk too much about the representation of the self but I think it's a very interesting path following
this work then there is also something that we didn't discuss too much but we think about the trajectory to the attractor and how many trajectories can lead to one attractor but interestingly from a sequence of attractors you can recover sort of a trajectory so if I have point A and point B and point C by having the sequence A B C I will tend to have a trajectory that looks similar each time and so it's kind of heading towards the dynamics also of social interaction and culture so typically cultural artifacts like music or movies are eliciting subjective experiences that seem stronger than just words and even I mean I'm saying words but even words are sequences so in a way there is maybe in the dynamics of communication a little bit of information that is retrieved by the interpolation of different discrete states of different attractors so the inequalities that we saw are there but maybe we can play a bit with those phenomena and see to what extent we can retrieve more information by playing on those dynamical phenomena in communication and regarding communication so we talked about the grounding problem and how the message and the attractors are kind of aligned between people through a shared statistical environment and I think the Kolmogorov framework is interesting in the sense of a generative model of the world that you have to share with others and in that sense I'm working a lot on autism and I'm interested in neurodiversity the fact that you have this cognitive dissimilarity and the information loss is also very interesting to interpret how diversity at the biological level can have an impact on how certain people would have more difficulty to communicate with others specifically in a neurotypical world like we have a society that is very normative and designed for neurotypical people and so people who are further from the norm and are more neurodiverse would then have more difficulty to align their generative model and
communicate with others that's also something that would be nice to discuss or to explore later and finally since we are on the active inference livestream I would take the occasion to mention how it could connect with active inference we didn't really talk about the active inference formalism in the paper but I'm coming back from a workshop on computational neurophenomenology where we used the previous work of Varela on neuroscience and phenomenology to try to have a generative passage between first person and third person experience and we discussed during the whole week how computational models add a third constraint to stabilize this relationship between first and third person perspectives on consciousness and I was among the rare people there not completely convinced by active inference but I was trying to argue about the need for plurality at the formalism level even from a theoretical perspective and for plurality in those domains at least from a formal point of view to try not to put all our eggs in the same basket so to say and here it's a very nice example of how you can even combine formalisms that were mutually exclusive in the literature so typically people coming more from embodied cognition and Varela's work tend to be more in dynamical systems theory and very opposed to computationalism and information theory and here we show in this paper that information theory and dynamical systems theory can totally work hand in hand so it doesn't mean that we need to commit to one formalism or to have all formalisms but to maintain this plurality I think is very important from an epistemological standpoint and still there is a connection with active inference that could be interesting and that's one of the things that I kept from last week's workshop which is how from a social interaction point of view synchronization and I worked a lot on inter-brain synchronization and those phenomena of synchronization can be optimal for information transfer as we see here
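To make this point about synchronization and information transfer concrete, here is a minimal sketch; the binary toy "states" and the perfectly mirrored Bob are illustrative assumptions, not data from the talk. It shows that when Bob's states mirror Alice's, the empirical mutual information between them is maximal, and when they are independent it vanishes:

```python
import math
from collections import Counter

def entropy(seq):
    """Empirical Shannon entropy (in bits) of a sequence."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy binary "brain states" (purely illustrative)
alice      = [0, 1, 1, 0, 1, 0, 0, 1]
bob_synced = alice[:]                   # Bob mirrors Alice exactly
bob_indep  = [0, 0, 1, 1, 0, 1, 0, 1]  # statistically independent of Alice

print(mutual_information(alice, bob_synced))  # 1.0, the full entropy H(alice)
print(mutual_information(alice, bob_indep))   # 0.0
```

With perfectly synchronized states the mutual information equals the full entropy of Alice's states, which is the sense in which synchrony is optimal for information transfer.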
but in the context of active inference it is also the optimal minimizer of free energy because you have the two generative models that are perfectly aligned so you don't have error in your prediction because the other person becomes a mirror of your own generative model so there may also be some bridges and predictions that can show that those different formalisms are multiple ways of looking at the same thing so yeah thank you Guillaume so feel free to discuss if you have any responses and just take it from there Eric and Xu or I can ask some questions or read some comments do you have any comments on Guillaume's thoughts yeah I think we have some do you have things to say yeah that was a long list of comments I find your comments about the autism link really interesting because actually that's pretty much directly what the message of the result is that cognitive dissimilarity is associated with difficulty in communicating so yeah actually the last thing that you said about how if the communicator's and the listener's brains are synchronized or if they're equal there's no prediction error that's kind of the machine learning problem actually as a whole I don't know if this is a big link but the whole point of machine learning is to discover the true model that you're trying to learn so that you have no prediction error right so yeah I mean I'm not an expert on active inference by any means but one thing that does make me a bit apprehensive about it is that it's so concerned with prediction the generative models are about reconstruction and prediction whereas cognition is about survival and prediction is part of survival obviously I mean you need to be able to predict I don't know not crashing into a car or something so that you don't die but it's actually just a part and more generally the inference of cognitive states in the brain is not about discovery of truth in some sense you know it's not about coming up with a true model or explanation for some
phenomenon it's about sort of optimizing for your fitness so yeah that's something that I don't quite understand sort of what the active inference perspective on that would be I don't know if Daniel you had some comments or not okay I'll just jump in with active inference then we have more comments in the chat but I think that's a great set of points so if I can make one point to Guillaume's first you spoke about in the presentation the generalized ineffability and it reminded me of a few times in active inference where we're hearing about generalized synchrony generalized which means not necessarily lockstep or mirror but mutually information-encoding not necessarily linearly correlated and synchronized like the end state of the metronomes but there's complex information transfer especially in this multi-scale setting generalized synchrony which is enabled by the generalized coordinates which are the coordinates of motion of the path so these are kind of the summary statistics to describe the Taylor series expansion around that path or to better describe that path taken in some way and generalized homeostasis which for certain environmental regularities it may be sufficient to be entirely retrojective or there may be physical or cyber-physical constraints so that you'll have a system that is purely designed a different way so I wouldn't say any theory is necessarily too opinionated with respect to what you could describe and I think part of the reason to want such a descriptive approach which is what I feel like you all did by deciding to describe an analytic framework for ineffability rather than for example a mechanistic explanation but describing some evidence all of that was just to say pluralism in what we're approaching that ineffable space from multiple compressions or tools to even describe English words scientific models and it reminded me of the recent discussions with Lancelot Da Costa who described how other discrepancy measures other than free energy
could be used and that free energy has been used in machine learning variational autoencoders all kinds of Bayesian applications because it has a KL divergence and so it has some convenient variational optimization properties as a discrepancy measure but is not the only discrepancy measure and you mentioned two in your discussion which were the Shannon syntactic sort of classical information content and then the Kolmogorov program-based one and so that is like another distance measure so in some ways these spaces can't or won't or don't even need to collapse below a pluralism of at least several relatively stabilized kinds but I thought that was a very important point about explanatory pluralism and how we can be excited about one area of science or one way of knowing and it's compatible with other ways of knowing because we have some of those properties of differences that were discussed so those are first thoughts before even any active inference technicality yeah I mean KL is go for it no I just want to remark that KL is a distance between two distributions yeah yeah anyway continue I was just saying that in the case of autism this typically shows also that you don't necessarily need to go to the mechanistic neurobiology to have a potential explanation so in the case of Craver's Explaining the Brain there's a very nice landscape of different kinds of explanation and in the same way those different formalisms can give different perspectives on how to understand what's going on autism has been very focused on the neurobiology at some point and the genetics but maybe it's not at that level specifically that the functional understanding should occur I don't say it has no causal relationship because it's highly genetically heritable and so on but sometimes the functional understanding is not necessarily at the very lowest level of explanation so now to kind of move to active inference and really physics based approaches the
trajectory recovery is going to have some maximum information representation with the generalized coordinates of motion it's how paths are represented in many settings and so that path of least action the Bayesian mechanics on that landscape whatever it is empirically it's like fitting a spline in that space or doing some other kind of modeling whether it's SPM modeling or hyperscanning there's some kind of generalized time series modeling between the present and artifacts the past the future other entities what do you mean Daniel I don't get it just that that kind of trajectory recovery is what is being parameterized in physics of consciousness type models or physics of cognition Bayesian physics anything where a probability distribution has a position velocity acceleration and so on it's being path inferred continuous time generative models and there are generative models in active inference and hybrid models just like other Bayesian graphs so in the continuous time setting where there's a Taylor series expansion that's trajectory recovery yeah well I think more generally maybe the point is that you could use Bayesian learning variational inference to learn a generative model of these states that we're referring to right if you had some kind of dataset that actually represents a sample from let's say the true model then you could use variational inference to approximate that model but you don't have to though I mean there are many generative modeling techniques in machine learning that don't fall under variational inference I'll read a comment and a question so Yoshua Bengio writes Guillaume Dumas' idea about trajectories of consecutive attractors to recover information about likely trajectories that lead into each of these attractors that are normally lost in working memory is very cool and then Xu Eric any relation between what you talked about and the compositional properties of conscious thoughts yeah I think so so in terms of compositionality of conscious
thoughts I think what Yoshua is referring to now is that our thoughts kind of compose concepts that we already have so something like you know a sentence if I think you know the fat cat I'm composing these mental concepts of fat and cat and that generates sort of a new experience so experiences are always generated by these building blocks and then similarly one thing that we didn't talk so much about is that conscious experiences language and attractors can also have this shared compositional structure so like language it's clear you know base words are combined to form new little phrases new sentences that have novel meaning in a systematic way how this could happen with the attractors is well kind of in the most basic case different dimensions in this high-dimensional state space might converge to different attractors and so now you know one attractor that's converged to is really a composition of attractors along different dimensions so this is a sense in which you know our experiences are compositional the attractors that generate those experiences can have compositional structure and this could be a reason why language itself is compositional we're basically trying to come up with some way of describing or identifying compositions of attractors I mean I would add that compositionality is a mechanism for reducing descriptional complexity so you know in fact that's arguably the objective for the compositional nature of our thoughts and language features like less time to describe more robustness to noise yeah I mean if you're trying to describe an instance of a state obviously the descriptional complexity is much less if it's built up of concepts that you already know about yeah just to bring this back to something I mentioned in the talk you know I pointed out that attractors have this discrete structure there's a finite number of them and we could assign labels to them but if these attractors are compositional which would allow for a
richer number of attractors basically it's finite but the space of total attractors is exponentially large so we don't want a language that's assigning one unique word to every composition of attractors it would be an exponentially large language we want some kind of language that will minimize the description like what Xu said you know you could have a language that's compositional I think that provides a really I think you're muted Daniel sorry it provides a very interesting and justifying rationale for using the description length as opposed to the KL divergence which is maybe in some telepathy enabled world or space the KL divergence does help you move quickly or on some defined manifold but in encoding decoding space and on the computers that we have then writing shorter programs and or making smaller information theoretic compressions becomes one of the imperatives to take the kind of empirical approach that you are describing because though you've described mainly equations these are all also computer packages that can be used to model different data sets so what kinds of data sets do you think this is most immediately applicable to plans and questions are you saying what kind of data sets would kind of minimum description length compositional models be most useful for or just with the approaches and measures that you described today what kinds of data sets or behavioral and cognitive settings do you think could be used as an empirical tool or measure does the data already exist is it some hypothetical measurement or are there ways to even reanalyze yes Guillaume or anyone else yeah I think on language already there is a lot of things that have been done the number of bytes per second transferred for instance I guess on large corpora of discussions or even in clinical settings for instance I'm working on the clinical alliance between a patient and a clinician for instance this formalism of ineffability and information loss
would be interesting to apply to as even a form of biomarker I mean even if at the language level it's not bio and then you can also apply those information theoretic measurements to physiological data movement neural data so in my lab we are recording multiple brains simultaneously we can totally try to empirically approach what we are describing in this paper and show that for instance in certain populations that are more challenged to communicate due to neurodiversity as we discussed earlier we can apply the same type of measurements with information theory instead of just classical neuroimaging measurements so that would be for instance an empirical way of dealing with that but what I'm very interested in would be to make the relationship between the access consciousness and the V process and the information loss that you have in your working memory right now I feel that neuroimaging technologies are not at that level of possibility but who knows maybe we're going to have soon the ability to disentangle that kind of stuff oh yes yes so the delimitation of what constitutes working memory in the brain I think that's not a settled question I mean there are lots of different possible approaches it's totally up in the air classically in global workspace theory they traditionally thought working memory is localized to prefrontal cortex now a new perspective is emerging that maybe working memory is sort of distributed across the brain we see kind of sustained attractor representations all over the place in this distributed working memory so all these different pieces what constitutes X what constitutes that working memory trajectory and also what constitutes the experience what's the function that goes from X from working memory to an experience all that is up in the air writing this paper we had plenty of discussions about whether the whole trajectory is conscious or just the attractor is conscious is there a kind of sense in which both these things are
true so like Guillaume was talking about access versus phenomenal consciousness access consciousness refers to what I'm able to report whereas phenomenal consciousness kind of refers more to the standard definition of what is my experience like so our framework kind of illustrates that maybe we can unify these things where X the trajectory is some phenomenal experience it's in the moment whereas your attractor is really the only thing you're able to report and remember and some part of us still knows that something is missing between what you're reporting and what you were actually experiencing throughout I mean I think in terms of our model it's pretty clear but actually grounding it in empirical datasets is a challenging question the other thing is that Kolmogorov complexity is more of a theoretical measure so that actually I think is the reason why Shannon information is much more ubiquitous for example in machine learning because basically given a generative model you can approximate it for example just using Monte Carlo estimation but Kolmogorov complexity yeah it would be much more of an approximation with a much greater error I think if you attempted to approximate that so I guess that would be a potential concern yeah one advantage of it is that if you could measure the Kolmogorov complexity it would allow you to kind of measure the ineffability of an experience on an individual case for instance one thing we spoke about with theory of mind was that the complexity of the experience given the message we would think would be much higher than the complexity of the experience given Bob's experience the listener's experience because the listener's brain is doing a lot of the work of decoding the message into a similar sort of experience so it would be useful if you could see something like are there certain types of experiences where the message is wildly insufficient the complexity of the experience given the message is massive but actually people do a really good job
inferring the experience and maybe it would be different from the experiences that they simply can't reconstruct so I think there's some value in trying to create measures of Kolmogorov complexity to kind of handle these individual cases the issue is it's not really computable you can't search through the space of all possible programs but maybe there are kind of ways to get around this for instance you could sort of parameterize a program with I don't know a neural network let's say you have a neural network going from message to some brain scan and then maybe the size of that neural network or the complexity of the network could be sort of a surrogate for the complexity of the program the shortest program that's a cool idea that is really cool it reminds me of using adversarial neural networks which don't have as strong analytical guarantees as for example cryptography but then they kind of converge in being able to protect or change or modify information in certain ways but you aren't getting the same bounds or guarantees on the formality and that's kind of like the discrete continuous dialectic and so it's important to embody even that pluralism with respect to state spaces because the formalisms in any of these domains the formalisms for discrete and continuous are not always aligned and those are definitely areas that can be studied but for the space of empirical models that we're talking about as part of even the methodological pluralism in science but if we're talking about statistical models we want to be able to intake the data at some point so it has to have certain properties which are even looser than the math and there's probably a lot that could fit in this and I think it'll be interesting like when you have a hyperscanning experiment how do you include the environment or how do you deal with different numbers of sensors or different types of sensors different kinds of action and report that people encode their symbols through all kinds
of accessibility and interface types, which I think returns to Guillaume's point. So yes, please. Yeah, actually that's also reminding me that there's a nice empirical playground associated with what we discussed: how we can combine this information transfer from an environmental perspective instead of the biological perspective. I'm working, for instance, with Michael Lifshitz, who is part of this as well. From an anthropological standpoint, humans came up with rituals and cultural practices to maximize the alignment of people, and those things are also very important in our society, for how we reach consensus and collaborate and so on. And I guess this kind of issue will be very important when we have to converge on optimal collective dynamics. Here we are talking about Alice and Bob at a dyadic scale, but a nice follow-up on this work would be at the group level: how do you optimize consensus and collaboration at group scales, particularly for climate change, social inequality, and other very challenging topics for society? Yeah, so actually that touches on a limitation of the simple model that we use, which is that we don't incorporate any feedback loops. We have this very simple unidirectional model, which I think has advantages as well, because it shows basic principles in a relatively easy-to-understand way. But you're right that realistically there's always feedback, not just at the interpersonal level but obviously within the brain: working memory, top-down attention. So that's something we didn't address in this. Yeah, it would be really interesting to unify our work with the field of pragmatics, which is exactly about this: given many back-and-forth steps in dialogue, people come to a consensus. And this could also involve the active inference framework, where as a listener I'm going to ask questions that try to reduce my uncertainty about the speaker's state as much as possible. Awesome,
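As an aside, the surrogate idea for Kolmogorov complexity discussed above, taking the smallest model that reproduces the message-to-brain-scan mapping and using its size as a stand-in for the length of the shortest program, can be sketched on toy data. Everything here is hypothetical and synthetic; for tractability the candidate "programs" are rank-k linear maps rather than trained neural networks, and the parameter count of the smallest rank that fits serves as the complexity proxy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "messages" (X) and "brain scans" (Y) related by a low-rank map.
X = rng.normal(size=(200, 8))
W_true = rng.normal(size=(8, 3)) @ rng.normal(size=(3, 16))   # hidden rank-3 structure
Y = X @ W_true + 0.01 * rng.normal(size=(200, 16))            # small measurement noise

# Fit the full least-squares map, then ask: what is the *smallest* model
# (lowest-rank factorization) that still explains the data? Its parameter
# count stands in for the length of the shortest program.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
U, S, Vt = np.linalg.svd(W_hat, full_matrices=False)

for k in range(1, len(S) + 1):
    W_k = (U[:, :k] * S[:k]) @ Vt[:k]                         # rank-k reconstruction
    mse = np.mean((X @ W_k - Y) ** 2)
    if mse < 0.01:                                            # "fits well enough"
        n_params = k * (X.shape[1] + Y.shape[1])              # size of the rank-k factors
        print(f"rank {k} suffices (~{n_params} parameters as the complexity proxy)")
        break
```

A real version would sweep over network architectures instead of ranks, but the principle, smallest adequate model as a computable stand-in for an uncomputable quantity, is the same.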
Thank you for these comments. So, in my experience, computer scientists and those with empirical data-measurement experience know that you can have behavioral measurements, or just columns in a database; you can have heart rate, and you can have video data, and it can be open-ended like that. It doesn't need to be just a matrix that gets multiplied in this extremely clean way. And I feel like that leads to a lot of these very open-ended and flexible approaches, which on one hand address the different forms and functions and settings in which these kinds of questions get asked, and at the same time are working, if not to address, then at least to have continuity with classical discussions on a wide range of topics, like sequences. Yeah, the question of how not everything is a vector, not everything is a matrix, you could have more complicated data structures, that's really interesting, and I think relevant to consciousness. It certainly seems like our thoughts, our experiences, have this sort of symbolic structure. You can think of any thought as something like a graph: I have a couple of concepts in mind, maybe in working memory, and I have some idea of how they relate to each other. If it's not a graph, it's something like that, maybe some kind of smooth representation of a graph. On the other hand, when you just look at the brain, all you see is neurons, all you see is this distributed, high-dimensional neural activity. So there's this really interesting question of how you link the two together, and I think attractor dynamics are another interesting avenue for getting there, and you get a lot of things: you get high-dimensional continuous states that you can represent, you get discrete states that you can represent, and they can compose together. So yeah, I think there's something there. To extend that point to this very meta-science area of discussion around pluralism in science that Guillaume opened the box on: when practice in the behavioral lab, if
not much more broadly, is using structures that have different column types, then certain kinds of decisions that might have been taken as in-principle arguments for the superiority, even contextually, of one framework are unveiled, purely methodologically, as modular degrees of freedom: whether you use a generative model that's continuous-time or discrete-time, continuous state space or discrete state space; the thermometer data, how you discretized it, where you put the thermometer. All these broader factors that instantiate the experiment in the pipeline are embodied, encultured events in a scientific niche, in a social niche. And that's a very pragmatic, or people might use other words, like realist, I don't know what adjective would describe it, take: starting with the scientific community and formal modeling, treating the modeling process as its own behavioral object, and then working to make the empirical practice more general, leading the generality and open-endedness with practice. Though that raises so many second-order questions as well. Yeah, well, from a very pragmatic standpoint, we also need reproducibility, so having those norms in experimental design and formal tools makes sense. But we need to remind ourselves often that they also constrain our way of thinking. The history of science shows us how mathematics is guided by the problems we have to solve. Typically, if we take quantum mechanics, the formalism popped out first and then quantum physicists used it; it was a generative process and a collaborative effort between mathematicians and physicists. And interestingly, they came up with two different formalisms that were shown to be compatible later on. So it's a nice tale, in a way, showing how you can have even stronger intersubjectivity by adopting different formalisms that are later shown to be compatible and equivalent. So here I, unfortunately, have
to leave at some point to pick up my daughter. But we could go into category theory and more meta-mathematics to try to show that those formalisms are, in the end, potentially equivalent. From a history-of-science standpoint, though, I think it's not just about the equivalence; it's also about having different viewpoints that confirm each other, to make our idea, our model of the world, intersubjectively stable, so to say. I'll give you a small, short closing comment, Guillaume, and then we'll talk a little bit more. Eric, when I first saw some of your figures for hyperscanning, there were nodes being connected with different edges, both within and among brains, and there was this one part, or dimension, of my experience that said: you can't do that, they're not connected. But it's a statistical causal graph, so they are exactly the same type of explanation and empirical data structure. It's the same exact mutual information, linear causation, whatever you want to use; an edge is an edge in that representation. And that's the map-territory situation: they're not the ones that are touching. Just by seeing two levels and the way the representation could be used, it was a proof by example of what it meant at one level. It was hard to publish at first, I think, for that reason. Thank you, and farewell, Guillaume. Yeah, thanks again for the invitation; take care. So, what direction would you like to go in? Or I can read another comment, or anyone else in the live chat, in our last minutes. Okay, let's read the last comment. Yoshua wrote: but these attractors presumably correspond to different concepts; in the same sentence they are not independently sampled. Sorry, that wasn't quite verbatim, but: what are these structures in language? I thought that was a very memorable thing to say, that words are sequences too. They're modular with respect to the dictionary, but then even the phonemes or the writing structures have their own compositional logic, like a periodic table, and then there's
sub-logics, and then there are just different uses of different levels of description, which is why we don't speak the same language, unless we do. Yeah, well, directly, I think what Yoshua is referring to is that when you sample a sentence, clearly the words are their own discrete things, but you're not going to be sampling them independently from each other. Different words will cohere more or less, and they'll cohere more or less given a context; you want to form some kind of sentence that describes some context or some observations. So yeah, with regard to these compositional attractors: again, it's a simple picture where you have different attractors converging along different dimensions. Obviously these attractors are going to affect the convergence of the others, so it's not like everything is just moving independently along different dimensions. Yeah, a simplifying assumption that we make in the simple computational model used in the paper is that we hide such dependencies as noise in the variables, to avoid needing to consider these links explicitly and to highlight the main points about richness and ineffability that we wanted to make. But I really like your point, Daniel, about how there's the data, there's the physical reality, and then there's the compositional structure that is imposed on it. I mean, it's a model, right? It's a modeling assumption that you make, and you have to choose at what level of abstraction to define that model, and it can feel a bit arbitrary. And then you also discussed the idea that we all speak different languages. An interesting question for our work there is: would that imply that we have different attractor structures across different cultures that have different languages? Is there an interaction there, or are these different languages equivalent ways of describing the same set of attractors? I think it's somewhere in the middle. Regardless of what language we speak, I think we all have
similar concepts and similar conscious experiences. But there are some cases where you have a word in a given language that identifies a sort of fundamental attractor, and that's part of that language's conscious vocabulary, I guess. Whereas, regardless of what language we speak, we'd all be able to get into that conscious state ourselves, but given our language, the attractors you would need to compose to construct that conscious state could be a bit more complicated: you'd need something like a longer sentence, a longer thought, to have the same sort of experience. Awesome. Well, one thing that brought to mind was neural patches on the skin: different parts of the body have different densities of sensory types, from very fine-scale touch and actuation to broader patches where, with two needles, you can't tell the difference between them. And that relates, in many areas, to the development of taste and differentiability, to being able to determine differences. And then you of course use this very evocative but also grounded dual representation: the samples, which are discrete and may have compositional structure, with a path moving through them, seeking to embed the topology of the sample points within an information geometry, which is the kind of thing actual statistics take as input. And then the neural patches are regions where, to one person, it's like, yeah, I drove through New Mexico, and to the other person, every inch is this rich experience; they know all the different stops. And then I think one last great point from the presentation was that we could kind of, I'm not exactly sure how you said it, emulate this informational characteristic and maybe not even need the embodiment for there to be consciousness. That'll be the debate then; but the debate used to be much less sophisticated than that, and also people took principled stances. Yeah, I wasn't totally clear on that last point
that you made. Principled stances on whether this would be emulating, simulating, or approximating an analytical framework for consciousness, or whether, by implementing that machine, it would be implementing consciousness; versus, say, the kind of linear regression package that does cognitive modeling and could be understood in a purely SPM-like framework. And SPM doesn't take a perspective, except maybe implicitly, on whether SPM models generate consciousness; most likely it's easy to say no, because they're generalized linear models. So some people may take the in-principle stance that any description or simulation, done in certain ways, is all good, we're definitely not generating anything conscious; it would purely be a hugely energetically expensive statistics exercise, which is fine. Yeah, so the chain of reasoning in our work, I suppose, is that if you assume a certain model of consciousness, that implies ineffability, understood as information loss; but obviously information loss does not imply consciousness, even though it's an attribute of how consciousness manifests in humans. So this question of what is necessary to declare whether a system is conscious or not is not what we discussed in the paper at all. I mean, from my perspective as a machine learning practitioner, it's closer to what I said in the last slide: I would like to get the benefits of consciousness in an artificial model, and it doesn't matter so much whether the form that takes is similar to how it manifests in biological systems or not. Why should it? We don't have the kind of resources that evolution has had, over a billion years, to optimize for this. So the question is: how do we design systems that can benefit from, for example, the generalization and robustness properties that being very contractive in processing can give you, rather than asking what's the exact definition of consciousness and how do we replicate that in the
machines? Because that, to me, is a more superficial problem, in that it's looking at the how and not the why. Yeah, I have some thoughts on this also. So Jonathan Simon, who's a co-author on our work, gave a really interesting presentation recently that I heard, which is: whatever model of consciousness you assume, let's say global workspace theory, there's a minimal model of it that you could construct. To make this really concrete: there's a bunch of mini neural networks that are the sub-processes, and they're communicating through some shared RNN, let's say. So you can construct a minimal model like this and train it on some task, and then, if your viewpoint was that a global workspace with such-and-such properties is what generates consciousness, you'd be forced to say that your minimal model is conscious. And this is, I guess, not problematic exactly, but counterintuitive, because the model could be doing really simple things. It might not even be reporting that it's having any consciousness at all; it might not even have language; it might just be doing something like solving MNIST in a global-workspace-type architecture. Do you really want to say that that thing is conscious? So yeah, it's a problem, I think, with most theories of consciousness. Maybe one kind of exception, in my view, is theories that say consciousness is basically a representation that we have. One canonical example, I guess, is Michael Graziano's attention schema theory, which says our brain constructs a model of what attention is doing in the brain, and that model basically describes something that is like experience: it describes, oh, I'm aware of these things, they have these properties, I could report that I'm experiencing these things. It's basically a representation, a set of neural activity in the brain, and there you don't have the same
problem, where you can construct a minimal model, because presumably this representation of consciousness is really complex. And then, I guess, another set of theories would say that even if you emulate whatever processes in the brain we think generate consciousness, the physical implementation might be really important. The example that comes to mind is IIT, which would say consciousness is basically substrate-dependent, in the sense that you could have the exact same function, the exact same mechanism, implemented in different ways, say on a conventional processor or on neuromorphic hardware, and in one case there may or may not be consciousness, whereas the exact same function in another substrate is conscious. So that's another class of theories that would say that emulation does not necessarily imply that you're actually generating consciousness. Yeah. Well, in our last two minutes, what are your closing thoughts, next directions, exhortations to the humans and language models? Yeah, well, I'm interested in this stuff a lot from a purely philosophical perspective, so for me the hard problem is really salient: why would any physical mechanism generate consciousness? And the nice thing for me about this work, the most satisfying part of it, was that I initially had this intuition that the experience of what the color red looks like is not just information; it's kind of underdetermined, there's something more than just information there, because I can't do things like describe it. But now we have a theory that kind of breaks that intuition, and I feel I have a satisfying explanation for why I can't describe my experiences, under a physicalist framework. So I think it would be interesting to reason through some of the other thought experiments that make the hard problem salient, and to develop models that break the intuitions behind the hard problem. In my case at least, you know, I've seen it
done in this scenario with the topic of ineffability, so there's no reason in principle that it can't be done for other aspects of the hard problem. Yeah, so if I were to sum up the work in one very high-level sentence, it's that reasoning subjectively about subjectivity is very difficult, but reasoning objectively about subjectivity is easier. And that's what we do in the paper: we take an objective standpoint and try to formalize or characterize subjectivity from that standpoint, and that makes the picture clearer. I mean, from my perspective, I'm interested in consciousness mainly in terms of what it offers, what it brings, for example in improving the generalization of artificial learned models. Yeah, and it's interesting, because another piece of work I'm doing is formalizing generalization bounds for the information bottleneck, which is a regularization principle used in machine learning. It's just really interesting how evolution has, by itself, discovered this regularization principle and applied it to human cognition, and how we can also show mathematically that it actually does improve guarantees on generalization. That is really mind-blowing to me. Yeah, wow, great presentation. Thank you both for joining; you're welcome back anytime. Thanks so much, Daniel, it was a lot of fun. Thank you.
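As a footnote to that last point about the information bottleneck: the objective trades off how predictive a compressed representation Z is of a target Y, I(Z;Y), against how much Z retains about the input X, I(X;Z). A minimal, fully discrete sketch (the toy distributions below are invented for illustration, not taken from the paper):

```python
import numpy as np

def mutual_info(p_joint):
    """I(A;B) in nats, computed from a joint distribution table p(a, b)."""
    pa = p_joint.sum(1, keepdims=True)
    pb = p_joint.sum(0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (pa @ pb)[mask])))

# p(x, y): X takes 4 values, Y takes 2, and Y depends only on a coarse feature
# of X (whether x is in {0, 1} or in {2, 3}).
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])

# A deterministic encoder that keeps only that coarse feature: z = 0 or 1.
p_z_given_x = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

p_x = p_xy.sum(1)
p_xz = p_z_given_x * p_x[:, None]   # joint p(x, z)
p_zy = p_z_given_x.T @ p_xy         # joint p(z, y)

beta = 0.5
ib_value = mutual_info(p_zy) - beta * mutual_info(p_xz)
print(f"I(Z;Y) = {mutual_info(p_zy):.3f} nats, I(X;Z) = {mutual_info(p_xz):.3f} nats")
print(f"IB objective at beta = {beta}: {ib_value:.3f}")
```

Here the encoder keeps exactly the one bit of X that matters for Y, so I(X;Z) is ln 2 while I(Z;Y) captures all the predictive information that X carries about Y; throwing that bit away would cost predictive power, while keeping more of X would only raise I(X;Z).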