Hello and welcome. This is ActInf GuestStream 047.1 on July 10th, 2023. We're here with Nick Byrd and Samuel Bellini-Leite, and we're going to have a presentation followed by a discussion. So thank you both for joining, and Sam, thanks again for this presentation. Over to you.

Okay. Was Nick going to say something before I start? Oh, I can go ahead. Okay.

So, some introduction first. I'm a professor at Minas Gerais State University. I'm a psychologist and philosopher, mainly working in cognitive science. I'm going to talk about what I did in my PhD thesis, and also some renewed interest I had in it based on developments in large language models which happened this year, most notably chain of thoughts and tree of thoughts. Those additions to large language models made me want to come back to this and talk about it.

So how do we know about human reasoning, which is the main subject underlying this topic? Well, we had empirical research from the '60s, starting with Peter Wason, who was challenging the notion that we have a logic module for reasoning, or some general understanding of logic, when we're adults. He crafted tasks which were hard to solve but actually had simple fundamentals, and which were taken to show people's irrationality, something like that. There's also a huge tradition called heuristics and biases, which started in the '60s and '70s, mainly by Kahneman and Amos Tversky, which had similar tasks and went beyond them; Kahneman even won the Nobel Prize for this work. I had evolutionary psychology and language and pragmatics here on the slides as well, which were people in the '90s working with this sort of research, but I'm going to talk about dual process theory in the way it was formulated by Evans and Stanovich, which mainly summarizes the results of Wason and of Kahneman and Tversky's heuristics and biases tradition. So I'm going to talk about how dual process theory came to be, and then some problems it has that I tried to address.

So one of the first reasoning tasks in this trend was the Wason selection task, which goes like this: given four cards, which cards can determine if the following rule is false? "If a card has a vowel on one side, then it has an even number on the other side." This sounds like a simple exercise. The vowel card clearly is relevant: if it doesn't have an even number on the other side, then the statement is false, and most people can find this neatly. But the problem comes with the card showing seven, which is not so intuitive. People actually choose the card which has an even number on one side, but having that even number doesn't help at all to determine if the statement is false. What does help is the seven, which is an odd number: if it has a vowel on the other side, the rule is false. So this was one of the tasks that Wason is famous for.
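To make the card logic concrete, here is a brute-force check of the task. This is a minimal sketch assuming the classic A, K, 4, 7 card set; the talk doesn't name the specific faces.

```python
# Brute-force the Wason selection task: which cards could falsify
# "if a card has a vowel on one side, it has an even number on the other"?
# Pair each visible face with every possible hidden face.
cards = ["A", "K", "4", "7"]      # visible faces (assumed classic set)
letters, numbers = "AK", "47"     # possible hidden faces

def is_vowel(c): return c in "AEIOU"
def is_even(c): return c.isdigit() and int(c) % 2 == 0

for visible in cards:
    hidden_options = numbers if visible.isalpha() else letters
    # A card matters only if some hidden face would make the rule false.
    can_falsify = any(
        (is_vowel(visible) and not is_even(hidden)) or
        (is_vowel(hidden) and not is_even(visible))
        for hidden in hidden_options
    )
    print(visible, "-> must be turned" if can_falsify else "-> irrelevant")
```

Only the vowel card and the 7 come out as relevant; the even-number card that most people pick cannot falsify the rule.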
And Kahneman and Tversky have this other famous task, the conjunction fallacy, or Linda problem, and the issue is how people solve it. It goes like this: Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Please rank the following statements by their probability, using 1 for the most probable and the highest number for the least probable. What's interesting here is the last statement: "Linda is a bank teller and is active in the feminist movement." There are also these statements without the conjunction: "Linda is active in the feminist movement" and "Linda is a bank teller." Because of the description, people think that Linda being a feminist is very likely, but they think being a bank teller is not likely. The issue is that people usually mark "Linda is a bank teller and is a feminist" as more likely than her simply being a bank teller. The problem is that the conjunction of the two is necessarily less likely: being a feminist plus being something else, or being a bank teller plus being something else, is automatically less likely.

So those are the sort of tasks that the reasoning literature used to show some sort of human irrationality. One issue with this is that it's not very interesting to show that humans are irrational; it's more interesting to show how humans reason. And these sorts of studies didn't show how people actually solved the problems; they were focused on how people made mistakes. One thing that's interesting about dual process theory is that it explains both the mistakes and the correct solutions.

So around 2002, Kahneman and Frederick formulated the cognitive reflection test, which is another one of these tasks, but it's much simpler and more elegant, and it really indicates how dual process theory is a good explanation of the evidence in general. The Wason selection task used logic, and the Linda task used probability; here we just have basic arithmetic. So we don't have to worry about which answer is truly correct or which rational norms we need to follow; it's just basic math. There's nothing to discuss, right?

The task goes like this: a bat and a ball cost $1.10 in total. You might know this one; it's famous even outside the reasoning literature. The bat costs one dollar more than the ball. How much does the ball cost? When people read this, they usually say ten cents by intuition. But if you take a little time to write it down, you can see that it's actually five cents: the bat costs $1.05, the ball costs five cents, and together they cost $1.10. The other two items of the test have a similar structure.

The point here is that sometimes we look at these tasks and we have an intuitive answer which is wrong, and the way we get out of these issues is that we stop and think a little bit and review our answers with some sort of step-by-step reasoning, writing it down, something like that. What's interesting is that if you look at "two plus two", it's almost impossible not to think "four"; it's almost like you're perceiving that the answer is four just by looking at it. But if you have some more complex calculation to do, obviously just by looking at it you won't have a solution. This basic intuition is clearly shown in the cognitive reflection test.
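Both points can be checked in a few lines. A small sketch follows; the probability numbers are invented purely to illustrate the conjunction rule.

```python
# Conjunction rule: P(A and B) can never exceed P(A), whatever the numbers.
p_bank_teller = 0.05                 # assumed for illustration
p_feminist_given_teller = 0.30       # assumed for illustration
p_both = p_bank_teller * p_feminist_given_teller
assert p_both <= p_bank_teller       # "feminist bank teller" <= "bank teller"

# Bat and ball: bat + ball = 1.10 and bat = ball + 1.00,
# so 2*ball + 1.00 = 1.10, giving ball = 0.05, not the intuitive 0.10.
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
print(f"ball = ${ball:.2f}, bat = ${bat:.2f}, total = ${bat + ball:.2f}")
```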
What's more interesting is that the cognitive reflection test actually predicts performance on various heuristics and biases tests. If someone does well on the cognitive reflection test, they're more likely to do well on other reasoning tasks, and if they do badly on the cognitive reflection test, they'll likely do badly on other reasoning tests as well. To me this is also an indication that this dual process structure of intuitive reasoning versus slow, reflective reasoning likely explains the rest of the tests as well, not only these, but most of the others. So it's a good explanation of the literature.

Okay, so what does the theory say, besides explaining the reasoning literature? In 2008, Jonathan Evans tried to make a formulation of it that would unify views from different domains such as social cognition and the neurosciences. Other psychologists had similar theories, and it seems like there's something in common here to unify, but Evans wasn't able to, so there's a formulation specific to reasoning and we don't know how well it applies to other areas of cognition.

One common way to describe dual process theory is by making these tables of features. We say, oh, type one processes have these sorts of features on the left, and type two processes have this other set of features on the right. This table here is one I made by joining some of the features from Evans and Stanovich and also from Kahneman, and then, with some questions I raised in the thesis, I elaborated it in this way, but it's very similar to the Evans and Stanovich and Kahneman formulations. One problem we have with this tables formulation of dual process theory is that people usually agree that in one thinking instance, when you're solving a problem, you're not going to have all of those features together. So we don't know for sure when some feature will be present or not. For instance, it would appear that type one processes necessarily have to be unconscious and type two processes necessarily have to be conscious, but I think that comes largely from some conceptual issues in the theory.

One solution they found was to have at least some defining features, but different authors have different defining features. So here I have some of these defining features from the main authors. Type one processes use less working memory, while type two processes rely strongly on working memory. Type one processes are autonomous, in the sense that they don't need to wait for higher processing to respond: if they have an answer, they go for it. Decoupled representations mean that type two systems can reason beyond what's in the present environment; we can think about things that are not there. Type one processing is faster in comparison, while type two is slower in comparison. Type one processing is less effortful, while type two is more effortful in the sense of psychological effort, like people reporting having a hard time doing things. For instance, in the cognitive reflection test, people that solve the test correctly usually rate it as harder, because they had to use type two processing to solve it.

Okay, but Richard Samuels, a philosopher who works on dual process theory among other subjects, says: even if this defining-features proposal is good, we still need a mechanism to explain why these features are linked together. What is the mechanism that explains why the type one processing features go together, and what is the mechanism that explains why the type two processing features go together? So we need something beyond saying that they're different.
We need to explain what the mechanisms are that make them different. I actually had Samuels on my PhD committee, and my PhD was a solution to this problem. I'm not sure how convinced he was by my solution, but he approved the PhD, so that's good enough, I guess.

So the goal was to solve the unity problem: to explain the mechanisms behind type one processing and the mechanisms behind type two processing. I started by searching the literature for how people were conceiving of the general frameworks behind these two types of reasoning. One framework that was not actually related to this dual process theory literature, but was famous in cognitive science, was the modularity of mind, put forward in the '80s by Jerry Fodor. He said that we had two types of processes: modular processes, mostly related to input and perception, and central processes, mostly related to higher reasoning. So you can trace some resemblance there to dual process theory. The way he went about this is that he said these perceptual, modular processes are informationally encapsulated: vision only works with vision, language only works with language, space only works with space. These were domain-specific knowledge systems that did not communicate with each other. And when they needed context sensitivity, they needed central processes. These central processes, he would argue, are isotropic and Quinean, which means that any type of information could be used in a solution: we can relate, I don't know, physics with biology and form a new solution. So there's knowledge from multiple domains working there, and that would also need some sort of context sensitivity.

This was a problem for Fodor. He didn't think we would solve the context sensitivity problem at all, and he had good reasons for that: in classical AI we had a lot of issues with it. I mean, we still do have issues with the context sensitivity problem, but not the sort of issues we had in the '80s when they were debating this same problem in classical AI. This was the problem of how to represent a changing environment in symbolic representations. It was a problem because classical AI attempted to give knowledge to robots or artificial intelligences as statements about the world, and these were in part intractable, hard to compute. There were so many knowledge banks to search, and so much knowledge to try to use, that the robots couldn't do anything relevant. They also couldn't represent change, so a changing environment was terrible for them. It didn't work with symbolic representation; it didn't work with serial reasoning or anything of the sort.
This was a huge problem for classical cognitive scientists. And since Fodor was a believer in cognitive science in the sense of the classical computational theory of mind and the language of thought theory, he formulated what he called Fodor's first law of the non-existence of cognitive science. This law says that the more global processes are, the less cognitive science will be able to understand them. So he thought only these encapsulated processes could be studied, and if we needed this global, context-sensitive type of processing, then cognitive science just couldn't do it, because we had in mind cognitive science in the sense of classical AI.

But it seems like the literature, not only Fodor, abandoned classical cognitive science in various ways, which is strange, because if you take the work of Allen Newell and Herbert Simon, it's strange to think that they were completely wrong, that there's nothing useful in what they said that could be applied today, that all the evidence and the theories they built were just completely wrong. That's a weird way of thinking. And if you take, for instance, our theories today of the Bayesian brain, they don't consider this work at all; it's just a different strand of theories. But it always seemed to me that there must be at least something right about it; it shouldn't be totally abandoned, and there must be a place for these results in the explanation of the mind.

One thing that was at odds with the way Fodor described things, and also sometimes with the way Stanovich talks about dual process theory, is this fact here: the fact that we have a limited capacity when we use conscious processing, or working memory. This is some of the best evidence we have in psychology so far. We had behaviorist evidence for reinforcement and reinforcement learning and that sort of thing, and then we had cognitive psychology in the '60s, and this is the best evidence cognitive psychology has had for a long time. So it's also something we should not simply ignore.

So, in short-term memory, we have the knowledge that when we store some information in short-term memory, we usually can only keep track of about seven items, sometimes even less. In selective attention, there was this task where there was a video of people playing basketball, and the subject had to count how many passes they were making, and suddenly a gorilla passed by in the scene, and people didn't notice the gorilla because they were so concentrated on the task. That's evidence of selective attention: when you're concentrated on a task, you actually don't even see other stimuli. We also have competition for limited resources when doing dual tasks. If I'm talking on the phone and writing at the same time, I'm probably not going to do very good writing, because the tasks are competing for limited resources. And it's important to note that if you're walking and talking, since those are different kinds of tasks, they won't compete.
So we have some competition, and more competition when the same limited resources are disputed. Also in working memory: if I say the numbers seven, nine, three, two, four, one, and I ask you to repeat them backwards, and you're not looking at the slides, you're going to have a hard time with that, because of the limited capacity of working memory. And this is while it's clear that in other areas of our thinking our brains are using a lot more than seven items of information, to make scenes happen in perception, to make us walk, and so on. So that's on the side of the type two constraints.

On the side of type one reasoning, Kahneman had some intuitions similar to the ones I described. He said: "From its earliest days, the research that Tversky and I conducted was guided by the idea that intuitive judgments occupy a position between the automatic operations of perception and the deliberate operations of reasoning." Kahneman and Frederick claimed that intuitive thinking is perception-like and that intuitive prediction is an operation of system one; further, that the boundary between perception and judgment is fuzzy and permeable: "The perception of a stranger as menacing is inseparable from a prediction of future harm." You can see that this is very similar to the intuitions we're speaking of when we talk about predictive processing. Kahneman is saying that when you look at the face of someone and he's mad, you have a prediction that you're in trouble, right? So this is a sort of judgment that stems from perception.

Also, the context sensitivity problem is pretty much solved in predictive processing, and when you check the AI literature that's also happening: for instance, the paper "Attention Is All You Need" has techniques for dealing with context sensitivity which Fodor didn't dream of. This is something that Clark said in 2013: "The best overall fit between driving signal and expectations will often be found by (in effect) inferring noise in the driving signal, and thus recognizing the stimulus as, for example, the letter m (say, in the context of the word 'mother'), even though the same bare stimulus, presented out of context or in most other contexts, would have been a better fit with the letter n. A unit normally responsive to the letter m might, under such circumstances, be successfully driven by an n-like stimulus." What Clark is trying to say is that since we have representations using probability, then if the context says we should go left, we go left; if the context says we should go right, we go right, because the representation is fluid and it accepts changes really quickly. So that's something that stems from the predictive processing architecture.

But one issue I think we might see in the predictive processing architecture is the lack of compositionality. Symbolic representations exhibit compositionality, meaning that complex representations are built by combining simpler elements according to rules or syntax. The meaning of a complex representation is derived from the meanings of its constituent parts and the way they're combined. This property allows for the generation of new and meaningful expressions by manipulating symbolic structures.
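A toy version of Clark's "m in mother" point, as a minimal sketch with made-up numbers: the same ambiguous stimulus is read as m or as n depending only on the prior supplied by context.

```python
# An ambiguous letter whose bottom-up likelihood slightly favors "n",
# read in a context whose prior strongly favors "m".
likelihood = {"m": 0.40, "n": 0.60}        # n-like bare stimulus
prior_in_mother = {"m": 0.95, "n": 0.05}   # "mothe_" context expects m
prior_no_context = {"m": 0.50, "n": 0.50}

def posterior(prior):
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: round(p / z, 3) for h, p in unnorm.items()}

print(posterior(prior_in_mother))    # m wins: context overrides the stimulus
print(posterior(prior_no_context))   # n wins: out of context, it's a better n
```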
Fodor, in '88, argued that even if connectionists succeeded, they would likely have some sort of simulation of a language of thought that exhibits compositionality in order to work the way we do. And it seems like that's the case with the results we have this year.

So, considering those constraints, the initial view from Fodor, and the problem I was trying to solve, which is the unity problem from dual process theory, a major shift I proposed in the PhD thesis, and then in a chapter I wrote with Keith Frankish in the Routledge Handbook of Bounded Rationality, is a shift in the power structure. I mean, it's not clear that dual process theorists would adopt Fodor, but the received view of dual process theory is still very similar: type one is a bunch of dumb heuristics, modules, animal intelligence, and type two is the real contextual and complex reasoning, real human intelligence. I tried to invert that power structure by saying that contextual predictive processing is type one reasoning, and it is what gives us the basics to reason on. So our reasoning already comes with context; we don't need to compute the context with type two reasoning. We use type two reasoning to fix some minor issues in predictive processing, and we use heuristic search, and because it's limited, we only use it when there's a lot of prediction error and the predictions aren't working that well. Then we call in this heuristic search to handle what the prediction was missing, to see if it can find a new solution that wasn't captured. So in this framework, the powerful system is predictive processing, not type two; type two is what we call in to help. That was the general idea of the shift.

Now I'm going to state some hypotheses that stem from this shift. T1 processes deal with content encoded in the form of probability density functions, which is how predictive processing represents the world. This means there is no symbol and no definite content, but values, means and standard deviations, influenced by previous experience and previous world contingencies. Manipulating prior information biases the distribution in one or another direction, closer to or further from a certain value. The people here at the Active Inference Institute know this by heart; I'm not changing anything here, I'm basically just using what the framework says.
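As a minimal sketch of that hypothesis, with illustrative numbers only: representing a value as a density means any estimate is a precision-weighted compromise between prior and input, so manipulating the prior literally biases the outcome.

```python
# A Gaussian belief pulled toward the prior in proportion to the
# relative precisions (1/variance). All values are illustrative.
prior_mean, prior_var = 5.0, 1.0    # expectation from past contingencies
obs_mean, obs_var = 8.0, 4.0        # noisy current input

prior_prec, obs_prec = 1.0 / prior_var, 1.0 / obs_var
post_mean = (prior_prec * prior_mean + obs_prec * obs_mean) / (prior_prec + obs_prec)
post_var = 1.0 / (prior_prec + obs_prec)

print(f"posterior mean {post_mean:.2f}, variance {post_var:.2f}")
# 5.60: the estimate is biased toward the prior, as the hypothesis says.
```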
These functions are not stored in a memory bank but distributed from the responsible brain regions over to the external organs and body parts, through their connections. The values in the distribution do not represent objects directly and discretely; they refer to distinct aspects of the input when perceptual systems are dealing with such objects. This is in line with T1 processes being easily biased when working with references to similar properties, like similar numbers, objects, rhymes or names: very often an incorrect value is picked from a distribution. An example I have of this is when people confuse their youngest child with the dog. They call the youngest child by the dog's name, and when someone has a new child, the new youngest child is often confused with the dog, because I think they're somewhat represented at similar points of the distribution. This is also in line with embodied proposals claiming that the world is not represented in symbols.

T1 processes are also perceptual, in that the predictions are made by the same systems with which we process perception. A clear example is that a judgment about a facial expression is related to the FFA, the fusiform face area, a brain region that processes faces, and this is also related to the point Kahneman made, that when looking at someone we make judgments about their face. The idea is that perception is not passive but already comes with predictions, and in problem-solving, such a prediction is precisely the type one answer. So if we look back at the reasoning tasks, all those type one answers are likely stemming from predictions. I don't want to claim that T1 processes are purely perceptual, in contrast to cognitive; only that such predictions stem from perceptual processes. In type one reasoning there's not a clear line between what is perception and what is cognition, so I think we're just using different words for the same thing; that's the problem.
So let's not get into that. Kahneman's example of the judgment of a negative facial expression shows how this is expected of dual process theory: it's expected that type one reasoning works under predictive processing. This is also in line with the claims of embodied cognition that there is no sharp line between perception and reasoning.

On the other hand, T2 processing works like a classical machine for reasoning, such as the General Problem Solver of Newell and Simon. But this classical machine only makes sense in the brain if it exists within a wider setup of a predictive processing network generating type one responses, like in Newell's physical symbol system. When facing a reasoning problem, T2 processing opens a problem space containing an expression that designates the initial problem and an expression that designates the solution, which was produced by probabilistic prediction. Having the initial expression and the predicted expression in the problem space, T2 processing then uses its move generators to attempt to reduce differences between them, and sometimes finds different solutions in such a search.

Let me explain this a little better. When Newell was working in the '60s, they knew they couldn't search the whole possible problem space, because it wouldn't work. So they had heuristic search, which works like a detective: let's ignore the paths which are likely wrong and only look at the parts of the space that might be right. What the predictive-reflective framework does for heuristic search is that the search only works on, and after, the prediction that stemmed from type one processing. We only start reasoning and making searches based on the prior predictions we already had, and we only go further with the search when the predictions aren't working. The main point here is that the heuristic search isn't searching a random space, or even a pre-programmed space; it's searching a space that was left open by the predictions that started in type one processing. So that's how it doesn't run into the frame problem anymore: a large part of the contextual issue was already solved by predictive processing.
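A toy of that claim, using the bat-and-ball item: check the type one guess first, and only if its prediction error is too large, run a best-first search whose frontier is seeded by that guess rather than opened from scratch. A minimal sketch, not taken from the talk:

```python
import heapq

def error(ball):
    # How badly a candidate violates "bat + ball = 1.10, bat = ball + 1.00".
    return abs((ball + 1.00) + ball - 1.10)

type1_guess = 0.10                      # the intuitive prediction
if error(type1_guess) < 1e-9:
    print("prediction accepted:", type1_guess)
else:
    # Frontier seeded by the prediction, not by the whole space.
    frontier = [(error(type1_guess), type1_guess)]
    seen = {type1_guess}
    while frontier:
        err, ball = heapq.heappop(frontier)
        if err < 1e-9:
            print(f"search found ball = {ball:.2f}")
            break
        for step in (-0.01, 0.01):      # move generators reduce differences
            nxt = round(ball + step, 2)
            if nxt >= 0 and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (error(nxt), nxt))
```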
Okay, so that was the general hypothesis I presented to attempt to solve the unity problem, which is the problem of how the type one features go together and how the type two features go together. Now I'm going to try to argue that this is likely the case. I can't prove it is the case, but there are good reasons for us to think that type one processes use predictive processing and that type two processes do not use predictive processing; rather, they need some kind of symbolic heuristic search.

One thing that stems out of this is a good explanation of the difference between implicit representation and explicit representation. If you look at the literature in psychology, you will see that it's very ambiguous on these definitions: people use implicit and explicit representations as the same thing as unconscious and conscious, or fleeting and graspable, or the same thing as automatic and controlled. And when it comes to dual process theory, it's not very clear what they're saying in each case, but they use these expressions, implicit and explicit, a lot. How I think this model solves this is by saying that implicit representations are probability density functions, while explicit representations are symbolic representations, classical symbolic representations. Clark in his book sometimes mentions that we can have single-peaked distributions, but if you only have one peak on the distribution function, then it doesn't have all these features of a probabilistic representation; it's more like a symbolic representation. In fact, having a single-peaked distribution might be the way a probabilistic representation is turned into a symbolic representation.

These have different and important features. The probabilistic representation is continuous. It's uncertain and ambiguous, and that's why it's able to be sensitive to context, as in the "mother" example we saw. It can vary on retrieval: we can remember something and it can be different the second time. It's tied to priors. It works with statistical relations. It's graded, and depending on which value is most probable at the time, it has a different outcome, right? And I argued that explicit representations are likely discrete, and likely symbolic in nature. They're unambiguous, which is what allows us to disagree in the first place. They're stored reliably: if you have a reliable definition of something, you'll likely remember it the same way the next time. They're arbitrary, in the sense that they're not really coupled to the stimulus, whereas in perception the statistical relations are tied to aspects of the world. They use compositionality. And they're immutable, in the sense that you can't change them; the value is fixed. So that's the implicit versus explicit feature.
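One way to picture the contrast, as a sketch with invented numbers: an implicit representation sampled from a density varies on retrieval, while the explicit symbol obtained by collapsing the density to a single peak is identical every time.

```python
import random

random.seed(1)
mu, sigma = 5.0, 1.5                     # implicit: a density over values

retrievals = [random.gauss(mu, sigma) for _ in range(3)]
print("implicit retrievals:", [f"{x:.2f}" for x in retrievals])  # all differ

symbol = round(mu)                       # explicit: single-peaked, discrete
print("explicit symbol:", symbol, "- the same on every retrieval")
```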
You can also have an explanation of automaticity as contrasted with working memory. Automaticity concerns overlearned skills, and overlearned skills are understood, in this framework, as skills that have become predictable. We usually use this to explain things like driving or riding a bicycle. When you're learning to drive, you need certain statements, statements about how you should steer; you need to have those in mind. But once you learn how to drive, it's like your body already knows what it's doing, and you don't even need to think about it. This is a classical distinction in psychology, but I argue that predictive processing theory casts new light on it: our body is predicting what we're doing, so it's automatic. So it's an explanation of automaticity. But say a dog suddenly runs in front of the car. You obviously don't have a model for that, so you're going to need to call in working memory to solve some of those issues, so you don't hit the dog. Everything works under prediction unless the model is very unreliable, and then working memory is called in to solve the further issues.

An interesting hypothesis I had in the PhD thesis, which I never explored, and which I actually don't think anyone read, but which seems to me very interesting in relation to free energy and active inference, which is why I'm bringing it here to the Active Inference Institute, is this. As we know, the brain attempts to minimize free energy by getting predictions right. The hypothesis is that a higher need for type 2 processes is related to higher free energy and to processing more information, in the sense that when the predictions are working, you clearly don't need effort, but when there is a lot of prediction error, then you need effort, because you'll call in working memory to do a heuristic search. Minimizing a great amount of free energy will take more time and work; this is more effortful than having predictions ready and minimizing free energy as quickly as possible. When the probabilities fail, the system needs to start searching possibilities: it searches for other possible solutions by means of heuristic search. Heuristic search will be related to more information and time, because it does not have probable solutions ready; instead, it needs to search its state space almost from scratch. We say almost because it is heuristic: it has tricks to get to the correct solution faster, unlike brute-force search, which would investigate the state space from beginning to end. We believe reports of effort will be related to executing more heuristic searches. Those reports of effort by subjects would then be based on cognitive, informational and physical constraints of reality. That's why I think this is an interesting part of the hypothesis: we have this psychological measure of effort, but if this is true, then we actually have a physical measure of effort. If people are working harder on a problem, that means there's a lot of free energy going on, and they're having to use heuristic search to fix issues the model isn't able to fix by itself. So that's the effort part.
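A cartoon of the effort hypothesis, with the threshold, error values and decay rate all invented: act on the prediction when prediction error, standing in for free energy, is low; otherwise call in a search whose step count is what gets reported as effort.

```python
def respond(prediction_error, threshold=0.3):
    effort = 0
    if prediction_error <= threshold:
        return "type-1 answer", effort       # prediction ready: feels easy
    # Type-2: each heuristic-search step reduces the residual error a bit.
    while prediction_error > threshold:
        prediction_error *= 0.8              # one search step
        effort += 1
    return "type-2 answer", effort           # slower, reported as harder

for err in (0.1, 0.5, 0.9):
    answer, effort = respond(err)
    print(f"error={err}: {answer}, effort={effort} steps")
```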
Now, going to the working memory part. What's interesting here is that working memory is expected to work like a classical computer. Working memory is a widely researched topic in psychology, but in predictive processing it's rarely mentioned. I actually searched the predictive processing books with Ctrl-F, and people rarely touch the topic, because it's not done by predictive processing; it has nothing to do with predictive processing. It's actually an issue for predictive processing, because if you're going to eliminate classical symbolic processing altogether, then you need a predictive processing explanation of working memory, and so far I haven't found one, and I don't think there will be one. But this could also just be ignorance of mine; maybe there's some paper out there I haven't found, and I might be wrong.

In any case, working memory is very much aligned with what Turing was thinking when he was making the connection between machines and minds. Here is a passage of Alan Turing explaining his computer, the Turing machine. If we change the word "computer" to "working memory", it clearly still works; it's almost like it's talking about the same thing. Whereas of course if we change the word "computer" to "predictive processing", it's clearly not talking about the same thing. "The behaviour of the computer [or working memory] at any moment is determined by the symbols which he is observing, and his 'state of mind' at that moment. We may suppose that there is a bound B to the number of symbols or squares which the computer can observe at one moment. If he wishes to observe more, he must use successive observations. We will also suppose that the number of states of mind which need be taken into account is finite. The reasons for this are of the same character as those which restrict the number of symbols. If we admitted an infinity of states of mind, some of them will be 'arbitrarily close' and will be confused. Again, the restriction is not one which seriously affects computation, since the use of more complicated states of mind can be avoided by writing more symbols on the tape." So this is very similar to what we think about when we're talking about working memory, and not only because we were influenced by Turing, but because the evidence that comes from working memory research is very similar to what's expected of this sort of machine.
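For readers who haven't seen one, here is a minimal machine in the quoted spirit: a bound of one observed symbol at a time, a finite set of states, and a tape that absorbs what the states can't hold. The particular machine, a binary increment, is just an illustration, not anything from the talk.

```python
tape = {0: "1", 1: "0", 2: "1"}   # ...101, least significant bit at cell 0
state, head = "inc", 0

rules = {  # (state, observed) -> (write, head move, next state)
    ("inc", "1"): ("0", 1, "inc"),    # carry and move on
    ("inc", "0"): ("1", 0, "halt"),
    ("inc", None): ("1", 0, "halt"),  # extend the tape for a final carry
}

while state != "halt":
    observed = tape.get(head)              # bound B = one symbol at a time
    write, move, state = rules[(state, observed)]
    tape[head] = write
    head += move

print("".join(tape[i] for i in sorted(tape, reverse=True)))  # 110 = 101 + 1
```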
Now, finally, we have speed. Predictive processing has various strategies to make processing as fast as possible, while that's not true of symbolic, classical AI. Clark says that cheap, fast, world-exploiting action, rather than the pursuit of truth, optimality and deductive inference, is the key organizing principle when he's talking about predictive processing. The predictive processor is always taking certain bets about what the current state of the world implies, trading accuracy for speed. We also have predictive coding in the stricter sense: by predictive coding we mean specifically the property of the system of taking in from the world only the stimuli which result in greater prediction error. So there's also a filter there, which allows for speed in perception, focusing on prediction relevance, in that only the mismatch is used, to quickly decide courses of action and select among the possible courses. So predictive processing is tailored to be fast, necessarily. That's not the case for symbolic AI, and that's the explanation for the difference in speed. Beyond that, the T2 processes need to figure out solutions online: it's different to have a prior prediction that tells you what the answer is, versus having to search a space from scratch, right? There's also the issue that biological brains certainly were not built for serial heuristic search. The absolute speed of a process is a property par excellence of its implementation, and since the brain does not have a symbolic processor implemented, if we do use some sort of heuristic search, it's like a different adaptation; it's not what the brain is used to. And searching problem spaces is slower than having a probable outcome ready in the distributions.
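The predictive-coding filter just mentioned is easy to picture; a minimal sketch with made-up signals:

```python
# Only the residual (signal minus prediction) is passed upward, and only
# where it is large enough to matter.
signal     = [4.0, 4.1, 3.9, 9.0, 4.0]    # incoming stimulus stream
prediction = [4.0, 4.0, 4.0, 4.0, 4.0]    # what the model expects
threshold  = 0.5

errors = [s - p for s, p in zip(signal, prediction)]
passed_up = [(i, e) for i, e in enumerate(errors) if abs(e) > threshold]
print(passed_up)   # [(3, 5.0)]: only the surprising sample goes upward
```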
Okay, so that was the work I did in my PhD thesis, and those were some of the main reasons why I think this framework is good for explaining the differences in features in dual process theory. And I actually wrote a prediction, which I only recently remembered writing: I forgot about it, went back to the thesis, and saw that I actually said this would happen, and it kind of did. That was one of the reasons why I renewed my interest in my PhD thesis. What I said was: a predictive processing system would be subject to biases from lack of compositionality, such as mistakes in transitivity, failures in noticing the necessary character of formal rules, and so on, precisely the type of mistake type one processing incurs. At that point we didn't have large language models; in fact, the "Attention Is All You Need" paper came out in that same year. And then I noticed that this could be seen as a prediction of the type of issues large language models are facing, that is: low reliability, a hard time keeping the order of units or steps in complex reasoning or math straight, a hard time letting go of priors. And what really motivated me to start working on this again was that they solved this precisely by using dual process theory, with what I'm calling T2 agents. What they're doing now to solve these reliability issues is adding heuristic search to the generative models, and it's having awesome results. This is all just this year, 2023. The more they develop these type two agents to answer the reliability problems of large language models, the better the large language models get at things.

These images are from the Tree of Thoughts paper, one of the latest ones; it probably came out about two months ago, something like that. The way they're doing this implementation is that they take the outputs of the large language model, do further reasoning on them, and then feed the large language model with the result of that further reasoning. And the further reasoning they're doing is very similar to classical symbolic AI. So here they explain: there's simple input-output prompting, which is simply inputting a prompt, and the large language model gives you an output. As we know, these outputs are sometimes very biased, there's hallucination, they're not reliable, all that stuff. Then last year they figured out that through chain-of-thought prompting we could already increase the reliability of these models. You give an input, and, just as we do with better prompting — you know, when you go to ChatGPT and say "do this better", "fix this issue", "solve this step by step", some of the commands that work well with ChatGPT — chain-of-thought prompting automated some of those reasoning steps, and then large language models worked much better. So that was last year, and it started a new trend of research: making agents coupled to a large language model's output that feed prompts back to the large language model, to see how it can get better. In self-consistency, you give the input, and then there are various solution paths; self-consistency forces the large language model to try different solutions, and then you take a majority vote. So if you have, say, five similar answers across different paths, that answer is picked over the one which is less popular. And Tree of Thoughts, which I thought was most similar to the proposal I had in psychology with dual process theory, is the one which opens up a problem space. It's pretty much what I just explained in the case of psychology: they open up a problem space, they start to search for possible solutions which are better than the original, and they feed that back into the generative model, the large language model. This is currently one of the best schemes they have.

So here I made a general diagram of how this would work. It's not psychology nor AI; it's what's similar in the two. We have a generative model here using probability density functions. This looks more like a traditional connectionist model than a predictive processing one, just because the diagram is not that good; I'll talk a little about that. So we generate an answer here to the problem, we get a linguistic output, and this linguistic output then goes to a heuristic search using symbolic representations. If you get a better answer, it feeds back to the generative model, and so on. Basically, the knowledge part is the generative model, and the heuristic search is simply using more steps and prompting back. It's more like a prompting scheme than a new knowledge scheme, which is true both of the dual process theory model I created and of the heuristic search that's happening here in the T2 agents for LLMs.
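A compact sketch of these schemes built around a hypothetical llm() stub — no real API is assumed, and the stub just returns canned answers so the control flow can actually run:

```python
import random
from collections import Counter

random.seed(0)

def llm(prompt: str) -> str:
    # Hypothetical stand-in: a real system would call a language model here.
    return random.choice(["5 cents", "5 cents", "5 cents", "10 cents"])

def score(thought: str) -> float:
    # Hypothetical evaluator; Tree of Thoughts uses the model itself here.
    return random.random()

def self_consistency(question: str, k: int = 5) -> str:
    # Sample several chain-of-thought runs, then take a majority vote.
    answers = [llm(question + "\nLet's think step by step.") for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

def tree_of_thoughts(question: str, depth: int = 2, width: int = 2) -> str:
    # Open a problem space: branch on intermediate thoughts, keep the
    # best-scoring ones, and feed them back to the generative model.
    frontier = [question]
    for _ in range(depth):
        branches = [t + " -> " + llm(t) for t in frontier for _ in range(width)]
        frontier = sorted(branches, key=score, reverse=True)[:width]
    return llm(frontier[0])

print(self_consistency("A bat and a ball cost $1.10 in total..."))
print(tree_of_thoughts("Game of 24 with 4, 9, 10, 13"))
```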
A better way to do this, and I think Tree of Thoughts does implement this, is that the input layers are trying to predict the thoughts. That's a bit more like predictive processing than the other diagram. They're trying to predict the thoughts, and by opening up and searching this problem space during these steps, we already have new prompts here for the generative model. If you take predictive processing seriously, and this does happen in the brain, it's more likely like this. I do think the Tree of Thoughts paper says that the intermediate thoughts are inferred during the search by the LLM.

Finally, I thought maybe we could suggest some things from psychology for this model to keep growing, based on what we know from psychology. Some things that are not implemented yet: updating abilities and intelligent forgetting, which is related to insight problem solving. That's when you forget something, but what it means is that you forget the prior; you use a different prior, so you're able to let go of biases, right? There's also work on thinking dispositions, which could be relevant: dispositions or sentences like "beliefs should always be revised in response to new information or evidence". These would be like imperative linguistic knowledge which would have to be added in, I don't know where, but it would likely help the work. There are a lot of thinking dispositions that are relevant to successful reasoning, and this is just one example; the literature knows a lot more that could be relevant. In creativity research we have this generative phase, which is similar to what the generative models do, but we do not have the exploratory processes going on, which are common in the creativity research literature. It's what we do with, for instance, Midjourney when we're writing better prompts. So maybe, using these exploratory processes from creativity research, we can also automate the exploratory part of the creative process that these image generators are getting at, and not just the generative part. And finally, embodiment: we don't have robots or anything like that doing this sort of reasoning, so it's AI on the computer. And there's also stuff from active inference which was not considered, slightly related to embodiment in the sense of navigating the world and predicting the world; that stuff is still far from happening in the traditional algorithms we have.

So, yeah, that's a lot of stuff. So maybe — I hope you guys have some comments. As I said, I haven't been able to talk about this with anyone, and I do think I have at least some relevant stuff published, but since I'm not famous and I don't have people to talk to, it's mostly gone unnoticed. So thanks for the space here.

Thank you, awesome presentation. All right, Nick, it'd be awesome to hear your introduction and then take it wherever you'd like to go.

Sure. So first, thanks, Samuel, for an interesting talk; I've been really pleased to find your research. I guess maybe I'm supposed to say something about who I am and where I am. So I'm Nick Byrd.
I'm at the Stevens Institute of Technology in the New York City metropolitan area, in a department that's kind of interdisciplinary. We have philosophy and quantitative social science, but also people doing music and visual arts and all sorts of other things. I tend to do more of the philosophy and quantitative social science stuff. So I'm also interested in human reasoning, and much of what I'm reading and citing and drawing on in my research comes from a lot of the people you saw in the opening slides: people like Wason, and Kahneman and Tversky, and Mercier and Sperber, and a lot of the other people that were cited. I think Samuel does a really good job of incorporating the shoulders of the giants that we're standing on in terms of the philosophy of mind, right — the Fodors and the Millikans and these people.

One thing that's on everybody's mind these days is these large language models. So I'm really glad to see how this dual process theory, which emerged from research like Kahneman and Tversky's on problems like the conjunction fallacy or the Wason selection card task, has been applied to models like large language models, and what we can learn from that. And I think the coolest thing about the preprint is how we can port the learnings back and forth between computer science and cognitive science to improve one another, right? So one of the neat things about this preprint that Samuel was giving us at the end is that we're taking what we think is a model of reflective reasoning — that we represent certain parts of a problem and maybe reason more carefully and effortfully about them in certain situations, in ways that might help us — and asking how we can help these large language models do the same. Because it does seem like large language models, at least the ones we interact with online, mostly function as a system one or type one process. They're just quickly generating lots of text or imagery or something like that, but they're not necessarily reflecting on it in the ways we would think a human is capable of. And they might not even be capable of representing things in the same way that we do. So I thought that was a really interesting idea, using chain of thought and tree of thought models to kind of create a reflective level, or a reflective system, within the models. It seems like a really valuable idea and a really great synthesis of research in multiple disciplines. Something that few people are actually very good at is incorporating some of the best insights from multiple fields; we're often pretty siloed in academia. So I just think this is great work, and I hope more people will appreciate and pay attention to it and build on it.

One of the things that stood out to me in this presentation, more so than when I was reading this 2023 preprint, "Analytic reasoning for large language models", by Dr. Bellini-Leite, is this idea that Bellini-Leite and Frankish sort of switched which of these two types of reasoning processes is supposed to be context dependent,
or context sensitive, or however you would want to word that. And I started thinking, well, I wonder if there's a sense in which both types of responses, or both types of reasoning, are somewhat context dependent. I'll just say what I mean, and then maybe, Samuel, you can say what you think I should think, or clarify the view of the predictive and reflective framework.

So the thought I was having was: well, in a familiar context, these intuitive predictions, these type one processes, our gut response so to speak, those are going to be pretty useful, because they're well trained; we have a lot of experience we're drawing on. So our gut response, our first response, is often quite good. It's in these less familiar contexts, or maybe similarly familiar but way higher-stakes contexts, where we might think: oh, maybe I should slow down and make sure I've double-checked whatever my initial impulse is before I just accept it, because there's a lot riding on this, or I'm just not used to this type of problem, so I need to slow down. And so there's a sense in which what context is doing is not just showing up in one or the other type of reasoning; it's sort of determining which type of reasoning might be best at the moment. So I'm wondering: is that just totally compatible with what you're imagining, or is that somehow a deviation from the predictive and reflective framework?

Yeah, thanks for the thoughts and the compliments, and thanks for the question as well. The reason why I think context always needs some sort of type one help is because of the constraints we have learned from symbolic AI. If I'm saying that type two reasoning has the same constraints, then it cannot be contextual on its own, because it doesn't work like that; we know that when it fails at context, it fails badly. But of course, when we have a novel situation, we often do solve it by context, right? So how could this be? I think it's by the interconnection of the two systems. We always start with a prediction, which in this case would be mistaken, and then we have to search for a novel instance, and in this search we make the contextual comparisons. It can't be that system two is making the comparison on its own. I think it's more likely that when it's a case of context that stems from a novel situation, the two systems have to figure it out together, likely in the sense I was pointing out here at the end: the predictions that go over the thoughts are most likely the ones that will solve these context issues in novel problems.

Okay, I think that's somewhat helpful. So then maybe what I'm thinking is something along the lines of what you were saying at the end, about how there's still an opportunity to understand things like executive function in this framework, and I think that's maybe part of what I'm wondering about. So I guess I'm wondering if there's more to be said within this predictive and reflective framework about how each of these two types of processes gets selected, or what might help.

Go ahead — on that specific point, I just remembered something that's related. So, yeah, clearly our working memory, or type two reasoning, likely does more than that. It does this, but likely more than that. So —
So, um There's there's likely also a belief bank or something like this the thinking dispositions Some sort of knowledge bank which is related to type two processing Um, so Keith Frankish does this distinction between flat out beliefs and I'm not remembering the terms, but he has a type one belief in type two belief Type one belief is that belief you have but you're not fully confident of it and the the type two belief is the the belief you have and you You have a say a very firm political position and you stated with obvious words that you mean for sure you believe that so There's likely to be some sort of different belief structure as well Which I don't talk about at all in the thesis of the work So I'm not saying obviously that this is a complete model of reasoning Uh, and obviously we need more to to figure out How reasoning works. So, yeah Yeah, okay. That's that's also helpful. Um, I'm wondering if Daniel, did you want to weigh in? Well, I think it's a very interesting question about which of those modes are engaged or how they're balanced through time and How does this connect to what's known as the context window in today's transformer type large language models So, how do you map the computational attributes of large language models today? Like architecturally or their ram or cpo usage How do you map that onto? For example human cognitive processes. Do you think that's useful or are there any insights there? Well, I'm mostly making a relation between Predictive processing and generative models But I know that's I mean large language models are not entirely the same as what Active inference is saying I'm aware of that. But there is some similarity specifically In regards to the generative model parts So I would I'm not sure I'm not a AI specialist to To be sure With which details of a generative model could be true of the human brain and I'm most likely betting that the The people who study generative models in the brain like the the fristons and all have figured out Something like what our generative models do One other thought or nick do you have it you want to add there? Um, just thought of some different ways whether people have connected it explicitly to the Literature on working memory in predictive processing. I think as you pointed out, it's definitely a link that is not highlighted. It's one that we recently heard from professors Walker and Manriquez and Friston in the recent live stream 53 that was on like cognitive paleoanthropology Looking at human working memory, but some ways that people have Incorporated working memory is like a nested model that carries context at a deeper time But that is not necessarily like an actual mechanism that gives rise to why type 1 and type 2 Are the way they are Um, so I thought that was a very Provocative direction to go to move past the descriptive and then to Look at the underlying generative model of why these outcomes are the way they are even though these cognitive phenomena are causes of other things happening They are also caused by some Influencible or contextual or variable aspects And when we lose that context dependence Of the cognitive systems, then they're totally lifted and disembodied Don't really help So by putting the the primacy on that Predictive processing element I think it opens the door to connecting those areas Beyond just mentioning the terms in proximity but to really Support different epistemologies from predictive processing As a model or approach Yeah, can you go can you go on that idea again? 
I'm not sure what you want me to comment on. Can you summarize that again and make it a question?

Sure. What do you think the epistemological consequences are of taking predictive processing the way you have approached it here, versus alternative or prior approaches to cognitive science?

Okay, but what do you mean by epistemological?

How our knowing is advanced with this theory. How it influences our understanding of how we know or seek or practice or do or decide.

Perfect, okay, so let me say something about that. I think what this helps with most is explaining dual process theory, because dual process theory is in bad shape in terms of concepts, in terms of theory and formulation. People have no idea what these systems refer to in the mind-brain; we often have different intuitions about this. Basically, what I most think this work is relevant for is saving dual process theory from criticism, which has been happening in psychology. People are like, oh, it seems like we're talking about the left and right brain, something like that, something where we don't know for sure what these two systems are, right? So this is a good way forward, I think, for dual process theory, and as I said, dual process theory does have some value in itself. So, yeah, saving dual process theory; I think that's a good direction for this model. But also, if we think about the advantages of it: it could have helped us have the idea of using chain of thoughts or tree of thoughts, had anyone read this before. It can help us make these sorts of predictions, that we should have some sort of serial reasoning coupled to the generative models. So I think it could help us think further on AI, and, like Nick was saying, incorporate more stuff from psychology. I know a lot is certainly missing here, but here we have a direction for what we should incorporate, and so on. But, yeah, I'm also curious about the working memory research from predictive processing you mentioned; if you want to go back to that and explain some more of it to me —

Yeah, sure. Not that this is the most effective way to implement a memory system, but at least it provides an analytical method for measuring or describing them. If one were given a five-digit number, you can imagine a nested model where the lowest level is the ones place, and it's nested within a decision tree of tens places and hundreds and so on, and then there's some seek-or-access policy or strategy that helps speak the number in reverse. And some cognitive or computational limitation is just how well that cognitive agent can perform on that test. Again, that doesn't mean that's the mechanism that's being used, but it would be a way to use a nested generative model to encode multiple levels of spatial or temporal variability. But context becomes an issue, because you're basically just expanding the possibility space, looking for sparser and sparser associations. So if you don't have good compositionality, or a really well-definable, well-articulated causal process, then you're just building these all-by-all models that you're going to be searching through in the dark.

Yeah, one point I want to comment on there: well, it's not clear that the best way to reason will be the way humans reason, right?
Yeah, one point I want to comment on there: well, it's not clear that the best way to reason will be the way humans reason, right? We might eventually have a different way of implementing a working memory on an LLM that is more effective than ours.

There's also a point on the other side, though. It seems the limited capacity of our working memory may not be just a lack of power; maybe it's necessary, so that we don't lose our minds, in the sense that we have a limited space to search, and so we're actually able to search that limited space. If we had too much space to search, because we had a super working memory, then we wouldn't be able to finish the task in any reasonable time. So maybe this limit is a physical barrier that actually helps.

And what I find interesting is this: okay, AI may find a lot of different, better ways to implement a working memory, but it's interesting that they implemented a working memory, or a search agent, very much like the way I described in the thesis. That's what I found amazing. As you were saying, there are many ways to implement a working memory, but they did it in a very similar way, in the sense of searching the space after the generative model has offered a solution. So, I don't know, maybe that's a good indication that this is on the right track. But it may also simply be the easiest way they found to solve the problem, coincidentally similar to mine; the way the brain actually does it may also be different. We don't know. All we have is the limited evidence I offered, and a bit more obviously, plus some constraints from AI and information processing, and you have to figure it out from there. I think this implementation question is mostly for the AI people, or the math people in the field, to figure out.
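Sam's suggestion that a bounded working memory might be what keeps search finishable can be illustrated with a toy sketch: an unbounded search frontier grows exponentially with depth, while capping the frontier, a crude stand-in for limited working-memory capacity, keeps the number of held candidates small at every step. The branching factor, depth, and scoring heuristic below are all made up for illustration.

```python
# Toy sketch: a capped search frontier as a stand-in for limited
# working memory. Without the cap the frontier explodes; with it,
# search stays finishable. All numbers here are illustrative.

def expand(path: list[int]) -> list[list[int]]:
    return [path + [step] for step in range(4)]  # 4 options per step

def score(path: list[int]) -> int:
    return sum(path)  # hypothetical heuristic: prefer larger steps

def bounded_search(depth: int, capacity: int | None) -> int:
    frontier = [[]]
    for _ in range(depth):
        frontier = [child for p in frontier for child in expand(p)]
        if capacity is not None:
            # the "working memory" limit: keep only the best few
            frontier = sorted(frontier, key=score, reverse=True)[:capacity]
    return len(frontier)  # how many candidates are held at the end

print(bounded_search(depth=10, capacity=None))  # 1048576 (4**10) paths
print(bounded_search(depth=10, capacity=7))     # 7 (at most 28 in flight)
```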
Nick, do you want to comment, or have a question? Or I'll ask one from the chat.

Yeah, I have kind of a bigger-picture question that gets us into the realm of these chatbots. In this preprint and in this talk, you've given us a variety of tasks from the heuristics-and-biases literature: the conjunction fallacy, the Linda problem some people call it, and these cognitive reflection test questions. And there was a preprint last fall, in 2022, showing that GPT-3 and 3.5, and even earlier versions, showed a lot of the human-like intuitions and basically fell for the lure pretty often, sometimes most of the time. But as soon as GPT-4 was available to be studied on these tasks, it was performing basically near-perfectly, and when it did give an incorrect response, it wasn't necessarily the lure; it was some other, more general type of error, like misinterpreting the question altogether.

And it seems like the predicting-and-reflecting model framework could have at least two different ways of explaining this. One is that they changed the AI's type 1 thinking, so to speak, so that it could respond intuitively and correctly to all of these, and didn't need to reflect. Or it could be that they somehow created a reflective type 2 system, with chain of thought, tree of thought, or some other mechanism that helps it actually engage in reflection on these tasks, or whatever the analog of reflection is for a chatbot. So I'm wondering if you have thoughts on this: how do you think they did it? Obviously we have to speculate, because a lot of the data is proprietary, but I'm curious to get your thoughts.

Yeah, that's an awesome question. Actually, I forgot to mention in the presentation that the biases these models showed on the tests last year were exactly the ones humans make. If you fed GPT-3 one of these dual-process tasks, it would make exactly our mistakes. That's just amazing. I'm not certain why that happened, but it did.

As for how they fixed it: yes, I do think they trained the model on these tasks, because people were able to elicit similar failures by changing the questions a little. So I think they cheated on that, in the sense that they trained on the known problems, and it doesn't hold for new problems; if you take a different formulation and feed it to, I don't know, the new 3.5 version, it's still failing. But the people at OpenAI are aware of the chain-of-thought research and of the tree-of-thought research, and I don't think they want to depend on these external agents. So they recently released a report where they try to reinforce intermediate steps of reasoning in GPT-4. They're trying to simulate internally what chain-of-thought reasoning does externally, by reinforcing the intermediate steps of reasoning so that it stops failing at composition.
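The kind of probing Nick describes, checking whether a model's wrong answers are the human lure or some other error, can be sketched as a tiny evaluation harness. Below, `ask_model` is a hypothetical stand-in for whatever chat-completion client you use, and the item is an illustrative CRT-style variant (the same structure as the classic problem, with a different cover story), not one from a published benchmark.

```python
# Minimal sketch of a lure-vs-other-error probe for a chat model.
# `ask_model` is a hypothetical placeholder: wire in your own client.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in a real model call here")

ITEM = {
    "prompt": ("A pen and a notebook cost $1.10 in total. The notebook "
               "costs $1.00 more than the pen. "
               "How many cents does the pen cost?"),
    "correct": "5",   # 5 cents: x + (x + 1.00) = 1.10
    "lure": "10",     # the intuitive type 1 answer
}

def classify(answer: str) -> str:
    # Naive string match for a sketch; a real harness would parse the
    # number out of the reply instead.
    if ITEM["correct"] in answer and ITEM["lure"] not in answer:
        return "correct"
    if ITEM["lure"] in answer:
        return "lure"          # the human-like intuitive error
    return "other error"       # e.g. misread the question entirely

print(classify("The pen costs 10 cents."))  # -> "lure"
# print(classify(ask_model(ITEM["prompt"])))
```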
I'll add a thought on that. I was reminded of Daniel Dennett's framework in Darwin's Dangerous Idea, of cranes and skyhooks. Some building gets built, and the question is whether it was built with cranes or with skyhooks hung top-down, descending from nowhere to hold everything up while it's being built. And it's like, no, it's built with cranes, and if you need a big crane you can use a smaller crane to assemble it. So that's a bottom-up, constructive metaphor, whereas the skyhook is like top-down compositionality. When a cognitive system gets to a certain level of sophistication, it can make the blueprint or the plan, maybe what we would associate with the expected free energy calculation, not just the variational free energy calculation. At some point, when a plan can be made, it requires a strategy for switching between type 1 and type 2, and probably other switches too. And that crane approach needs a lot of compute to do things bottom-up. So even though it's kind of weird to think about, it's almost like the current large models are very bottom-up, because they're bottom-up from syntax, and they don't have what you refer to as imperative linguistic knowledge, which can be normative as well as heuristic. That's potentially a small model, but it contains wisdom that's explicit in practices and stances or positions, as you described. And it's not a denial of embodiment to also clarify what this skyhook ability is; that's type 2. So I guess: what do you think about that, or how do you connect this to anything in that area?

I'm going to make you do the summary and the question again. I did track some of what you mean, but I'm not sure which part I'm meant to comment on.

Do you think the landscape of large and small models would be different if people took some of the features you described, like imperative linguistic knowledge, seriously in the construction of models?

Let me see if this is what you're asking. They're making small models now, for phones, and open-source models for phones, and they have the issue of not being able to scale up, right? Because if you scale up too much, you can't run on the phone. So you're asking whether we can implement something like tree of thoughts to boost these smaller models. Is that the question?

Yes.

Are you sure, or are you just saying okay, go with that?

It's an example of what it would mean to re-understand where we need an API call to a cloud data center versus where some context could be computed locally, with much more of a type 1 process. What are some type 2 problems that we might be able to type-2 our way out of? What are some type 1 areas where things are not working, and where a type 2 description might be advantageous?

Yeah, I think they're getting onto that, and they're managing to make these smaller models precisely because of these sorts of discoveries, like chain of thought. There's also work on training on specific data; read one called "Textbooks Are All You Need": they're training smaller models on textbooks, and the models do better. So there are differences from training smaller models on different data, but there's also the possibility of making them more reliable with what I'm calling type 2 resources: chain of thought and that sort of thing. And again, we don't know if that's all we can do externally, but I would argue that we necessarily need some sort of external mechanism coupled to any large language model for the best results, mainly because of the differences in representations, unless the large language model somehow makes a symbolic representation emerge from its processes. It's not clear that it has done so; and even if it has generated some sort of symbolic process internally, it may not always use it, or know when to use it. So most likely we're going to need something external to the language model, and this gives some clues to how to implement that.
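As a concrete picture of the kind of external mechanism being pointed at, here is a minimal, self-contained sketch loosely in the spirit of Tree of Thoughts: the generative model proposes candidate continuations, an evaluator scores partial solutions, and a serial search loop outside the model keeps only the most promising states. In this toy, `propose` and `evaluate` are stand-ins for what would be language-model calls in a real system, and the target, depth, and beam width are arbitrary.

```python
# A minimal sketch in the spirit of Tree of Thoughts: a proposer
# generates candidate next "thoughts", an evaluator scores partial
# solutions, and a serial loop outside the generator keeps the best.
# Here the "model" proposes digits and the goal is a target sum;
# in a real system propose() and evaluate() would call an LLM.

TARGET = 23

def propose(state: list[int]) -> list[list[int]]:
    """Toy generator: extend a partial solution; 0 lets it stand pat."""
    return [state + [d] for d in (0, 3, 5, 9)]

def evaluate(state: list[int]) -> int:
    """Toy evaluator: closeness of the partial solution to the target."""
    return -abs(TARGET - sum(state))

def tree_of_thoughts(depth: int = 5, beam: int = 2) -> list[int]:
    states = [[]]
    for _ in range(depth):
        candidates = [c for s in states for c in propose(s)]
        # The serial, capacity-limited "type 2" step: rank candidates
        # and keep only the best few before expanding again.
        states = sorted(candidates, key=evaluate, reverse=True)[:beam]
    return max(states, key=evaluate)

print(tree_of_thoughts())  # [9, 9, 5, 0, 0] -- sums to 23
```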
And also, I think in our own case it's kind of external too, not only in the sense of the mind but in the brain sense: it's a recent step in evolution. We don't see, I don't know, lizards reasoning in a type 2 manner, right? We do see chimpanzees reasoning in a type 2 manner, but not as much as we do. So it's a very human kind of reasoning. And going back to the context problem: animals do solve contextual problems; that's why they can do it, right? So the issue isn't actually context; it's more like some boost in reasoning capacity, and that boost may be related to the prefrontal cortex, as we know. Of course, the prefrontal cortex is also a network, it's not a serial Turing machine, but if something somewhere is implementing the serial machine, or a symbolic classical-AI machine, or symbolic reasoning, most likely it's the prefrontal cortex, which is kind of a distinct addition to the brain; and maybe something related to the language areas, which are also a distinct, newer addition to the brain.

I'm also quite convinced by the Fodor and Pylyshyn argument from '88 that you're going to need symbolic processing anyway: either the model generates that symbolic processing or you're going to have to feed it in externally, or else you're going to have hallucinations and mistakes in reasoning because of the nature of distributed representations. They're uncertain and ambiguous by nature. That's why I'm not entirely convinced that probabilistic representations alone would be enough to solve hallucinations.

Awesome. Nick, do you have any other questions or areas you want to ask about?

I think those were my main questions. I don't know if there's anything in the chat you wanted to touch on before we go; I know we're maybe over time.

I'll go with one from the chat, and then we can have any closing thoughts. Glia Maximalist asks: are you familiar with the visual pathways in the human brain, specifically the dorsal "where" and ventral "what" pathways? Do you think this maps onto the two systems you described for problem solving?

Okay, yeah. Like I said, in 2008 Evans tried to accomplish that unifying vision of dual process theory, but I'm not very convinced by it; he himself also said we can't do this as of yet. I do think it's possible, as I said in the presentation; I just don't know how to do it, how to have a convincing theory of everything for dual process theory in the brain. It does extend to other regions: you have automatic motor responses and controlled motor responses, and, like you said, there are different pathways for reasoning and figuring things out in the visual field. So maybe the "what" would be related to one thing and the "where" to another; most likely the "what" would be tapping into both types of processing. I'm not sure how the "where" would tap into that. But as I've been trying to make the point: if we extend it to the whole brain, we lose what we've learned; we're not sure how it applies.

Cool. Well, Nick, any penultimate last words? You're muted. Oh wait, unmute and then continue.

Sorry about that.

No worries.

Thanks, thanks to both of you,
Dr. Bellini-Litchie, and Daniel for coordinating this. I'm really glad to have found this research, to discuss it, and to connect, and I'm looking forward to further conversations to come.

Awesome. Sam, any closing thoughts?

Yeah, I'm just sad I couldn't go much beyond what I said in the presentation. For most of the things you two asked, I'm not sure I have good answers. Basically, what I've done is all I know, and I'm not sure how to go further with this in the future; maybe someone in the audience will. I hope this helps.

Great, I hope so too. All right, till next time. Thank you. Bye.