Hello and welcome. It's December 7th, 2023, and we're here in Active Inference GuestStream 67.1. Andrés Corrada-Emmanuel is here with us and will be presenting and discussing NTQR logic for noisy AI algorithms: complete postulates and logically consistent error correlations. So thank you for joining, Andrés, and also Jakob; looking forward to this discussion. Thank you, Daniel. Thank you for introducing me. Hello, Jakob and Daniel. Jakob, would you like to talk about why you're here? Yes, sure. Hi, I'm Jakob. I'm a researcher currently based in California. I'm interested in using physics-based principles to model intelligent and self-organizing systems, in how different frameworks, most specifically active inference, can be applied to topics in both AI and systems engineering, and broadly in how we can use these formalisms to understand systems at a deeper level, one more inclusive of other disciplines as well. I'm very interested to hear your talk and your thoughts on these topics and to engage in, hopefully, a productive discussion. Looking forward to that. Yes. Okay, so I'm going to talk about the problem of self-regulation for any intelligent machine. It has been a long journey for me with this topic; it goes back to a patent that I took out in 2010. But recently I've come to understand it better, because I've been collaborating with a philosopher and an economist on aspects of it, and that's why I have this very abstract title for it, NTQR: it refers to a situation where you have an ensemble of N experts to which you have given T tests, where each test has Q questions and each question has R responses. Okay? So it's about evaluating noisy AI algorithms when you give them these types of testing protocols. And superficially, at the beginning, we can take that testing protocol to be the surface application itself: you are literally taking, let's say, a multiple-choice exam.
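As a concrete sketch of the protocol just described, the responses of N experts on T tests of Q questions with R choices can be stored as an N-by-T-by-Q array of values in 0..R-1, and any evaluation statistic is then a function of that array alone. This is only an illustration; all the names here are my own, not from the talk:

```python
# Hypothetical sketch of the NTQR testing protocol: N experts answer
# T tests of Q questions each, every question having R possible responses.
import random

random.seed(0)
N, T, Q, R = 3, 2, 10, 4  # 3 experts, 2 tests, 10 questions, 4 choices

# responses[i][t][q] is expert i's choice (an int in 0..R-1)
# on question q of test t -- no answer key is stored anywhere.
responses = [[[random.randrange(R) for _ in range(Q)]
              for _ in range(T)] for _ in range(N)]

def agreement(i, j, t):
    """Fraction of questions on test t where experts i and j chose alike."""
    same = sum(responses[i][t][q] == responses[j][t][q] for q in range(Q))
    return same / Q

print(agreement(0, 1, 0))
```

Statistics like `agreement` are computed purely from observed decisions, which is the black-box spirit of the evaluation setup discussed later in the talk.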
But I'm going to eventually dislodge you from that, to understand that this is a digitizing format for testing the logical consistency of anything, because you can digitize anything. Even if a quantity is continuous, you can create, say, four response ranges and bin it. And the big breakthrough for me has been the following recognition. For a long time I talked about these things as universal thermometers. The universality meant that they could be used everywhere, just as thermometers can be used everywhere, aside from melting them or freezing them beyond their range. And the thermometer also had the notion of being stupid: no intelligence, no theory about the world or about the phenomenon whose temperature it is measuring. I did that for a long time, and I took out a patent because I thought I had found a method to build these thermometers, and you can patent methods. But it turns out that I did not discover a method, and I do not have a patent. I discovered logical postulates for evaluation. So what I'm going to talk about today is the logic of evaluating noisy functions, period, in general, universally. I'm going to talk about them as postulates because they apply whenever you are carrying out these NTQR tests. A major conceptual goal for me is to convince you of that: that I have something you could call postulates, because especially in the machine learning world people would say, this is crazy, this cannot possibly be, how could you have postulates for anything in the real world that are so general? And you'll see that I do it by basically shedding all representation. I also want to mention these collaborators. One of them happens to be my cousin, and this is a conflict-of-interest disclosure. He's a professor of philosophy at Virginia Wesleyan University, and he has been the severe critic of these things.
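The binning idea mentioned above, turning a continuous quantity into R response ranges so that it fits the NTQR format, can be sketched as follows. This is a minimal illustration; the function name and the equal-width bucket scheme are my own, not from the talk:

```python
# Sketch: digitizing a continuous output into R response ranges, so that a
# regression-style question becomes an R-choice NTQR question.
def to_response(value, lo, hi, R):
    """Map a continuous value to a bucket index in 0..R-1.

    Values at or below lo map to 0; values at or above hi map to R-1;
    everything in between falls into one of R equal-width buckets.
    """
    if value <= lo:
        return 0
    if value >= hi:
        return R - 1
    width = (hi - lo) / R
    return int((value - lo) // width)

# e.g. temperatures digitized into four response ranges
print([to_response(t, 0.0, 100.0, 4) for t in (-5, 10, 30, 60, 99, 120)])
# -> [0, 0, 1, 2, 3, 3]
```

Once every continuous reading has been reduced to one of R discrete responses, the same agreement-counting machinery used for multiple-choice exams applies unchanged.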
I've done most of the work in terms of the mathematics and the machine learning, but he's the person responsible for introducing the distinction between logically consistent and logically sound, and for the utility of having any sort of logical proof for anything that you do. The other collaborator on this paper is Celia Parker, who works with the Navy and has a master's in economics, because you're going to see that this is a fundamental problem in economics that goes under the name of principal-agent monitoring. So I'm going to talk about the two main goals that I have in the talk, and I'm going to introduce the main problem that we're going to be addressing, this principal-agent monitoring problem. Then I'm going to make some observations about you all: I find the format of these livestreams incredibly interesting intellectually, and I just want to give an outsider's viewpoint on what I see you doing that I find so laudable. As I was saying to Daniel, I have a little bit of impostor syndrome; I feel like you made a mistake inviting a stranger like me so suddenly, but I hope to repay that kindness by engaging in a good discussion with everybody. Then I'm going to introduce this idea of algebraic evaluation, talking about binary classifiers, and I'm going to keep the math really simple, just linear algebra, so that you can follow it. I don't want to lose you, because this eventually involves algebraic geometry and something called Gröbner bases and all this complicated machinery, but I don't want to use that math to hide the basic ideas behind it. And then I'm going to talk about what I said before: you shouldn't think of the multiple-choice exam as literally being the surface realization of how something is being evaluated.
You could have given people a philosophy exam where they write essays, and then have ChatGPT grade the essays and assign a score from 0 to 4. Then you're effectively giving an R equals 5 test. So I'm going to go over that, and then the main technical result of today, the breakthrough, is that there are complete postulates for pairs of correlated binary classifiers that allow you to separate correlation from individual performance, in a way that then lets you immediately, almost trivially, compute the only logically consistent error correlation, if you believe the classifiers behave in a particular way, and I'll go through that caveat. Then I'm going to show an exact solution for the trio of error-independent classifiers, and then show some experiments and work through the math of doing it. Any questions, or does anyone want to comment on anything? Go a little further, and I believe we'll have a few pieces to add in. Okay, so the main goal that I have is to describe to you what I did technically, which is to figure out how to evaluate ensembles of noisy binary classifiers.
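To give a flavor of the kind of result being described, though this is emphatically not the talk's algebraic solution itself, here is a minimal sketch under strong extra assumptions of my own: three error-independent binary classifiers, each with the same accuracy on both labels, on a balanced (50/50) test. Under those assumptions the pairwise agreement rates factor, and each classifier's accuracy can be recovered from agreement statistics alone, with no answer key:

```python
# Sketch (not the NTQR solution): for error-independent binary classifiers
# with label-symmetric accuracy p_i on a balanced test, the pairwise
# agreement rate is a_ij = p_i p_j + (1 - p_i)(1 - p_j), so
#   2 a_ij - 1 = (2 p_i - 1)(2 p_j - 1),
# and three pairwise agreements determine each accuracy -- no ground truth.
import math
import random

random.seed(0)
true_p = (0.9, 0.8, 0.7)  # hidden accuracies we will try to recover
n = 200_000              # number of test questions

votes = [[], [], []]
for _ in range(n):
    label = random.choice((-1, 1))            # balanced true labels
    for i, p in enumerate(true_p):            # independent errors
        votes[i].append(label if random.random() < p else -label)

def agree(i, j):
    """Observed agreement rate between classifiers i and j."""
    return sum(a == b for a, b in zip(votes[i], votes[j])) / n

c = {(i, j): 2 * agree(i, j) - 1 for i, j in ((0, 1), (0, 2), (1, 2))}

# d_i = 2 p_i - 1 satisfies d_i d_j = c_ij, so d_0^2 = c_01 * c_02 / c_12
d0 = math.sqrt(c[0, 1] * c[0, 2] / c[1, 2])
p0_hat = (1 + d0) / 2
print(round(p0_hat, 3))  # close to the hidden 0.9
```

The exact solution discussed in the talk handles the general case algebraically, with label-dependent accuracies and unknown prevalence; this sketch only shows why agreement statistics alone can pin down individual performance when errors are independent.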
And there's a particular way that I go about it, which is the surface realization; but once you see how I've done it and what I've been able to achieve, what I really want you to think about is this last question: if it's not this thing that I'm going to talk about, isn't it something like it? Because it seems so easy to implement, and so useful if it existed, and that's where I want to take you. That's the discussion I want to have. We can go into the math, I can pull up the papers, I can pull up GitHub repos and show you things if you want, but I don't think that's what will help the discussion. So I want to talk about sparsity, picking the best variants, and safety from internal mistakes. Those are topics that Jakob has talked about; in the November 14th talk that I saw with Alexander, he was harping very much on sparsity being a very biological thing, and you're going to see how I use it, because I basically have a data streaming algorithm.

Okay, so here is my outsider review. This is a community of people who have been working on such hard problems, and you're still working on them; and I have to say, being honest here, you've made very little progress, but that's because these are really hard, fundamental problems. And you don't give up: you're still having this livestream, you're still inviting people, you're still discussing. I love that, not giving up on particularly hard problems, so I'm very attracted to that temperament of mind. And you are, as far as I can observe, a community of people who have come to neuroscience and biology from different places, sometimes from physics, sometimes from biology, sometimes from mathematics, and you're still bringing in people from those places. There's a lot of active listening; I saw Daniel writing things down as people were speaking, which is incredible, something you rarely see. So kudos to you for setting up this sort of environment, and inviting a stranger like me to talk to you is intellectually courageous; I applaud it, and I hope to repay the kindness by engaging in a discussion for the purposes of clarification. That's why we're here. Okay, so on LinkedIn I now list myself as a scientist and inventor, and I think I'm going to flip it and say inventor and scientist, because the older I get, the more I think that invention comes before science. First of all, it's been around longer than us: some people claim that fire made us, and fire wasn't invented by Homo sapiens. So invention existed before humans and will probably be around after humans. I've worked on many different things. I have a patent for an underwater periscope: when you are at the bottom of a pool and you look up, you see a circle; that's called Snell's window. The whole above-surface world has been compressed into that circle and distorted, but you can see, from the shading, that you can pick up pieces of the world, and if you had a polarized camera you could reconstruct the scene. That idea of reconstructing what's beyond, I find interesting. I trained as a physicist, but I've always been interested in the marriage of mathematics and exact results with a very physical problem, so for my PhD I worked on superfluid film vortices on a torus, and I found a theorem of Riemann's that had had no application in quantum mechanics until that problem. Okay, so let's get to the meat: the principal-agent monitoring problem. This is a problem
as old as the hills. It appears in Plato's Republic, and some people call it the allegory of the ship of fools. There is a ship owner who has a crew of sailors, and they're kind of rowdy sailors; Plato says they may be drunk, and among them there's a philosopher-king, who is Plato himself. But the owner doesn't know how to pick the sailors: the owner just owns the ship and doesn't have the ability, which is why the owner has hired sailors. If you are a principal and you hire somebody to do a job for you, either you're lazy or you don't know how to do it yourself. So you have this monitoring problem: how do you monitor work that you don't want to do or don't know how to do? Does everybody get the paradox there? Plato's answer is that democracy is no way to do this, because there can be factions: part of the crew can form a faction, and you're not going to pick the best captain if you have people vote for who the best captain is. Plato used this to attack democracy, but that's counterintuitive, because we know that groups are better for us, that ensembles help us survive better. So there's this tension in Western civilization between the fear of the mob and the unitary philosopher-king who's going to be the dictator. Who's going to rule us: the philosopher-king dictator, or the mob? There's this constant tension.

I came to this problem because I'm one of the hundreds of scientists who have made speech recognition possible on your phone. I used to work for Dragon NaturallySpeaking, and if you go into Wired magazine and look at the highlights of the history of speech recognition, the founding of Dragon Systems in Newton, Massachusetts in the 80s is considered a pivotal moment, because Dragon Systems went on to create the first continuous speech recognition product for the desktop. The way that speech recognition was developed, as people may know from machine learning, was by having datasets, and one of them was people reading the Wall Street Journal, which was then transcribed with all the mistakes that people made, the ums and the ahs and so forth. After five or six years, the Wall Street Journal corpus became very important as a benchmark, and people started realizing: I'm being graded as incorrect against this transcription, but actually it's the official transcription that's wrong, so why don't we just redo it? So they went through the whole corpus and redid it; they spent the money to do it better. From then on, every test of the software had the old benchmark with the mistakes, so that all papers could still be compared, and also the new benchmark. And I asked: who's checking the new benchmark? It's a principal-agent monitoring problem. Who checks the experts? How do we know this latest transcription doesn't have any mistakes? Then came a second moment in my professional life. I was a lecturer at UMass Amherst in physics, Physics 101, and I had to lecture 500 students. The only way to do this is to give them multiple-choice exams with bubble-sheet answers, which I then collected with the proctors at the end of the exam, took to the back of the room, and fed into an optical scanner. I could describe everything I did, and was told to do, to randomize the test in the room. But it led me, sometimes when I was feeding in those exams, to ask: why do I need to feed in the first sheet, the answer key? Why can't I use the wisdom of the crowd to grade the exam? And then finally, the thing that actually got me into the mathematics of doing this: in 2008 I started working at UMass Amherst in the
computer science department with Howard Schultz, doing digital elevation models from multiple maps made from aerial photographs. How do you combine these things? This is the problem: there's so much data now; how do you combine it without the noise corrupting it? And then finally, the 2010 patent by Data Engines on an error-independence model solution, which, as I'm going to tell you, has disappeared. The patent is null and void; there can be no such patent, it's impossible, because you cannot patent natural laws. So I'm telling you the history of my business failure, basically.

Okay, so this is the paper on the geometric precision error, let me get the laser pointer, for computer vision tasks. The idea there is that you have a bunch of maps, and I'm plotting here the error covariance matrix between the different maps. The way my boss Howard Schultz made the maps, he took advantage of the fact that when you have photograph A and photograph B, matching features from A to B goes one way, one arrow, but you can also go the other way, and that matching is not symmetric. He wanted to take advantage of that asymmetry to discover blunders, and he did; he was able to show that he got rid of blunders. But he went on to keep using both maps, and everybody told him: Howard, why are you doing that? Those maps are really correlated, that's A-to-B and B-to-A matching. And he said: yes, but why not? They have some information. Eventually I was able to prove that he was right. What I'm showing here is a model where I just assume everything is zero, and then I induce a signal; if you're a physicist, you know what I'm talking about, because I have these two maps that I know are highly correlated, and I'm introducing them among other maps that are not so correlated. So I'm putting in an injection, a modulating signal, and this recovery here was done with compressed sensing, and you see how, without being told that there is strong structure along the diagonal, it's able to pick it up. So, sparsity: if the errors are sparse enough, you can discover the precision error between regressors. Even though today I will talk mostly about classification, everything applies to regression too. And here's the crux of the issue: the way to do this is to basically get rid of reality. This is the way that you get rid of representation: if I have a model, why don't I subtract the true value from the model? Whatever remains is the error. And I'm going to talk about that today in terms of classification. Okay, we can stop here if people want to ask questions, because the next theme is going to be data streaming.

Yeah, I have a few comments. Jakob, do you want to go first, though? Yeah, sure. I guess I was wondering if you have been thinking about a measure to quantify how much information is lost, or can be lost, when using a particular benchmark. Because the way it sounded when you were talking about it is similar to the notion that all models are a coarse-graining of the particular thing we're modeling; therefore every benchmark is going to have this problem of who's watching the benchmark, of who's checking that the benchmark itself is correct, which is, I suppose, an inescapable problem. We cannot escape it as a civilization. Nobody knows what the answer key is; there is no dictator that we can go to as a civilization, as a society. It's inescapable: we're always benchmarking against something else. There is nothing but benchmarking against something else. Am I crazy here, or can I get an amen? Nothing but difference, yes; nothing but gradient; you need, at minimum, two. Yes, that is the only
thing that we have to observe, and that's a statement of relativity in physics too: we cannot observe absolute velocity, we can only observe relative velocity between particles. It's the same thing for measurement; it's no different.

Yeah, a few other comments. So, the back-and-forth: keeping both matching directions in the fit, and how that contains information on blunders, as you described. And then another subtle pattern is in the matrix one slide previously, in the cells two off the diagonal. Yep. There's a slight positive correlation that is transmitted spuriously, potentially amplified through compressed sensing, and distorted, and so on. There are also ripples: even trivially analytical square correlation matrices, taken empirically, plus noise, plus compressed sensing, have these attributes. And that is what puts so much of the experimental-design and interpretation onus onto the cognitive entity, as hardly does this, or any, data speak for itself. Yes, yes, I have to agree with you. We have to distinguish here: there is no magic, and there's no metaphysics. There is only a measurement, which is a statistic of a sample, and therefore it's impossible for you to generalize it; it's not an intrinsic property. If I'm telling you that this is the error correlation of those maps, I'm telling you that it's the error correlation for that particular collection and for those particular maps, not for the algorithm in the future or in the past. I'm telling you something about the test that you just took, period, end of story, nothing else. If you want to ascribe meaning to that, go ahead, but then you're the one incurring that cognitive cost; you're the one who's now becoming the dictator. The buck stops with you, because you've decided to assume that that measurement meant X about the world. Yes: when you undertake the epistemic quest, then you're the principal epistemic agent, and that's an n equals one situation.

Another interesting touch point, I think, is hierarchical predictive processing models, which we talk about a lot in active inference. They have a highest-level prior, and if that is also going to be learned over updates, there's going to be a hyperprior on that. So that's one aspect where the buck just stops at a point, from a hierarchy perspective. And then even with just the Gaussian, the bell curve, with its mean and variance: there's no variance on the variance; that is what gets collapsed. That's the metacognitive degree of freedom, and collapsing it is what constrains models, makes them go stale, and makes them overgeneralize without being aware. Yes, and I'm going to address that here, because I'm going to talk about measuring error correlation, about being able to measure empirical error correlation independent of a model. Now, whether you want to generalize that because you have some view of how a Bayesian would generalize it, that's a separate issue; but you can at least benchmark the actual correlation on the test, so that you can then have those cognitive models and those epistemic models on top. Let me make the analogy: your car has a computer in its engine, and it also has a thermometer. What I'm going to talk about today is how to build a thermometer that helps the car's computer run the engine better. What I'm saying really comes down to this: wouldn't it be great if intelligent minds had a thermometer for intelligence? The thermometer just tells you that the car is overheating; it doesn't tell you why. You need a cognitive process to figure that out, whether a piston is blown or the oil ran low. That's the separation. But knowing that something is wrong, by seeing overheating on a thermometer, is golden. And I want to get to that rock of having a thermometer where I don't have to worry about representation: it just gives me a
measurement; I'll deal with what it means. I just don't want to worry about what that measurement is, or I want to reduce that worry as much as possible for the epistemic agent. So it may sound weird, but you would want a tool that strips epistemology from checking the logical consistency of your evaluation, so that it can aid the epistemic agent. This is the thing that I find weird and fascinating here: I'm going to propose to you that what we need to do is strip epistemology, and I'm going to do that as we continue. So let me now start talking about that.

Okay, data streaming. You talked about sparsity, and the way that I encountered data streaming was with the Good-Turing smoothing algorithm in speech recognition at Dragon Systems. They gave us a book written by, gosh, how can I forget his name, our research director, and it covered Good-Turing smoothing, frequency smoothing. Can I get a count of how many people know what this is, or know what data streaming is? Is this something you know well as a community? Do you have a description that sets it up for how you want to talk about it? Okay. The typical data streaming algorithm gets presented as a method for minimizing memory when you want to compute statistics of a stream. The prototypical, simplest case: you have a stream of numbers coming down the pike, and at any moment I want you to be able to tell me the average of those numbers. The memory-intensive way to do it is to store all the numbers, and whenever a query gets made, take the average of the whole list. But of course, as the stream grows, that list is going to balloon and become humongous. The philosophy of data streaming is: don't do that. Just keep two numbers. Every time you get a number, increment a running sum, and keep a counter of how many numbers you've seen; then the average is just the sum divided by the count. So with just two numbers, two counters, you can at any time report the average of a stream that may be a billion items long. That's the purpose of data streaming: to compress. You keep a data sketch, as it's called, which is a compressed version of all the events you've observed in the data stream. You're not keeping the events in all their richness; you're doing some sort of compression and keeping that statistic, and when people ask, you say: this is the average I've observed so far.

Now, Good-Turing smoothing was invented by Alan Turing during World War Two; I don't know if you know that he was responsible for breaking the Enigma machine in World War Two. Turing was a very good mechanical engineer: he actually built a tide-predicting machine well before computers, before any of the work he's known for so abstractly; he was very much hands-on, and he helped build the bombes, which were early electromechanical computers that processed the day's German traffic and tried to decode it. These bombes had to churn continually all day. This is the Enigma machine, by the way; one was recently found in a flea market, and it's $15,000 if you find this machine. It had these very complicated rotors. And these are the bombes, and they had a lot of women assistants, back when "computers" were women, before you had Grace Hopper and so forth. You had to send people home at the end of the day, so when did you stop? You don't know how many secret rotor settings were used that day. When did you decide that you'd done enough processing, and that there was no big chunk of German traffic you'd missed? So he came up with Good-Turing smoothing, which estimates how many
rotor settings have we not seen, which is a crazy statistic if you think about it: what do you mean, how many things have you not seen? He was able to do it by shifting: there are many rotor settings for which we've seen only one message, and fewer rotor settings for which we've seen two messages. If you look at how many messages there were per rotor setting, which biologists now use to figure out how many species they missed in a survey, you can then know how to shift mass to the zero-frequency observation: you take a little bit of probability from everything and shift it down. And can you believe that this is still one of the provably best ways of estimating unknown frequencies? Alon Orlitsky from UC San Diego has been doing some of the modern work on understanding Good-Turing smoothing. Notably, it doesn't get used nowadays in neural networks, because neural networks never have any missing data; they will always give you an answer, even if it's wrong. Orlitsky, in a paper published at NeurIPS, argues that that's wrong and that you should go back to doing what he does. We're not going to go into that. Want to make a comment there?

Yes, this reminds me of the distinction between parametric and non-parametric statistics. When people talk about how many parameters a model has, a neural network, a transformer, what have you, it's a parametric model; that's why they're counting parameters. And non-parametric methods, although that's kind of like saying "the non-elephant animals," the classic joke, are a very open, procedural space where there can be a lot of methods, such as shuffling, or leaving one out, all these different statistical techniques. They may have less power, less of some statistical capacity, in the case where the data are known to come from, say, a Gaussian error distribution; but under relaxed assumptions about the variance structure, non-parametric statistics can do well. Yes, and heuristics must be something that biological systems use a lot, instead of exact computation. I would split that statement in two. The first part, which I'm most comfortable with, is that we can model biological agents as using heuristics; we can model anything as using heuristics, in our finiteness as cognitive modelers. So if we say we're restricting ourselves to an eight-parameter model, or to this genre of model for this phenomenon, that's a map-territory distinction. Now, as to whether the territory uses heuristics, whether you can really say a bacterium or a squirrel uses heuristics, that is a question: what are those heuristics? But the first claim, which I feel most strongly about, is that we can choose to use heuristics in modeling complex behavior. Why not? How could that not be the case? Okay, I'll grant you that, because I don't actually have experimental evidence; to be scientifically precise, you're asking for a particular demonstration of heuristics in a biological system, and I don't have that. I only have an aspirational statement that maybe it does happen.

Okay, so the thing I want to do here, the problem I'm trying to solve, is not necessarily what happens procedurally; we know that we are not completely isolated from the world. What I want to think about is the Gedanken stream where you're isolated and paranoid, which does occur in certain technological realms; it doesn't have to be a mental paranoia. For example, a self-driving car that wants to check its pedestrian detectors immediately: that immediacy means isolation. It can't wait to get feedback by actually touching something, or get further information by moving forward; it's closed. It can only rely on its own devices, because unlike a Roomba, it can't go bumping into pedestrians. It cannot get feedback any other way
other than this mechanism that it's decided to do whatever it is for pedestrian detection which could be sonar or could be visual so I think of it as an extreme that I want to solve to understand what are the limits of what can be done okay not because we are actually need to be in that realm and so I want to I wanted to build certain features into it right because I was using this in in speech recognition and one of the things that I wanted to do was you know to make it simple and so I wanted to treat all the members of the example as black boxes so I don't want to go inside your brain I don't want to go in and get readings of internal parameters in models which is not wrong mind you again right to come back to the analogy of the thermometer and the computer in a car engine right you can have intelligent things and a lot of people do neural networks right and they inspect the internals to try to figure out right how to detect when it's actually hallucinating there's nothing wrong with that right but I'm saying that it itself is then going to bring assumptions that I can't check right so I don't want to do that I want to treat everything as a black box I just want to see what the decisions are from the oracle or whether that oracle be a person a function or whatever right and I don't want to go I don't want a theory of mind the other thing that I want to do and here's the crucial thing right is I want finite chains of safety validation I don't want to use Bayesian models I don't want to use probability because like you just mentioned Daniel those are hyper parameters which are encoding information about the world and the whole point about being paranoid is that I cannot right I cannot rely on that right I cannot rely on any representation of knowledge I don't want to right to avoid that pitfall and so by going to sample statistics that's how I'm going to avoid that because I'm not going to talk about distributions I'm not going to infer anything about the future or 
the past. I'm just going to tell you a number about something that already happened. And since it's a sample statistic, I'll be able to do the magic that I do, which is to create a complete set of postulates, so then I can prove that you could be in states that are provably wrong or inconsistent. And so that's the punchline: that I can do both of these things, being isolated and Gedanken and paranoid, by just having functions that only use statistics of the decisions by the ensemble members. Nothing else. Let's stop here, because this is an important point. This is a design choice that I'm making. Could you unpack that a little further? I'm not sure I fully understand how using sample statistics differs conceptually from using, say, Bayesian probability on your data. So would you agree that there's the concept of sample statistics and then distribution statistics, and things like that? You can always define a sample statistic and actually compute it if you knew what the sample was. That's what I'm talking about; that's what I mean by sample statistics. And what would it be? It would be something like: if you have data being produced by a Gaussian distribution, a sample statistic would be the mean of what was produced. That has meaning, that exists, independent of whatever meaning you want to ascribe to it, whether you want to think of it as the mean of the actual Gaussian distribution. That's a separate thing. Using that data, and using that mean, to then estimate the mean of the Gaussian distribution that produced it is a completely different task. Yes. For one, the data may not have been produced by a Gaussian. Just to connect this issue to what we talk about with variational or just Bayesian inference and active inference: let's just say in the room the temperature is continuously variable, but then the thermometer is going to be
integers, and then the agent is only capable of coarse-graining into buckets of 10. So we can still talk about what it cares about because of, you know, a precision engineering spec: why am I being uselessly precise? Yeah, variable precision to promote system properties is definitely a corollary of this kind of framework more broadly, like getting rid of the representation question. This is like hierarchical predictive processing of vision, where at the earlier, lower levels, and this phenomenon is recapitulated in silico as well as in aspects of biological systems, the earlier layers pick out less abstract components like edges and on and off, and then at each synaptic interval, which can be modeled as means and variances and differentials in the hierarchical predictive processing architecture, that could be modeled as a kind of representation explained away: where's the romcom in this pattern of neural activity? And it's very interesting that you brought up the engineering setting, because predictive coding, which is to say difference coding, was discovered, slash invented, in the video compression setting, and yet in the Rao and Ballard 1999 work and so on it came to be understood as a more general property of how information transmission in certain settings leads to certain parsimonies, with relationships to things like the Kalman filter. So you and Jakob have mentioned this thing about a transmission of information. I'm not going to go into it; there's so much to talk about, and I've been working on this for a decade. The difference between the observed responses gives you something about the test, but also something about the quality of the person who answered the test. The uninformative person who answered the test doesn't transmit anything about how they performed. The random guesser only transmits information about the test.
If I randomly guess what A and B are, I'm going to have the right proportion of A and B as the test itself; it's just going to be in the wrong places. But you're transmitting the difference between the A's and the B's. And so there is no magic here. If the ensemble is made of stupid agents, they are not able to transmit, via their aligned responses, any information that allows you to grade them. So my method has blind spots. There is no magic here: if your ensemble is made up of stupid members that don't have any information, you're not going to get anywhere. What do you think about an ant colony, where none of the nestmates know how many seeds they have, how many nestmates need what, how much rain there is outside, the rate of information, and so on? They don't know any of that; they're just using their little interactions. I actually have a couple of books on ants and ant colonies, and I saw that you wrote a paper on ant colony behavior. I'm very interested in thinking about that, but we're not going to be able to talk about it, okay? Fair enough. Okay, so I've been interested in the comment you made about Rao, too, about general information. So: data streams as NTQR tests. We're going to look at binary classification. Down here below, we have the true label. This is a stream, so think of this flowing to the right, that gray box. Everybody sees that alpha and beta? And then I have three different classifiers that are making different decisions. So, aligned on the first item, the first two classifiers are saying that it's beta, and the third one disagrees and says that it's alpha. Everybody with me on this? And so what I'm going to do is just collect how many times each of these different patterns occurs. That's the compression. So we could be looking at a test that has a million items. I don't care.
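As a concrete sketch of the pattern-counting compression just described, here is a minimal version in code. The function name, the 'a'/'b' rendering of alpha and beta, and the toy data are my own illustration, not from the talk:

```python
from collections import Counter
from itertools import product

def vote_pattern_counts(decisions):
    """Count how often each aligned voting pattern occurs.

    decisions: a list of equal-length per-classifier decision streams
    over the labels {'a', 'b'}. For three classifiers this compresses
    the whole stream into 2**3 = 8 integers.
    """
    counts = Counter(zip(*decisions))
    # Include all patterns, even those that never occurred (count 0).
    return {p: counts.get(p, 0) for p in product('ab', repeat=len(decisions))}

# Three classifiers labeling a stream of 6 items.
c1 = ['b', 'a', 'a', 'b', 'a', 'b']
c2 = ['b', 'a', 'b', 'b', 'a', 'b']
c3 = ['a', 'a', 'a', 'b', 'b', 'b']

counts = vote_pattern_counts([c1, c2, c3])
print(counts[('b', 'b', 'a')])  # first item: classifiers 1 and 2 say b, 3 says a
print(sum(counts.values()))     # the 8 integers always sum to the stream length
```

However long the stream, only these eight integers are kept; that is the compression being pointed at.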
Since I'm doing binary classification, I only need two to the three, eight integers. So right there we see the compression, and that's very attractive for a biological system, because you're not storing the whole stream; you're just storing these eight integers that are picking up these patterns, okay? And like I said, and like Daniel has said, you can digitize to whatever you want in terms of engineering spec. You don't want to measure the temperature of a car engine to a millionth of a degree; it's useless, economically there's no point. Likewise, you may not care about knowing much beyond boiling or something like that. You just want to be told, hey, the car is overheating. You could digitize that. That's a safety property you're coding in there by choosing the granularity of digitization: you're deciding what your engineering spec is and what your safety concern is, okay? So I cannot tell you that. This is an instrument that you have to design, and you have to decide what your R is. Any questions on that? Yeah, just a few notes. This is really interesting. There's the principle of the hash encoding, or the kind of lookup table for all the combinations, so then you can do hierarchical nestings. And this situation is also kind of like the cells in the retina, speaking coarsely: there's some false positive and false negative rate for some activation thresholds. So it's kind of like a retinal display that is getting noisy photons, and there's variability through all of these finite systems. Variability and finiteness are ever prevalent, but the observation is just what it is. Exactly, exactly. And therefore, all the problems with philosophy and logic occur because of infinity. If we just accepted that the world is finite, but maybe very large, we would solve all sorts of problems.
Because Zermelo's axiom of choice is only needed for infinite sets; with finite sets, everything is explainable and understandable. Okay, so here I'm going to show you, eventually, you're going to see this. This is the set of equations for three classifiers, and these are the eight frequencies, okay? And these are the complete postulates for error-independent classifiers. I'm going to come back to this and make it much simpler than it is. But I just want to show you how these frequencies, there are eight of them, can be synthesized: assuming that the classifiers have some sort of performance, I randomly pick numbers between one and a hundred and divide by a hundred to get these values, and then I just plug them back in here, and I get what the frequencies would have been of how they voted. So I'm just trying to show you how these numbers are integers; there's no probability here. And then the other thing I want to go back to is something that I think you would find interesting as physicists. Regnault was a 19th-century French physicist, very famous for building the most precise thermometers of his time, at a time when people were just inventing the concept of temperature. People told him, how do you know that you're measuring anything? Temperature is an imaginary concept, dude; we don't even believe that there's such a thing as temperature. And even if there were, you would need some theory about how your thermometer makes errors before I would believe what your thermometer is saying. And Regnault said, screw that. Are you saying that there is no science without theory, that there can be no purely empirical measurement? No, I'm going to make purely empirical science. And so he was a radical empiricist. Eventually, in the 20th century, we found out that empiricism doesn't work.
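The synthesis step just described, picking performance numbers and plugging them into the error-independence equations to recover the voting frequencies, can be sketched as follows. This is my own rendering under the error-independence assumption, with fixed accuracies standing in for the random draws, and frequencies rather than the paper's integer counts:

```python
from itertools import product

def independent_pattern_frequencies(p_a, acc_a, acc_b):
    """Voting-pattern frequencies for error-independent binary classifiers.

    p_a   : prevalence of label 'a' in the test
    acc_a : per-classifier accuracy on 'a' questions, e.g. [0.8, 0.7, 0.9]
    acc_b : per-classifier accuracy on 'b' questions
    """
    freqs = {}
    for pattern in product('ab', repeat=len(acc_a)):
        # Contribution from 'a' questions: each classifier answers
        # correctly with its 'a' accuracy, independently of the others.
        f = p_a
        for vote, pa in zip(pattern, acc_a):
            f *= pa if vote == 'a' else (1 - pa)
        # Contribution from 'b' questions.
        g = 1 - p_a
        for vote, pb in zip(pattern, acc_b):
            g *= (1 - pb) if vote == 'a' else pb
        freqs[pattern] = f + g
    return freqs

freqs = independent_pattern_frequencies(0.4, [0.8, 0.7, 0.9], [0.6, 0.75, 0.85])
print(sum(freqs.values()))  # the eight pattern frequencies sum to 1
```

Multiplying each frequency by the number of test items would give back the eight integers of the previous slide.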
But he basically did what I'm going to show you today, and this is what I did with the maps. He took thermometers and just compared them against each other. That's all he did. And then he looked at the differences. He's the first one to start talking about the concept of precision, saying, yes, I can build the most precise thermometers, because I can get them to agree to a certain number of figures, and then they start disagreeing. So even in science, there's a sense in which you really do not have science until you have disagreement in measurement. You have to have error in measurement. If you do not have error in measurement, you're either telling me about math and postulates or you don't know what you're doing: your instruments are stuck, and you think they have a precision that they actually don't have. Thank you. This is awesome. When we have multiple observations going wide, that's sometimes called sensor fusion, and when we have the clash of the prediction and the observation, and the differential that's found there, that's the predictive processing architecture. There are so many motifs, but you totally need a minimum of two. Otherwise you have a statistical monad, which is really more of a pure conceptual object. And one interesting tie-in is in active inference: as we specify the generative model, the preferences are directly over observations. In the extreme case, the likeliest thing and the preferred thing are one and the same; you prefer to get that number on the roulette table, and every single time your action is confirming that's the case.
That is one special case that can be set up, as well as more general epistemic activities, but the pragmatic value loads directly onto observations, like the thermometer reading: not "I have a preference for the room to be at 72" but "I have a preference for something about the measurements," and that is the direct target, rather than the hidden state being the direct target, which leads to a fantastical analysis. Okay, so you guys seem to be, yeah, yes. I'm ahead of myself thinking about this, and I have to implement it. Okay, so let me finish with the temperature story by saying that the reason I came upon Regnault was that this thing happens in invention all the time. You're not that clever. You invent something and you think you're clever, but you're not; somebody's thought about it before. You just don't know it; you're just ignorant. And so I said, you know, somebody must have thought about this, somebody must have done something like this, and so I went looking for it, and I found this book, Inventing Temperature, which has a whole discussion about Regnault and how he was wrong, because he thought that he was doing metaphysics. He thought that he was getting behind the curtain. That was his philosophical mistake. And so I want to pull back from his philosophical mistake and say, no, and this is the comment I made earlier to Daniel: I am not saying anything about what reality is. I'm only telling you an estimate of a sample statistic. That's it. How you want to interpret that is up to you. And I recommend this book, because it's great reading about how you invent physical concepts and the struggle people went through. Okay, so now we're getting into the part where I think you guys are going to be interested, because I'm going to go back to the digitizing format for logical consistency, because there are two different viewpoints of the same test.
And one of them, which is the one I'm choosing to call the NTQR methodology, just to give you a new word so you go somewhere else in your mind, is different from the ML binary classification case, which has semantics. And here's where I want to show you that the mathematics of the NTQR test is separate from the semantics. It can be attached to semantics, but they're not the same thing. So I want to spend some time to make sure that we capture this one, okay? And the payoff here is the realization that you can start to understand it in terms of semantics: this is something that a machine could use to binary classify things in the environment, okay? Which is how I thought of it first, and how people in machine learning think of it. I have a binary classifier because I'm looking to classify things in the external world; I'm looking to be told how many A's and B's there are in the external world. But once I pull the rug from under you, I'm going to tell you that exactly the same mathematics can be used to figure out the statistics of correctness, and that has nothing to do with what A and B mean in the world, because it just means that some percentage of the questions had A as the right answer and some had B. But who knows where you came from in digitizing that format, right? So there doesn't need to be any semantics attached to A and B, and I can still calculate your performance on the total test. So something that can be attached semantically has the same mathematics in a way that can be used to check tests about chemistry, about philosophy, about geology, about any subject. You're not doing binary classification; you're just grading tests that have two responses per question, or three, or four. That's a general thing, yes? Very separate from the specific task of binary classifying things in the world.
So I'm going to stop and encourage discussion, because I want you to get this. I'm going to show you the differences in how it comes into the mathematics, but first I want to see if, just with words, you understand that there's binary classification, where the percentage of A's and B's in the test actually means something in the world, and then there is grading of multiple-choice exams, where the percentage of A's and B's has no meaning outside of the test. It just means that ten questions in the exam had A as correct and ten had B. But the questions could have been about geology. A and B don't mean anything. Yeah, just to say how I'm seeing it: if we look at the output of a statistical test, we might get a vector, percentage rain and percentage not-rain. Bayesian causal model, probabilistic statistical model: percent dog, percent cat. In contrast, we have a different procedural, nonparametric approach that can be designed incrementally to discard various aspects of information, especially in a discrete finite space, taking the finitist perspective on modeling, and then using that to say, well, I am coarse-graining into this many cells, so I am going to have this many hash codes, so we will be able to do this in this runtime. Okay, that's a little bit too complicated, Daniel. Let me restate it as follows. We have a philosophy exam, okay? And the philosophy exam consists of A and B questions, where you're given a passage and two philosophers, and you have to identify which philosopher wrote that passage, okay? You get the philosophy exam? Yeah, it's testing for familiarity with philosophers, not the ability to generate philosophy or recognize new kinds, but I hear you. So we could have, you know, Schopenhauer and Wittgenstein as choices A and B in question one, but question two could have Plato and Aristotle. So A and B are not referring to the same thing in the two different questions.
That's what I mean by having no semantic connection. But your grade is the percentage of times that you correctly answered A when it was an A-type question. So does that make it easier to understand? I'm making a simpler point: that I can detach grading you from attaching a semantic meaning to your response. This is a common question in neuroscience, where certain cells, while learning a task, can learn an abstraction of the task. For instance, if it is A or B, you will have cells that respond to this concept of A or B, but then you have specific components of the network which specialize to the particular instance of A or B, like Aristotle versus Plato or any other pair of philosophers. I guess the question is whether... I'm not sure I fully understand the... So both can happen at the same time, right? You recognize that you can have the general structure. Basically, what I'm trying to argue is that if there's an algorithm that can grade R-equals-two questions, then I'm done. It can do it for any R-equals-two question, for philosophy or economics, and it can also do it for binary classification. What do you think about binary classification of language, like language generation questions? Yeah, anything, right? But you would have to decide what's correct. Let me go on, maybe, and then when I show you things, you can start... Let me see if I can escape and go now to Mathematica, so I get to show you some pictures. Yeah, like the example, yeah. I think of the pheromone perception in the environment, and the alarm or no alarm. Is it at the threshold? You've got a finite number of antennae, a finite number of receptors, noisy signaling, don't-know-go, the buck stopping with an estimate, brain somehow, some way. Okay, so let me go on, and I think you'll start to see how there are different ways of looking at things.
We still only see PowerPoint. Oh, sorry. I see, yeah, I'm going to share again, and I'm sharing Mathematica now. Okay, you see the Mathematica? Not yet. I just took it away; I don't know why I did that. You see it now? Yes. Okay. And then, oh, I'm going to stop the share because I need to do the whole... Yeah, thank you. That way I avoid... okay. Perfect, perfect. Okay. And then now I can go to here. Yes. Perfect. So this is the paper that I sent Daniel, and it's the one that I've submitted to a special issue on AI safety and philosophical studies. And because I'm doing AI safety, I want to speak about the specific case of a binary classifier, to be concrete. So I'm now shifting into getting you into the mode of accepting that, in fact, there's a logic to evaluation. That there are postulates, and there's a logic, and that it's universal. It's a big one; it's a big claim. And the way that I'm going to do it is by just talking to you about what the possible grades are on an exam that has two responses, before I see anything. So I know that if I have Q_A questions of A type, then the number of correct responses, so R is for response, on the A questions that classifier i answered, R_{A_i}, does everybody see that notation? These relationships have to hold. I claim these are postulates. You know, I've been thinking, how the hell did Euclid do this to begin with? How do you introduce postulates? What do you argue? What do you say? You can't prove postulates. You have to sort of see them and say, yeah, of course, that makes sense, Andres. You can't have more correct responses of A than there are A questions, or of B than B questions. Agree? Or within a circumscribed, axiomatic, verified information environment. So play on.
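A minimal sketch of those postulates in code, assuming a Q = 20 two-response test: enumerating every grade triple the postulates allow produces the finite set of possible grades before a single answer is observed. The function name and the triple ordering are my own choices:

```python
def grade_lattice(q):
    """Enumerate every grade triple allowed by the postulates for a
    q-question, two-response test: the number of A questions n_a runs
    0..q, correct A responses r_aa cannot exceed n_a, and correct B
    responses r_bb cannot exceed the remaining q - n_a questions."""
    return [(n_a, r_aa, r_bb)
            for n_a in range(q + 1)
            for r_aa in range(n_a + 1)
            for r_bb in range(q - n_a + 1)]

points = grade_lattice(20)
print(len(points))  # 1771 allowed grade points for the Q = 20 test
```

Plotting these integer triples in 3D gives the lattice figure discussed next; every actual test outcome must land on one of these points.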
Where I have Q questions, and I've told you what the number of questions is, there's no other way. And so I can immediately write this table. I mean, this is how simple it is. Let me make that bigger. You see this? I'm taking a Q-equals-20 exam, a 20-question exam, and I'm just going to make a table of all the possible correct A responses you could have: R_AA, the number of A's that you correctly said were A, and the number of B's that you correctly said were B. And then I'm just going to keep track of the number of A questions. Well, the number of A questions can only go from zero to the total number of questions, and once I decide that there are N_A of them, then, like I said, my correct A responses can only go from zero up to that, and the B questions have to be the remainder. And if you just do that, this is the cube you get. This is what you get for the grades. Until three weeks ago, I'd never seen this cube. And it's amazing. It's actually kind of a nice figure; it's got this double-edged kind of thing. But that's it: if you answer a 20-question exam with 20 responses, your correct grade is a point somewhere in here. Is it a true tetrahedron? I don't know what to call it. It's got that 90-degree twist; at the bottom, it's 90 degrees to the top. Yeah. Well, wow. What do you see in it? What does it mean to you? Well, the thing I found really interesting is, let me show you the other way of looking at it, which is the way I looked at it for years, until this year: the machine learning realm, where now I'm telling you the percentage of A's that you got correct.
I'm telling you the percentage of B's, and then I'm telling you the percentage of A's. And in that cube, that same Q-equals-20 test, this is how it looks. See how different that is? See how it has that kind of quadratic structure? What's different between the two? Okay, what's different is that I decided to do the math in the machine learning world. So this is what I'm showing you now. These are postulates: the number of A responses that you give me is equal to the number of A responses that you got correct, plus the B's that you incorrectly labeled as A's. Yeah, true positives and false positives. Exactly, exactly. These are postulates, prior postulates: they exist before you see any observation about the test. Everybody's on board with the fact that these are postulates, right? And at the end, the number of responses you give, it's a sort of election integrity: there can be no more votes than there are voters. It obeys this. And I can rewrite this in terms of binary classification with these equations down here. Yeah, just to connect that to the case of an election. One approach would be to take in the samples as evidence and try to estimate a continuous hidden state, and you're always going to get into this statistical approach, versus treating the fundamental finiteness we had. I'm not going to use the NTQR notation, but those are the parameters that describe the finite printing and dissemination of that election. There were a thousand ballots with five options: that is the state space. You could go into cognitive modeling, you could do narrative information, you could do interpretation, you can have all those things. But the finiteness and the discreteness get around this challenge of, well, temperature could be any number, or any number greater than zero.
So how do you deal with that? You take a kind of engineering approach here, which is that you define the safety zone, which can be a hierarchical, abstract safety zone, generalized anomalies and so on, but you define the safety zone. Within the safety zone you have the direction of free energy minimization, statistically controlled, or you just stay in the fully finite worlds. And maybe the issue can be resolved by explaining away the semantics of the decision through mere propagation and procedure. Exactly: logical consistency. Okay. So it turns out that these two postulates, written as a binary classifier here, you see that these equations are linear. So that's the cube that you saw with the planes. And you saw that when I went to the ML view, I showed you parabolas. That's the quadratic. Are people seeing that in the equations? That when I change to the ML point of view, where I'm dividing, I end up having these quadratic structures in the cube. You see the parabola? What does that represent to you? That's the ML space. That's where I'm telling you the accuracy as percentages. Yeah, this test has the highest resolution when the coin is even; 50-50 statistics does the best job. When the coin is 99 to one, then three measurements give you very poor statistical power, because you can't resolve 99-to-one from 98-to-two very well. It's not quite like that, but it's close to that. Let's move on. Okay. So I'm not going to show you the math, but these two equations, this is where the algebraic geometry comes in. This is what we can observe: we observe how many times you said that it was A, and how many times you said that it was B. And I know it's related to your individual performance by these equations. It turns out that you can disentangle them.
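One way to see why the same postulate is linear in the integer space but quadratic in the ML (percentage) space is to write the division out explicitly. A small sketch, assuming the mapping P(A_a) = R_AA / Q_A discussed in the talk; the helper name and the use of exact rationals are my own choices:

```python
from fractions import Fraction

def to_ml_space(n_a, r_aa, r_bb, q):
    """Map an integer grade triple to the ML-space percentages:
    (prevalence of A, accuracy on A questions, accuracy on B questions).
    Exact rationals keep the finite-test character of the numbers."""
    n_b = q - n_a
    return (Fraction(n_a, q),
            Fraction(r_aa, n_a) if n_a else None,   # undefined with no A questions
            Fraction(r_bb, n_b) if n_b else None)   # undefined with no B questions

# 20-question test: 10 A questions, 8 correct A's, 7 correct B's.
print(to_ml_space(10, 8, 7, 20))  # prevalence 1/2, A accuracy 4/5, B accuracy 7/10
```

Because n_a varies from point to point, dividing by it bends the flat integer lattice into the curved (quadratic) surfaces seen in the ML cube; this division is also the step that is hard to imagine a biological system performing, as comes up shortly.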
And actually, because these two frequencies have to add up to one, there's only one postulate, and that postulate is represented here in two different ways: either this one or this one, either in the ML world or the NTQR world. And here's where we come back to that thing about transmitting information. The difference between your responses is telling me something about the difference in your correctness. Just to clarify: you use a single equation, number 16, for exactly the same test, but you're showing, you know, P(A_a) is R_AA divided by Q_A. So are you co-asserting these axiomatically, or are you declaring them equivalent? They're equivalent, and so they're co-asserted by equivalence. Yes. And so why do the shapes look different, then? Because they're in different spaces. One is in the P space and one is in the integer space: one is in the integer ratios and the other is in the whole integers. Okay, so it's kind of like a discrete mesh and a continuous space, and there's integrity between the discrete and the continuous. And you have very strong procedural guarantees on the discrete, and then you have arbitrary or standard statistical guarantees on the continuous. But my problem is that I have difficulty seeing how a biological system could do the division one, whereas I can see how it could do the integer one very well. That's why I'm bringing up the NTQR view; that's why I think you guys would be interested in it, because I don't see how a biological system could be doing the dividing, but I can see how a biological system could build a cube that's 20-sided. Yeah, that's kind of like a synapse and synaptic release: it comes in vesicles. It's not like dopamine is a continuous variable at the synaptic level; the dopamine release comes in discrete packets. Yeah. So then we get to that. So this is the interesting part. What is the problem with ensembles? Gosh, if we could just do my...
So here's where we go back to heuristics, and my love of heuristics. Majority voting is such a great algorithm. It's so simple. If things were good enough, majority voting would just be good enough; it would make us safe as it is. The problem with majority voting is that it can go wrong. The mob can be wrong; things could be flipped in terms of their accuracy. So we need to consider ensembles, and the problem with ensembles is that they can be error-correlated. They're not going to be making their errors independently, so we need to handle that. And here's where we come to the first set of what I would call non-trivial postulates, because this is what makes them complete: if I observe two classifiers, they can only vote four different ways. And there's no other way. That's it. That's the completeness. So if I give you equations that describe these four frequencies, and those are the four possibilities, I'm done. And, by the way, these are the surfaces for some of those planes; I'm going to show you later. Let me move on, because it's going to be 4:30. So here's the correlation. This is what I was saying before, Daniel, that the P's are these things: the responses divided by the total number of questions. Either I can talk about R's or I can talk about P's. And then correlation is defined as something that looks exactly like you would define it for a distribution, but here it's defined for a sample. So basically what I'm saying is, I want to continue to write things as products of individual performance. And so what I really want to have inside these parentheses is the number of times that they both said that it was A and it was A. Well, that's this number here, but I'm not going to put that number in.
What I'm going to do is stubbornly stick with individual performance, and then I'm going to put in the difference between that number that I need and this thing that I'm stuck with, which is the product. And so these are the four postulates for pair-correlated binary classifiers. You only need two of them: you need one for the A label and one for the B label. And it acts exactly like you would expect correlation to act. When correlation is positive, it's going to increase the number of responses where they agree, and when it's negative, it's going to increase the number where they disagree. It has the right sign, basically: correlation here is contributing with a plus, and in these two equations, where they disagree, with a minus. Yes, something like a finite, procedural recasting and reimagining of true positive, true negative, false positive, false negative: those are four actual values. And so again, within the finiteness of digitized sensing especially, you can take purely nonparametric, purely algorithmic, purely arithmetical, numerical approaches, rather than always passing everything through the kind of statistical meat grinder. I'm done. I'm done, Daniel. You said "exactly," right? This is the thing. And so I'm not going to go back into Tom Mitchell and Platanios; one of the students came up with an independent solution, which is wrong. And I've been slammed by reviewers because they say they did it right and I'm doing it wrong, and it turns out to be the other way around, but I'm not going to go into that. That's my personal beef. But let me show you what happens when you take these equations. So what is the problem? The problem that everybody has encountered with paired binary classifiers is that correlation is entangled with individual performance in every one of these equations. Do you see that? 26A, 26B, 26C, and 26D have correlation in every single one of them.
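My reading of that sample correlation, as runnable code (a sketch: it needs the true labels, so it is the quantity the postulates constrain, not something observable on an unlabeled stream, and the function name and toy data are mine):

```python
def pair_label_correlation(votes_i, votes_j, truth, label):
    """Sample correlation of two classifiers on one label: the frequency
    with which both answer `label` on `label`-questions, minus the
    product of their individual frequencies of doing so."""
    qs = [k for k, t in enumerate(truth) if t == label]
    n = len(qs)
    both = sum(1 for k in qs if votes_i[k] == label and votes_j[k] == label) / n
    p_i = sum(1 for k in qs if votes_i[k] == label) / n
    p_j = sum(1 for k in qs if votes_j[k] == label) / n
    return both - p_i * p_j

truth = ['a', 'a', 'a', 'a', 'b', 'b']
v1 = ['a', 'a', 'b', 'b', 'b', 'a']
v2 = ['a', 'a', 'b', 'a', 'b', 'b']
print(pair_label_correlation(v1, v2, truth, 'a'))  # 0.125: they err together on 'a'
```

A positive value means the pair agrees more often than their individual performances would predict under independence, which is exactly the sign behavior described above.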
We cannot separate correlation from individual performance. But if you use algebraic geometry, you can. And that's what I'm not going into. You can separate these equations. You get back the individual equations: 33A and 33B are exactly the same equations that I showed you before. But then you get these new ones where you've decoupled correlation. So 33D, 33E, and 33F are the only ones that have correlation, and it turns out that they're all the same; you get no additional information from them. So, in fact, everything in binary classification is controlled by this number: the number of times that you agreed on the B label, minus the product of the times that you each said it was B. It turns out that you can prove that, for binary classification, this is exactly the same as this other quantity. And the way that I'm going to use error correlation is this equation right here. The only thing I want you to note about this equation is that it's linear in the correlation. So if you tell me what you think the grade is, and I observe how you voted and how you differed in your voting, I can tell you what your error correlation is. I'm racing ahead, and I just want to show you some evaluations, okay? So this equation is the solution for the prevalence of A's when the classifiers are making error-independent mistakes. It comes from the quadratic polynomial that you get from solving that very complicated system that I show at the end. So now I need eight of them: I have three classifiers, i, j, and k, so I need eight equations, and you see sort of a descending symmetric pattern, right? This ladder goes down one way and up the other way. These are the B's and these are the A's. And when you solve that system, you get a quadratic. And it's that quadratic that I'm excited to show you: a quadratic equation whose complicated coefficients depend only on the statistics that you have observed.
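The claim that the B-label "agreement minus product" statistic is exactly the same as the A-label one can be checked in a few lines (the helper name is illustrative). For binary responses the identity holds by simple algebra, since the B-frequencies are one minus the A-frequencies, which is why only one of the two carries information.

```python
def pair_gamma(votes_i, votes_j, label):
    """Sample 'agreement minus product' statistic on one label:
    freq(both voted `label`) - freq(i voted `label`) * freq(j voted `label`)."""
    q = len(votes_i)
    f_i = sum(v == label for v in votes_i) / q
    f_j = sum(v == label for v in votes_j) / q
    f_both = sum(vi == vj == label for vi, vj in zip(votes_i, votes_j)) / q
    return f_both - f_i * f_j

votes_i = ["A", "B", "A", "A", "B", "B", "A", "B", "A", "B"]
votes_j = ["A", "A", "B", "A", "B", "A", "A", "B", "B", "B"]
gamma_a = pair_gamma(votes_i, votes_j, "A")
gamma_b = pair_gamma(votes_i, votes_j, "B")
print(gamma_a, gamma_b)  # both equal: the two statistics coincide for binary labels
```

Algebraically: f_BB = 1 - f_iA - f_jA + f_AA, so f_BB - (1 - f_iA)(1 - f_jA) = f_AA - f_iA * f_jA, term by term.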
There are no free parameters. So this is not Bayesian estimation in any way, right? You have no freedom whatsoever. So when I ran it on synthetic data, where I had made one of the variables nineteen-twentieths, it predicted exactly. But let me show you some experimental results, and the one that I want to show you is where it fails. That independent solution is going to fail if the classifiers are too correlated: it's going to give you an imaginary solution. Care to comment on that? Yes. An imaginary solution means that the parabola doesn't touch the axis. And gosh, I look at this picture and I go, I can draw this circuit. I can make this a physical circuit, right? Because I could make a geometric construction that draws this parabola, and then I can check electrically whether it touches that axis. And this is a warning light: if you don't touch that axis, your classifiers are too correlated, right? The error-independent solution has failed. So this is something that no probabilistic method can do. No probabilistic method that assumes error independence can show that the error-independence assumption is violated. That's the main problem with distributions and statistics, right? If you make assumptions, you cannot disprove them, because you're tuning your parameters to satisfy your assumptions. But when you're doing things algebraically, like I'm doing here, there is no freedom. And actually, because I can have a complete representation where I include all the correlations, I can prove that when I do these computations, if the classifiers ever leave a square root unresolved in the independent algebraic evaluator, that means that that set of classifiers is actually correlated. It has non-zero correlation. And below you're seeing that that's in fact the case: one of the correlations is almost 6%.
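The talk doesn't reproduce the exact quadratic from the slides, but the "imaginary solution as warning light" behavior can be sketched with a standard moment identity for three conditionally independent binary classifiers (this is an analogous construction, not the NTQR equations themselves; `cov` and `independence_warning` are my names): under independence, each pairwise covariance factors into per-classifier terms, which forces certain ratios of covariances to be non-negative. A negative ratio would demand an imaginary square root, so observing one proves the independence assumption is violated.

```python
def cov(x, y):
    """Sample covariance over the test itself (divide by n, not n - 1)."""
    n = len(x)
    return sum(a * b for a, b in zip(x, y)) / n - (sum(x) / n) * (sum(y) / n)

def independence_warning(v1, v2, v3):
    """True when the error-independence model has no real solution.

    Under conditional independence, cov(i, j) factors as q(1-q)*b_i*b_j
    (q = prevalence, b_i a per-classifier accuracy term), so the ratio
    cov(i,j)*cov(i,k)/cov(j,k) = q(1-q)*b_i**2 must be non-negative for
    every classifier i.  A negative ratio would force an imaginary
    square root -- the warning light that the classifiers are too
    error-correlated for the independent solution to exist.
    """
    c12, c13, c23 = cov(v1, v2), cov(v1, v3), cov(v2, v3)
    ratios = (c12 * c13 / c23, c12 * c23 / c13, c13 * c23 / c12)
    return any(r < 0 for r in ratios)

# Covariance signs (+, +, -) are impossible under error independence:
inconsistent = independence_warning([1, 1, 1, 1, 1, 0],
                                    [1, 1, 1, 0, 0, 0],
                                    [0, 0, 1, 1, 1, 0])
# A sign pattern like (+, -, -) is still representable, so no warning:
consistent = independence_warning([1, 1, 1, 0, 0, 0],
                                  [1, 1, 0, 1, 0, 0],
                                  [1, 0, 0, 0, 1, 1])
print(inconsistent, consistent)  # -> True False
```

Note the contrast with tuned probabilistic models: nothing here was fit, so there is no parameter to absorb the violation.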
And I've been able to detect, with the independent solution, that in fact there is non-zero correlation, because I have an unresolved square root. That doesn't make any sense, right? These are integer counts; it makes no sense to say that you've got a percentage that is some unresolved square root. And this cannot be done by any infinite-statistics approach. This is the distinction I want to make: there's finite statistics, which is about samples, and there's infinite statistics, which is about an infinite number of samples. In finite statistics, you accept that the sample is the gold standard: the sample is true, and everything you derive from it is exactly true. Versus the infinite-statistics world, where a given sample is always wrong, right? In statistics, a given sample is always wrong; the equality only occurs on the average over samples. That's a completely different way of thinking, yes? And I say that the finite one is the one that I want to choose in terms of safety. And it's 4:30 exactly, so I'm going to stop, because I feel like I've been talking too much, and maybe people can chime in and push back and ask interesting questions. Yeah, anyone in the live chat, and then Jakob, go for it. I just want to say how crazy this is. I'm giving you error correlations, which are the only logically consistent error correlations once I've decided to do majority voting or not. And here I'm showing you that majority voting is a little bit too hysterical: it thinks that they're 25% correlated, and in fact they're only at most 1% correlated, and so my method of using the postulates gives a better, self-consistent solution. I haven't gone into that, right? But basically what I do is take this number, which is not at all an integer ratio, and find the integer ratio that's closest to it. And then I ask: for that integer ratio, what is the only logically consistent error correlation? And I get these numbers.
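The "find the closest integer ratio" step has a direct stdlib sketch (the function name is mine, and using Q as the denominator bound is my reading of the finite-test logic): on a test of Q questions every true frequency is a count over Q, so any non-ratio estimate can be snapped to the nearest fraction with denominator at most Q before asking which error correlations are logically consistent with it.

```python
from fractions import Fraction

def closest_integer_ratio(x, max_denominator):
    """Snap an observed statistic to the nearest integer ratio.

    On a finite test every true frequency IS an integer ratio
    (a count over Q questions), so an estimate like 0.24983...
    can be replaced by the closest fraction with denominator
    <= max_denominator.
    """
    return Fraction(x).limit_denominator(max_denominator)

print(closest_integer_ratio(0.24983, 100))   # -> 1/4
print(closest_integer_ratio(3.14159265, 7))  # -> 22/7
```

The second call also illustrates the 22/7 example that comes up next in the discussion.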
That reminds me of finite pi: 22 over 7 is the classic small approximator of pi, and you can go to finer approximators and get closer and closer indefinitely, keeping the strong guarantees. Right, but at some point it's useless. You're down to the width of a hydrogen atom, as people like to point out: your error in the circumference is going to be the width of a hydrogen atom. Who cares, right? And so a finite heuristic is going to do it very well. Sorry, Jakob, I had to interrupt you there just to point out that the fact that I can even measure error correlation, and say that it's logically consistent, is incredible. I just have to say that. Yeah, no worries. I'm still trying to wrap my head around the concept of disentangling the semantics from the particular tool that you're using. Right. Because I feel like it also depends on your expectations as the person using these tools. Going back to the example of A or B: if you take away the particular content of those questions and you only look at the statistics of the answers. Of correctness, as I would say. Statistics of correctness. Then perhaps you are taking away the correlation from the particular content of those questions. Yes. But are you taking away the correlation due to, for instance, the person who set those questions? Oh, my God. It's so good that you're asking that. So here's what I had to do at UMass when I was giving exams. UMass Amherst is a state school, so exams there are bubble-sheet answers. But I taught at Williams and at Swarthmore, and I never gave bubble-sheet exams at Swarthmore or Williams, right? It's only when you have large classes that you have these things. And at UMass, the students would cheat. So they told me: you have to make five versions of the exam. Each student gets one color of exam.
And then you seat people in a diamond pattern around that student, so that everyone around the student has a different color exam. And then you scramble the responses, so nobody can look over anybody's shoulder and copy the bubbles, right? So what am I trying to do there? Why am I trying to do that? Because I want to prevent the error correlation that occurs because of cheating. And I'm trying to find the error correlation between all the students that is due to a commonly shared cognitive misunderstanding of Newtonian physics. Maybe I put in a question that is testing their belief in Aristotelian physics, right? And I want to detect that. When I get a wrong answer on a question, the purpose of the exam for me as a professor is to find out what I should be talking about in the next class, to address the mistakes that people are making cognitively. I want to design a test that blocks every error correlation except the cognitive one. And so that goes to what you're saying, Jakob: you get to design which error correlations to block and which to look for. Somebody's error correlation may be exactly your goal. I as a teacher want to see cognitive error correlations, because I want to help you and teach better. And when I talked about digitizing, you brought up another thing, Jakob, which is that maybe in the original signal the responses were not correlated, but then when I digitized it, I correlated them. And I think Daniel also talked about that: when you change formats, you're introducing spurious signals, and all of them are in there. And yes, there is no guarantee, right? But there is the advantage here that I can measure the error correlation, so you can see whether, by digitizing the logical consistency of your grading, I introduced correlations that did not exist in the original format. Does that make sense? There's no free lunch.
You won't get anything for free here if you really think about it, right? It sounds magical, but it isn't. Okay, on the free-lunch theme, thinking about this in a fun way: there's a lunch where you just show up and it's there for you. That's the dream scenario, because it's like you've chosen the axes to create a finite projection: the right test, the right seating arrangement, an ideal situation. And then some of those can maybe even be relaxed in practice. I don't know all the details, but you described taking an integer approximator, so there are ways of integerizing and finitizing that strip semantics. But then I thought about the ship of fools: what if none of them can be the captain? Well, then the ship is going to go extinct, and then it doesn't matter. That's the imaginary parabola that doesn't make contact. It's like: no, there's no decision rule; the quadratic has no real root; this is just an imaginary solution. These people are talking about survival strategies that are imaginary. Yes, and so you get the other part of it, which is why it was good to put it in a philosophy journal: what if there is no answer key? What if we're all deluded, and we're just testing, with a test, our alignment on a delusional belief? I explained this to an ex-wife of mine who's a psychoanalyst, because I explained the concept of logical consistency versus logical soundness. With logical soundness, you need to know the truth; with logical consistency, you don't. You're just checking, without semantics, that you're cranking the logic machinery correctly: if your premises are correct, your conclusions are correct. So that's what I mean here by logical consistency of grading.
I cannot tell you exactly what the grade is, but I can tell you whether you're being logically consistent with how you believe your experts are behaving. That's the only thing that can be checked here, but it can be provably shown that you are in a highly correlated state and therefore should raise an alarm. So when you look at what this could be, you could think of it as a way of selecting. If you have different models and you find the one that's most aligned with your beliefs about what the truth is, then you have instant compression: just select that model, because you found the one that's most aligned with everybody, and you don't need to keep all of them. But there's nothing to say that you're not hallucinating and that you're going to be eaten, because you think there is no tiger there, but there is a tiger. You could have false beliefs and be delusional. Confident and alive, or confident and soon to be deceased. Exactly. There is nothing here that prevents that from happening, but if you were initially in a good operating state, it would allow you to start detecting a component that is malfunctioning, so that you could prevent that from happening. Yeah, I'll ask a question from the live chat. Upcycle Club writes: how do you define the data sketch of the decisions? So that was in the statement about completeness. Whenever you look at an ensemble — by the way, if you know data streaming, there's a data-streaming algorithm for every sample statistic, and there are many of them. Of course, with a finite sample there's only a finite number of them, but there are many, combinatorially many. So here it is: if I'm talking about three binary classifiers, I have to talk about two to the three, or eight, different frequencies — if I'm looking only at their alignment. But that's not the only statistic I may want. Suppose that you're looking at classification of DNA pairs, so there's also sequence information.
So you may want to know how accurate you are at a given point, on a given base pair, but you may also want to know what percentage of the time you were correct on both pairs. So now you're keeping tabs on another set of statistics, which is not the one here in equation 38, because now you're comparing two different points on the DNA sequence. And you could be keeping statistics about the observation 10 steps back or 10 steps forward. There's an infinite number of statistics that you could be keeping. So I've only shown one small set of them, and for every one of them you're going to be able to write this summation. It's part, if you will, of the basis of probability theory that you have an event space and that you can delineate the event space completely. So you have completeness: there is no event that you do not know about. And by the way, there is no theory of all the theories of the world. This is why you cannot do this with theoretical work: we have no theory that can enunciate all the possibilities of all the possible theories about the world. This can only be done with sample statistics. That's how I get away with this. To reflect that back: first off, again, I appreciate this presentation and work. I encourage people to look into this, because it's very interesting, and to connect it to the prior century of statistics and understand where exactly this is positioned and how deeply this statistical paradigm is baked into the safety discussion. And to this last point, that you're creating many possible finite spaces: the error patterns and covariances can be harder to fake than the mean. You could just take one number and say... Oh, my God, Daniel. Okay. I mean, come on. Yes, exactly right. You can fake the signal, but you cannot fake the correlation. It's so much harder to do that. Yeah, and if they're faking the mean and the variance and all the higher moments, then it's indistinguishable.
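One way to read "data sketch of the decisions," as described above, is a single-pass tally of the joint voting patterns (a hedged sketch; the function name and the choice of a plain Counter are mine): for N binary classifiers you keep only the 2**N pattern counts, and every alignment statistic in the postulates can be recomputed from that tally.

```python
from collections import Counter
from itertools import product

def decision_sketch(streams):
    """Stream over N aligned decision streams, keeping only the counts
    of the joint voting patterns (2**N of them for binary labels).

    The Counter is the data sketch: a complete tally of the event
    space, built in one pass with O(2**N) memory.
    """
    sketch = Counter()
    for joint_vote in zip(*streams):  # one question at a time
        sketch[joint_vote] += 1
    return sketch

s1 = ["A", "A", "B", "A", "B", "B", "A", "B"]
s2 = ["A", "B", "B", "A", "A", "B", "A", "B"]
s3 = ["B", "A", "B", "A", "A", "B", "A", "A"]
sketch = decision_sketch([s1, s2, s3])
assert sum(sketch.values()) == len(s1)             # completeness: every question lands in a bucket
assert len(list(product("AB", repeat=3))) == 8     # 2**3 possible patterns, no more
```

The sequence statistics mentioned above (correctness on adjacent base pairs, or 10 steps back) would just be additional Counters keyed on tuples of positions, built in the same single pass.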
Exactly. Exactly, right? Yes. So this applies to the concept of the enemy inside the gates. How would you detect that something inside of you is hallucinating right now and you shouldn't be paying attention to it? This would be crucial for any intelligence. Let's go back to the original question. Forget about what I said; even if it's not how I said it is, it has to be something like this. How can an entity have intelligence, have integrity of thought, if it doesn't have a way of detecting the enemy inside the gates? Yes? You have to defend against yourself. That ties into security, yes? This is what I think is so interesting about this. The number one enemy you have to worry about is you, your hallucination that is preventing you from seeing the tiger. I would think that that would be the primary thing. And going back to the semantics: I can see how this can be used for binary classification of safe versus unsafe, tiger versus not tiger. But because the mathematics is exactly the same, a biological system could co-opt exactly the same computational structure to check the correctness of very complicated opinions, not just binary classifications. How about Jakob, a last thought, and then we'll each have a closing word here. Yeah, I'm trying to connect these thoughts to the active inference formalism: how many layers of metacognition do you need to realize or elicit this type of self-reflective behavior in the different systems that you're modeling? All of them, right? This would be used everywhere. You could use this over and over and over. You check every single component. It has no intelligence, so it can be used anywhere. That's what I'm saying; that's the thing I find fascinating about it: it has no intelligence, and it has stripped the semantics.
You can insert the instrument anywhere to check the statistics of correctness of your noisy oracles. What that makes me think about is Chris Fields' recent course — I'm not saying you have to watch it or anything like that, but what he spoke to was that there's a classical screen that can be digitized, a physical screen, and then there are two agents that have these semantic reference frames. That was done in a Bayesian statistical setting, and you have kind of added discrete beads, to a first approximation, inside some of these otherwise distributional considerations. Like, if the two thermometers can each read from 0 to 10, that's how many pairwise observations there are. You can have all kinds of distributions and flows and that type of modeling in continuous state spaces, or there may be a whole variety of activities available in the discrete cube, and that correlation, again to make the point clear, can basically be detected and triangulated to where the discrete differential is occurring. In active inference we have the kind of continuous-optimization machine learning, but this is more like pinpointing — this bolt is failing — rather than a 0.8 probability that one of the bolts is loose. No: this one is failing. Yes, it's definitely trying to identify: this is failing, don't pay attention to it, maybe take it out of the ensemble. What are the next steps for your work? So, going through the binary case — it was very hard to figure out the binary case. I've been working on it since 2010. But now that I've finally come out the other end and figured out how to make it simple, I'm starting to be able to write down what the solutions have to be for the three-label case, the R equals three test. And what I've not gone into is that it goes back to that issue of those two numbers, 34 and 35, being the same.
There's no more information in the binary case, but in the three-label case those three quantities are not the same. So, strangely enough, when you have more responses there's more information, because there's more chance of disagreement, and disagreement is the most informative thing that you can have for an ensemble. Agreement is nearly useless; agreement is a tiny portion of these equations. Look at these equations — there are four of them here, and in the set of eight you can really see it. Yeah, the maximally informative coin flip is 50-50; that's the statistical way to say this. Right. Here there are eight equations, and only two of them are for when they completely agree. And this is what people like to talk about: Platanios and Tom Mitchell talk about their agreement equations, where they just wanted to look at the two full-agreement frequencies added together. And I'm like, why? Why not look at the full set of information that's available when you consider all of the equations? You know what I'm saying? Each one of these is true. It's much more complicated when you have correlation: then you have pairwise correlation, and then you have three-way correlation. But my point is that these things are quite complicated, yet they have a very predictable structure. This makes me think about it being a meta-observation. Yes, it's a second level. It's an actually calculated fact. As much as the thermometer's reading is a fact, the thermometer that measures the difference between two thermometers is also reporting a fact. That's right. And then you have to decide: well, I know that I'm in this domain, and this is temperature, and temperature is not going to change very quickly because it's a car engine, so it can't actually fluctuate that fast. So if you see something that's fluctuating too much, you know that it's wrong, right? That's domain knowledge
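The "more responses, more disagreement" point can be made concrete by enumerating the event spaces (an illustrative sketch; the function name is mine): for N classifiers and R labels there are R**N complete voting patterns, of which only R are full agreements, so the informative disagreement patterns grow much faster than the agreement ones as R increases.

```python
from itertools import product

def voting_patterns(n_classifiers, labels):
    """Enumerate the R**N complete event space for N classifiers and
    R labels, split into full-agreement patterns and the (far more
    numerous) disagreement patterns."""
    patterns = list(product(labels, repeat=n_classifiers))
    agree = [p for p in patterns if len(set(p)) == 1]
    disagree = [p for p in patterns if len(set(p)) > 1]
    return agree, disagree

# Binary, three classifiers: 8 patterns, only 2 full agreements.
agree2, disagree2 = voting_patterns(3, "AB")
print(len(agree2), len(disagree2))  # -> 2 6

# Three labels, three classifiers: 27 patterns, still only 3 agreements.
agree3, disagree3 = voting_patterns(3, "ABC")
print(len(agree3), len(disagree3))  # -> 3 24
```

Restricting attention to agreement statistics alone therefore throws away six of eight equations in the binary case, and twenty-four of twenty-seven in the three-label case.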
being applied to then take a measurement, and to interpret that measurement as a safety issue or not. Again, the computer is needed: the thermometer provides the reading, but the computer is needed to understand why. The other analogy I like to make is a murder mystery. A crime has been committed. With this method, you can interview all the suspects and find the logically inconsistent suspect, because they have a story that doesn't comport with anybody else's. But you cannot find the why and the how, because there are no semantics of why people commit murders or how they commit them. So this helps you solve the crime — who committed it, which algorithm is malfunctioning — but it cannot help you figure out why or how. For that you need higher cognition. Thank you again for joining. Thank you for inviting me to this wonderful discussion. Indeed, perhaps a 67.2 at a future time. Yes. Thank you. Thank you.