Okay, are we ready to go? Everybody? Okay, so I'm going to start in medias res, as they said in this neck of the woods a couple of thousand years ago, with universal Turing machines. Everybody remembers, and has completely integrated into their genome itself, the differences between deterministic finite automata, Turing machines, and so on? A heartfelt, cheery yes. Good, good, that's exactly what I wanted. So now we're going to be talking about some more of the very powerful things you can do with Turing machines, and some of the rather amazing implications for all of human mathematics. So last time we were talking about universal Turing machines. These are Turing machines that actually take two inputs. You can think of this as just one long tape, where there's a unique code that takes the bit string on the tape and maps it to two inputs. Alternatively, you can think of it as a variant of a Turing machine that actually has two tapes. At the end of the day, it doesn't matter for any of the results that I'll be presenting, or for any of the results in computer science theory, actually. So recall that the set of all Turing machines has the same cardinality as the integers, so we can actually represent every single Turing machine as a unique bit string.
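The "unique code that maps the bit string on the tape to two inputs" can be made concrete. Here is one possible pairing scheme of my own choosing (not the one on the slides, which isn't specified): double each bit of the first string, insert `01` as a separator, then append the second string. Because doubled bits always read as `00` or `11`, the separator is unambiguous, so the pair is uniquely decodable.

```python
def encode_pair(u: str, v: str) -> str:
    """Pack two binary strings onto one 'tape': double each bit of u,
    then the separator '01', then v verbatim."""
    return "".join(b + b for b in u) + "01" + v

def decode_pair(tape: str):
    """Invert encode_pair: scan doubled bits until the '01' separator,
    then split off the second string."""
    i, u_bits = 0, []
    while tape[i:i + 2] != "01":   # doubled bits are '00' or '11'
        u_bits.append(tape[i])
        i += 2
    return "".join(u_bits), tape[i + 2:]
```

For example, `encode_pair("10", "0")` yields `"1100010"`, and `decode_pair` recovers `("10", "0")` from it.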
The universal Turing machine takes its second argument, that bit string, interprets it as a code for a Turing machine, and then runs that Turing machine on the first input to the universal Turing machine to produce its output, if it ever halts. So the way to think about this is that it's just an interpreter. Just as in computer science, on your laptop, you could actually code up something that does this: it takes a specification of a program, for example in some other language, along with an input to that program, and then it actually runs it. Okay? And recall that the halting theorem says there is no Turing machine that is total, meaning that it actually comes to an answer for every single input, which you can use to compute whether this universal Turing machine will halt for every possible input. So in other words, there's no Turing machine such that you can give it u and v, and it will tell you: does v halt on input u? This is an uncomputable function, beyond the capability of human minds. Okay? We cannot do this. It's a well-defined function, but it's beyond our ken. The proof is relatively straightforward; there are all sorts of variations of it. It's Cantor diagonalization again, a central theme in many, many branches of mathematics. Here is one way to use Cantor diagonalization to prove the halting theorem. Define this function h(i, j): it equals 1 if Turing machine i halts on input j, and it equals negative 1 otherwise. This is the function we would want some Turing machine to compute. So let's hypothesize that there is such a machine, defined for all of its inputs. That means that the negative of h, which flips things around, is also defined for all of its inputs. So here you list all the Turing machines. Remember, they have the same cardinality as the integers.
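The "it's just an interpreter" remark can be made literal in a few lines. This is a sketch under a big assumption: Python's `exec` plays the role of the universal machine's decoding step, and the "code for a machine" is a Python source string that defines a function named `main` (both conventions are mine, purely for illustration).

```python
def universal(u: str, v: str) -> str:
    """Miniature universal machine: interpret the second argument v as
    the code of a machine, then run that machine on the first input u."""
    scope = {}
    exec(v, scope)          # 'decode' the bit string into a machine
    return scope["main"](u) # run the decoded machine on input u

# A 'machine', specified as source code in another language's text form:
doubler = "def main(s):\n    return s + s"
```

Here `universal("01", doubler)` returns `"0101"`: the interpreter never needs to know in advance what the encoded machine does.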
Here we list the value of h for each Turing machine and each initial input into that Turing machine. Then we look at the negated function on the diagonal only. And there's Cantor diagonalization again: you can prove that there is no Turing machine in this list that can produce that diagonal as its output when it's given each Turing machine with that particular input. Okay. Here is a slight variant of the halting theorem, in some sense even more compelling, which says that there is no Turing machine that can tell you whether some other particular Turing machine will halt on a blank input. So we're now not even worrying about the inputs, just the program itself: there is no Turing machine that can say whether an arbitrary program will halt on a blank input. Now, I guarantee that everybody in this room has encountered, many, many times, programs that do not halt and that you simply have to kill. So can somebody tell me an example of such a thing that you have actually run into, to make this a bit more concrete? If you have ever run a piece of code that entered an infinite loop that you did not want, then you just encountered a variant of a certain aspect of the halting problem, in that you did not know ahead of time that, for that input, it was going to enter that infinite loop and never produce an output. So the code that you wrote computes a partial function: it was not defined for that particular input, because it never ended. And this is saying that the same kind of thing can happen even if your program has no input at all, just a blank string. So why would this be interesting? Here is a very fascinating concept. It goes back to the 50s, I think, 50s or 60s. It's called the busy beaver function. So let me set up some terminology.
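The diagonalization just sketched can be written as (hypothetical) code. Everything below is an assumption for the sake of contradiction: `halts` is the total oracle the theorem rules out, so here it is only a placeholder that raises, and `D` is the diagonal program built from it.

```python
def halts(program_source: str, input_str: str) -> bool:
    """Hypothetical total halting oracle. The halting theorem says no
    such program can exist, so this placeholder simply raises."""
    raise NotImplementedError("assumed only for the sake of contradiction")

def D(program_source: str) -> None:
    """The diagonal program: flip the oracle's answer when a program
    is run on its own source code."""
    if halts(program_source, program_source):
        while True:   # oracle says "halts", so loop forever
            pass
    else:
        return        # oracle says "loops", so halt immediately

# Feeding D its own source yields the contradiction:
#   D(source_of_D) halts  iff  halts(source_of_D, source_of_D) is False
#                         iff  D(source_of_D) does not halt.
# Hence no total `halts` can exist; the infinite loop you once had to
# kill is exactly the case the oracle was supposed to warn you about.
```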
Note that for any integer, there's only a finite number of Turing machines that have that number of internal states. There's only a finite number of Turing machines that have, for example, four states, because the rest of the Turing machine is defined by its particular update function and by which of those states is the initial one and which is the final one. There's only a finite number of those things. So in general, we can define this right here: the set of all Turing machines that have N states. There's going to be some number of them; it's not a very big number. For each such N, every one of these Turing machines is either going to halt on the blank input after S(T_{N,i}) steps, or never halt. So, for example, say N is four: this right here is the list of all Turing machines which have four states. For each one of those, I'm going to write S(T_{N,i}) for the number of steps it takes to halt. So there is some function S that, for any Turing machine given a blank input, gives the number of steps before it halts; or instead this can simply be undefined, meaning that it never halts. Okay.
Define the busy beaver function to take this number of states and then simply return the maximum number of steps that any Turing machine with that number of states takes to halt, assuming it ever halts. So you don't include in that max any Turing machine that never halts. For example, for N equals four, this is the maximum number of steps any halting Turing machine with four states would take to halt if given an input string that was all blanks. Okay. Everybody understand? It's a little odd, and it's called the busy beaver function because, well, you have a busy beaver, I guess. But anyway, here's the theorem: BB(N), the busy beaver function, grows with its argument N faster than anything you can ever write down. You can play all kinds of games, N to the N to the N, towers of exponentials, write down whatever increasing computable function you want: BB(N) increases faster. So we're talking about one of the most simple mathematical concepts there is, just how fast a function increases, being beyond human capability. This has got nothing to do with P versus NP or Poincaré conjectures or anything like that. This is provably a harder problem than we can ever solve. And the proof is actually quite simple, given this puppy up here; computer scientists are sneaky beasts in the way they leverage their theorems. Let's hypothesize that we actually could compute the busy beaver function. Then we could write a piece of code that runs any Turing machine with, say, N states for BB(N) + 1 steps. We know, by the definition of BB(N), that any such Turing machine that has not halted by then will never, ever halt. But by the halting theorem we know that it's impossible to write a program that will tell you whether a Turing machine halts. Bing bang boom.
This number grows faster than anything you can write down. Oh, you limited little creatures, you. Nobody should believe this, by the way. But you can go back through it, and the math is not going to give you any wiggle room, even though it doesn't make sense. Okay, now on to one of the central concepts of, oh, I don't know, all of reality. Sorry, David. A question: can you compute it for small N? Yes, and it has been computed for very small numbers. It depends on your precise choice of the universal Turing machine that you're using to define this; implicitly, there's a universal Turing machine in here. But it can be computed for small numbers. For example, back in the day this was computed where your universal Turing machine was the language Lisp. It's also known which numbers are too big for us, which values are actually uncomputable. So it is known, for a particular programming language, what the busy beaver function is for N equals 4, 5, 6 or so; and it is also known, for particular languages, that once you get up to something like the busy beaver function of 50, that number is actually uncomputable. Okay, so you can actually do some things concretely. Another question: since part of your answer is that it depends on the specific universal Turing machine that you're using, is there a way to cheat on this, by collectively using Turing machines and so on and so forth? There's an infinite number of universal Turing machines, so we can't do it collectively. But if we go back several slides, there's what's called the invariance theorem, which is crucial, and which computer scientists take to mean that we don't need to worry about the actual Turing machine that we're using. Am I going to get to it in a second? Yes. Give me a slide or two and I will address that.
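The small-N computations just mentioned can be imitated on a laptop. The sketch below uses a toy formalism of my own (two symbols, a distinguished halt state, states numbered from 0; not the lecture's formalism or the historical Lisp one): enumerate every machine with a given number of states, run each on a blank tape under a step cap, and take the maximum halting time. For one state the cap is safely above the true busy beaver value, so the champion found is exact; for larger N the same code only yields a lower bound. And note the proof from the previous slide hiding in here: if the cap were a true computable upper bound on BB(N), a `None` from the runner would *prove* non-halting, contradicting the halting theorem.

```python
from itertools import product

HALT = -1  # distinguished halting state

def run_blank(delta, cap):
    """Run a 2-symbol machine from state 0 on an all-blank tape.
    delta maps (state, symbol) -> (write, move, next_state).
    Returns the number of steps to halt, or None if still running
    after `cap` steps (which proves nothing about eventual halting)."""
    tape, pos, state, steps = {}, 0, 0, 0
    while state != HALT and steps < cap:
        write, move, state = delta[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return steps if state == HALT else None

def busy_beaver_lower_bound(n_states, cap):
    """Max halting time over every n_states-state machine, treating
    anything that survives `cap` steps as non-halting."""
    keys = list(product(range(n_states), [0, 1]))
    actions = list(product([0, 1], [-1, 1], list(range(n_states)) + [HALT]))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        t = run_blank(dict(zip(keys, choice)), cap)
        if t is not None:
            best = max(best, t)
    return best
```

With one state, every machine either halts on its very first step or marches off in one direction forever, so `busy_beaver_lower_bound(1, 10)` returns 1, the true BB(1) for this formalism.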
Okay, so Kolmogorov complexity. Everybody here has heard about things like how to measure the complexity of the universe, and people, especially from the high Sangre de Cristo foothills near Santa Fe, New Mexico, are all really besotted with the notion of how you measure complexity. One of the early notions is due to, well, I often characterize people like Kolmogorov or Shannon or Turing as aliens, or people from the far future, because they just did way too much, with too much brilliance; there's no way that mere human beings could have done this. But in any case, one of these semi-aliens, Kolmogorov, came up with a definition of complexity. What was motivating him, even though he came up with all of modern probability theory, was: yeah, that's good, but I want to be able to come up with things like measures of complexity that don't depend on probabilities. So he did a lot of work on trying to understand how you could define something as random without worrying about probabilities, and also, in this definition here, how you can measure complexity without using probabilities. This is also bound up with Ray Solomonoff, who used the same kind of notion, trying to produce a foundational derivation of, basically, Occam's razor in machine learning. I would expect many people here have actually rederived that; I know I redid it when I was a grad student. It's not very hard to come up with this kind of an idea. And what it is, is that you say the complexity of any given string is the length of the shortest program that actually computes that string, program meaning Turing machine. So, for example, if I give you a string that's 10^5 zeros, it has very, very small Kolmogorov complexity: basically, here's a bit of C code that gives it to you. Pi, say the first 10^5 digits, also has small Kolmogorov complexity. But this thing here, where the digits are made, for example, by something quantum mechanical, like a Geiger counter or something, has very large
Kolmogorov complexity, because the shortest program that produces it has length about 10^5: that program is "print" followed by the number itself. Okay, so this is a way of measuring how complex a string is: basically, how long a program you need to actually create it. We can formalize that by saying: for a fixed universal Turing machine, define L(p) as the length of a string p. The Kolmogorov complexity, just formalizing this, is the minimal length of any program such that this particular universal Turing machine, when fed with that program, will produce the output that you want. That's Kolmogorov complexity. Okay. And now, going after one aspect of the earlier question, here are some characteristics of Kolmogorov complexity. First of all, it's obviously defined for all of its inputs. But, as you'll see in a little bit, no surprise: we cannot actually compute it for all but a finite number of inputs, even though it's got an infinite number of possible inputs. So not only is it uncomputable in general; in fact there's only a finite number of inputs that we can compute it for at all. Also, there are very few strings with small Kolmogorov complexity. The way that you can see that is that, for any particular constant k, the number of strings with Kolmogorov complexity less than k has to be less than 2^k, because 2^k is the maximum number of strings with length less than k. Those are the maximum number of input programs that you could feed into your Turing machine to produce an output, and even if you were lucky and all of them actually produced the string that you want, there still would be no more than 2^k of them. Here, then, is the aspect of the earlier question we can now address. Consider any two universal Turing machines U and U'. Then there is some constant that depends just on U and U', and is independent of the actual output string whose Kolmogorov complexity we are considering, such that it bounds the difference of the
Kolmogorov complexities. Since there is an infinite number of these output strings s, this is taken to mean that we can essentially treat all universal Turing machines as identical as far as Kolmogorov complexity is concerned. The proof sketch is very, very simple: the constant is, intuitively, the length of a compiler that translates programs for U into programs for U'. If U is C and U' is Python, then it's just a compiler that cross-compiles C code into Python code. Okay. Alright, as I said, Kolmogorov complexity is uncomputable. Here is some intuition for that. This is called the Berry paradox, a very famous paradox from the turn of the 20th century that arose in the foundations of mathematics, among people like Russell and David Hilbert and so on and so forth. Here is the question: what is the smallest number that requires more than seventy letters to define? There are seventy letters in that question. Obviously you cannot produce an answer to that question, because you would have just defined it by that question. Let's say the answer were 72: well, you've just actually defined the number 72, but by hypothesis it takes more than seventy letters to define it, even though you just did that with a seventy-letter question. Okay. So the basic idea is that you translate that into a statement about output strings, say, the number of bits required to define a given string, and then you can basically show that the instantiation of this in terms of an output string cannot be computed. You can also go past that. Here is a version of it that in some ways is far more compelling, far more daunting. This is sometimes called Chaitin's incompleteness theorem. Sorry, a question about the invariance theorem? Yes: remember, for all strings, the difference of the Kolmogorov complexities for two languages is going to be less than some particular function of those two languages. So if this is actually undefined for one, it's also going to be undefined for the other. Chaitin's incompleteness theorem: let's walk
through it a little bit carefully. There is some constant k, and using Lisp and so on you can actually come up with bounds on it; depending on your language, it's on the order of several hundred. But there is some constant k such that there is no Turing machine that takes as input a string s and some k' greater than k, always outputs just a single bit, yes or no, and halts with the output yes if and only if the Kolmogorov complexity of s is greater than k'. In other words, this is a very careful mathematical way of saying: there is some constant k such that you can never, ever prove that the Kolmogorov complexity of a string is greater than k. Think about it. There is only a finite, small number of output strings whose complexity is less than k; there is an infinite number whose complexity is greater than k. We know that. But because we cannot enumerate all the ones whose Kolmogorov complexity is less than k, we can't actually pick out any one of those infinitely many others and say: yes, this one is in that set, its Kolmogorov complexity is greater than k. In terms of physics, that means, for example, that you can never prove that the Kolmogorov complexity of the precise position in phase space of all the gas molecules in this room is greater than, say, about a thousand or so. It could be that there is a program whose size is less than a thousand which, for any fixed Turing machine, would actually give the position in phase space of all the 10^24 or so gas molecules in this room. Basically, you come to the edge of mathematics and then you fall off, and there is an infinite sea down there, and you can't figure out where you can go: an infinite set of mathematical questions with well-defined answers that humans cannot solve. Okay, let me skip past some of this stuff. Okay. The definition of Kolmogorov complexity that I've just been giving you has got some aspects, some
characteristics, that one might object to as being not particularly aesthetic, at a minimum. One of them is that it's not what's called sub-additive. If this is going to be a measure of complexity, we would want it to be the case that the complexity of two particular outputs taken together is less than the sum of the complexities of each of the two outputs. For example, if one of my two outputs is pi and the other is the square root of 2: what is the shortest program that will actually compute both pi and the square root of 2? I would like to think that it's going to be shorter than the length of the shortest program that will compute pi plus the length of the shortest program that would compute the square root of 2. But that's actually not true with the definition of Kolmogorov complexity I just gave you. A very related problem, and here is where we're starting to get into really deep waters: define this function g(x), which is the sum, over all programs that halt and produce the output x when fed into your machine, of 2 to the minus the length of that program. There are many reasons, especially in the foundations of machine learning, why you would want this to be a semi-probability distribution, where the sum of all of its values can be less than 1 rather than having to equal 1. But unfortunately, when we're using an arbitrary Turing machine, this sum actually equals infinity in general. So these are some aspects of what I've been presenting to you over the past couple of days. Cool as it is, profound as it is, as much as it should be the case that, if you've got any kind of emotional fiber in your being, you should not be able to sleep for the next seven nights, because you're staying up late at night worrying what this means for your
perception of reality. Despite all that, there are actually some problems here. Well, I don't know how to fix them. I don't know whether this has to do with perception of reality or computability of reality; they are the same thing, the same thing. I can reduce this to a simple joke. Everything that's truly profound, you know it's profound if it ultimately can be viewed as a joke that the universe is playing on us. It's a simple joke from The Hitchhiker's Guide to the Galaxy, to do with the famous question about 42, and what it says is that, blah blah blah, and then at that point the universe disappears in a puff of logic. The universe disappears in a puff of logic: this is the kind of inspiration behind that Douglas Adams line. Being a little bit less glib: I don't want to get into this now, but there's some very good work in philosophy called ontic structural realism. Max Tegmark has a series of papers in which he reduced it to mathematical statements, in first-order logic and representation theory, concerning the foundations of physics, and a collaborator and I are extending it to a model in which mathematics itself is actually a stochastic process. And basically, and I would argue this long and hard (every time I have argued it long and hard, frankly, I've won): there is no way that you can justify thinking there is anything more to reality than math. It's all there. That is the lesson of quantum mechanics, nothing about wave-particle duality: this is all math, no more real or unreal than some other body of math, because by definition any experimental test that we can do in this body of math is given by this body of math. There's no concrete reality beyond that. This gets into the foundations of philosophy and so on, but basically what I'm doing is pushing back very hard right now, elaborating what Gilder just said: computability, reality, you sleeping late at night, they are all one and the same thing. So anyway, topics for discussion later
on. So, getting back to reality here, or at least to the weird dude up at the lectern giving some slides: there is a way that we can actually solve these problems, and to do that we are going to restrict attention to what are called prefix-free Turing machines. These are the subset of all Turing machines with the property that, if you pick any particular prefix-free Turing machine, the set of all inputs that it halts on is a prefix-free set. Remember the definition of prefix-free; Gilder went over it very briefly, I think. It's foundational to Shannon coding theory: for example, all of Shannon information theory assumes that your codebook is actually a set of prefix-free codewords. Can you go back one? One comment, one question. First, the comment: if you take that sum over the halting set of 2 to the minus length, it's less than or equal to 1; that's the Kraft inequality, basically the first thing that we showed in the lecture. Next slide. Now the question: we kept talking about how universal Turing machines are, and now, to solve a problem with a universal measure of complexity, we are restricting our attention. Yep, just to solve these problems. There's not a proof that this is the universal solution: nobody has proven that the only way of solving these particular problems is by restricting attention to prefix-free Turing machines. People have found that it is a solution to these mathematical problems, so everything that has been done since is an elaboration on that particular way of getting around them. Just as a follow-up question, not asking as a student but as an observer: if we want to come up with this solution, and it's not working universally, then maybe the complexity measure is not that universal, but this is a variant of complexity that is okay? Well, I'm not sure exactly what you mean by universal, but you cannot solve this problem without restricting our attention in this way, right? You'll see. Okay, okay. So here is
why this is a solution to the problem. Recall, here's a theorem: there actually exist universal prefix-free Turing machines. So even if you restrict yourself to machines that only halt on prefix-free sets, there are those that are universal, so they can actually compute anything. What that means is that if we choose any universal prefix-free Turing machine U, and any other universal Turing machine which is not necessarily prefix-free, then for all strings the difference in the Kolmogorov complexity under those two Turing machines is less than some constant. So as far as calculating the length of the shortest string that gives you a desired output under a universal Turing machine is concerned, as far as that problem of measuring complexity goes, we're fine restricting attention to prefix-free universal Turing machines, because we get the correct answer to within a constant, and that constant is something that we've got to eat no matter what when we're using Turing machines. Okay. So this variant of Kolmogorov complexity, where you're restricting attention to prefix-free machines, is often written in the literature, as in Li and Vitányi, with K(x) rather than C(x). It is sub-additive. And, as Gilder was emphasizing, by the Kraft inequality we now know that, because we're restricting ourselves to a prefix-free halting set, the set of programs on which this machine halts is prefix-free, and therefore that function actually is a semi-probability distribution. Okay, let me skip through all this cool stuff. Okay, now some more profound things. So, I just gave you instructions that if you want to pass this course, you're not allowed to sleep for the next night, because if you do sleep, it means you don't really, fully internalize what I'm saying. I'm now going to add another requirement: you have two weeks in which you're not allowed to sleep, because you're wrestling with what this means for what you thought reality was. It's something that Chaitin came up with; sometimes it's called the
Chaitin Omega. It's the halting probability, defined for prefix-free universal Turing machines, and it's simply given by this sum. Intuitively speaking, it is the probability that, if you feed a randomly generated input string, generated by flipping a coin with uniform probability, into a prefix-free universal Turing machine, the machine halts, restricting yourself to inputs in the prefix-free set. Oh, I'm sorry, I thought this was given in the earlier lectures. So, prefix-free is a property of a set of strings. To give a very simple example, actually, that one won't work; here's a very, very simple, dumb prefix-free set. In general, it's a set of strings s_i such that no string in the set is a proper prefix of any other string in the set. It's related to what are called instantaneous codes, and this is, as I say, the foundation of Shannon's coding theorem and so on. The basic idea is that if I'm actually trying to decode a stream of bits coming in, I can know when I've come to the end of a codeword: oh, okay, that set of seven bits I just saw, that's actually a codeword, let me go look it up, and the next bits that I'm seeing belong to the next codeword. So it's an alternative to the dumb way of sending a set of multiple words down a channel, which would be, at the beginning of each word, to specify the length of that coming word, the number of bits. Instead, here you're actually just using a code such that you can avoid that stuff entirely, because you know when you're done. Okay, is that clear to people? Okay, cool. So, the halting probability: we've got a prefix-free universal Turing machine, so it only halts when the input is part of this prefix-free set. Because of the Kraft inequality, we know that this probability is well defined. You can actually start to run this on your laptop, and this has actually been calculated for several digits of Omega. And again,
it depends upon your precise choice of a computer language. But what this is saying is: you just generate random bit strings (by random, I mean flipping a coin, with uniform probability for each zero and one; I know it's uniform because there's a 2 to the minus length here), and you just start feeding them into your universal Turing machine, and for any one of them that actually halts, you add up this value here, which is 2 to the negative of the length of that particular input that caused the universal Turing machine to halt. So you can write a piece of code that does this and, as I say, the Kraft inequality ensures that this is all well defined. The successive bits of Omega: this is a single real number, and that real number Omega is like God's number. In that number is included the answer to every mathematical question any human being will ever be able to ask. It's in there, and the way it's in there is very, very simple. You just define a prefix-free universal Turing machine such that what it does is take any input, interpret it as a mathematical question, and just start running through all possible proofs, asking: is the answer to this question yes, or is it not? And if it halts on that particular question, the answer is yes. And you can show that that's going to be one of the bits in this expansion. Loosely speaking, each one of these bits codes for a different mathematical question, and whether the Turing machine halts on that question or not, which is giving you the answer to that question. P equals NP is in there. Everything is in there. Sorry? So there will be many, many questions whose program length is equal... sorry, say that again? I need to think about it. Good, you're not going to sleep, so Matteo passes the course. So, basically, any question that is well posed, in the sense that you can write a computer program that can start working through all
proofs, in second-order logic say, to try to prove a particular answer to that question, such that if that program ever stops, then you've got a proof, whereas if it never stops, then it's actually not provable. That's the precise sense: if the program checking through all proofs of your proposition never halts, that by definition means your proposition is not provable. So my question is: if you consider the coefficient of 2 to the minus n, then is this the number of questions for which you have a proof of length n? Yes, you can work through it; basically, as I say, I don't want to go through the details, and there's actually a nice Scientific American article about this, I think, from many decades ago. But let me be more careful when I say it answers all the mathematical questions we can ask. What this actually tells you is every mathematical proposition for which you can write a program and tell that program to halt if and only if it comes up with a proof (and you can choose it to be second-order logic, or any logic you want, for the actual statement of the proposition) and otherwise never halt. That is what's in here. So, for example, the continuum hypothesis, that there is actually a cardinality that is greater than that of the integers but less than that of the real numbers: this would never actually halt, because you cannot prove the continuum hypothesis. We now know that you can construct a consistent mathematics in which you assume the continuum hypothesis is true, or in which you assume it's not true. Similarly, the parallel-line postulate of Euclid: we now know, of course, that you can prove neither the parallel-line postulate nor its negation. So you would put this into the halting probability, and it would never come out and say, I found a proof of it. Oh, it's very, very much Gödel and beyond. I have two questions; one of them is easy, the other one is dumb. Super quick: so we
can actually find the solution because successive bits that construct the codeword is uniquely decodable because we have perfect three essentially we can uniquely identify the solution then the second question is that let's say that I want to encode a question so you say that because you say that math is underlying the foundation of physical reality so as a physicist let's say that you want to encode some question and find the answer I understand it it's all great like speaking mathematically on paper but what if you want to encode something and just find the answer in other words if we are a couple hundred years from now one of our starships lands on a planet and there's some we find the ruins of some advanced civilization might we when we open up their vault find a device that's calculating the successive bits of the halting probability what would that device be like thank you that's the smart version of the dumb question that I have well if you remember this science fiction movie on this topic that actually hasn't been made yet but that's irrelevant because we're way past simple notions of time in this course basically nobody knows what it would be doing unless you could right now do this on your laptop Matteo could start doing this there's a language called prolog which actually generates all proofs to start running prolog programs for every single mathematical proposition you can think of so in practice no one knows how to encode for example this question problem of p equals mp so no one actually tried oh no no it's very easy to do it but it's just that when you start running that program so far it's never halted doesn't mean that it won't at some point halt but so far it's not the set of all mathematicians they are a Turing machine and they are running that program and so far they've not halted when you view that way there's a bit in here that's saying how will the Ukraine war end up anyway okay so now to oh boy time is going to be tough um okay another nice 
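The coin-flipping procedure just described, enumerating programs and crediting 2 to the minus length for each one that halts, can be sketched as a lower-bound estimator for Omega. This is a toy: `toy_halts` is an invented stand-in for the real halting predicate of a universal prefix-free machine (which is exactly what makes the true Omega uncomputable). Because the toy halting programs form a prefix-free set, the Kraft inequality keeps the running total at or below 1.

```python
from fractions import Fraction

def toy_halts(program: str) -> bool:
    # Invented stand-in for the halting predicate of a prefix-free
    # universal machine: here the "halting programs" are exactly the
    # strings 0...01, which form a prefix-free set, so the Kraft
    # inequality keeps the total <= 1.
    return program.count("1") == 1 and program.endswith("1")

def omega_lower_bound(max_len: int) -> Fraction:
    # Every halting program p contributes 2**(-len(p)).  Enumerating
    # all programs up to max_len gives a partial sum that approaches
    # the (in the real case, uncomputable) halting probability from below.
    total = Fraction(0)
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            p = format(i, f"0{n}b")
            if toy_halts(p):
                total += Fraction(1, 2 ** n)
    return total
```

With the real predicate you can only ever run programs for a finite number of steps, so in practice you dovetail: the partial sums still converge to Omega from below, but no algorithm can tell you how close you are.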
Another amazing thing about prefix-free universal Turing machines is that they allow you, or more precisely somebody much smarter than you, Levin, a Russian mathematician, to prove what's called Levin's coding theorem. I won't go through it, but basically what Levin's coding theorem allows us to do is prove that, using prefix-free Turing machines, Kolmogorov complexity behaves just like Shannon entropy does. By the way, I didn't mean to disparage people when I said "much smarter than you": he's much smarter than me; he's much smarter than everybody. These Russian mathematicians are just brilliant. Anyway, with prefix-free Kolmogorov complexity we can define algorithmic mutual information, which obeys the proper properties, and we can define for Kolmogorov complexity a conditional version, just like Shannon conditional entropy. And we end up with this really bizarre result right here. Let's see... this is a typo; ignore that "IP" right there, it should not be there, sorry. But this is basically saying that the average algorithmic mutual information between any two bit strings, averaged over bit strings, is equal to the mutual information between those bit strings according to that probability distribution, to within a constant. This is basically saying that Kolmogorov complexity does reduce to Shannon entropy in the limit. So now let me see... okay, I'll try to quickly wrap this up to connect it with thermodynamics. Let me go through this quickly. The Kolmogorov complexity of a bit string... Sorry to slow you down, but could you go back, just so I understand this statement? Yeah, I was going to say: that's a typo, it should not be there. On the left-hand side you have mutual information defined in probability, I mean the normal one, and K(P) is the Kolmogorov complexity of the actual probability distribution itself; and that is bounded below and above by the average algorithmic mutual information.
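The inequality on the slide is only described aloud here, so the following is a hedged reconstruction in the standard textbook form; the exact additive constants depend on the reference and should be checked against it:

```latex
% I(X;Y): Shannon mutual information under a computable distribution P
% I(x\!:\!y) = K(x) + K(y) - K(x,y): algorithmic mutual information
% K(P): Kolmogorov complexity of the distribution P itself
I(X;Y) - K(P)
  \;\le\; \sum_{x,y} P(x,y)\, I(x\!:\!y)
  \;\le\; I(X;Y) + 2K(P) + O(1)
```

That is, the expected algorithmic mutual information equals the Shannon mutual information up to terms of order K(P): a minus-a-constant on one side, a plus-a-constant on the other, and the average in the middle.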
That average is bounded on both sides. This should look very similar to things like the typical set in the asymptotic equipartition property: the same kind of structure, where you've got a minus-a-constant on the left-hand side, a plus-a-constant on the right-hand side, and an average in the middle. Okay, all right.

So, to finish up this first half of things, and boy, I could use that coffee right now: Kolmogorov complexity says that the complexity of a string is the minimal length of any input string that would cause the machine to compute the one whose complexity you're interested in. I've just shown you, so to speak, that this concept has all these profound aspects that are going to keep you up at night. But essentially, that's ultimately aesthetics: there are also paintings in the Prado Museum that are really profound and will keep you up at night. This is still far away from concrete concerns. I'm pontificating about how physics is ultimately reducible to Turing machines, and all mathematical questions, and so on; but concretely, we're interested in things like the computation in human brains and its energetic constraints. We want to transform these concepts so that they involve thermodynamics.

So, what is an obvious, trivial (everybody in this room should be able to do this; it won't be on the final exam, don't worry, but it could have been a question on the final exam) variant of Kolmogorov complexity that is more thermodynamic? Well, rather than asking what is the minimal input to a given universal Turing machine that causes it to compute a string, I want to ask: what is the minimal heat flow, the minimal entropy flow, the minimal thermodynamic work that would be required, over any physical instantiation of a Turing machine, to get it to compute the desired output and halt? This is what an engineer will be concerned with; this is what somebody designing this laptop will be concerned with. And the answer, and there's a squirrely way of deriving this that we certainly don't have time to get into now, is that you can view it as a correction to Kolmogorov complexity: the Kolmogorov complexity plus... let's see, the notation got flipped around... basically 2 to the minus length, normalized. Remember, we were talking about this quantity; g(v) in these slides is basically that for the string v, and this here is Chaitin's constant, playing the role of a normalization constant, a partition function. Chaitin's constant, the halting probability, is a partition function, or its logarithm is. So this is a modification of Kolmogorov complexity that tells you the minimal amount of work to produce a string, rather than the minimal-length input that will produce that string.

Some intuition behind this, or some of its properties, shown very abstractly. Boy, this figure is kind of screwed up: that "time" right there should be a label on the x axis. So: the x axis is time, the iteration number of a universal Turing machine; the y axis is the states of the universal Turing machine, arranged by increasing Kolmogorov complexity. These curves are input programs that will all, at the end of the day, produce the same output string. Recall that, going back to Landauer, we have known that thermodynamic work is required whenever we combine two states and lose information about the predecessor. When you run a Turing machine, that's happening all the time; it's happening when I run this laptop. If you're given a coding assignment in CS 101, there is no single right answer; let's say the coding assignment is: write a program to generate the first 20 digits of pi.
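Stepping back to the modified complexity just described: the slide's formula is only spoken here, so the following is a loudly-hedged transcription of what was said (g(v) normalized by Chaitin's constant acting as a partition function), not a verified result; check it against the original reference:

```latex
% Hedged reconstruction: g(v) is the slide's normalized weight for the
% string v, with Chaitin's constant \Omega as the partition function.
g(v) = \frac{2^{-K(v)}}{\Omega},
\qquad
W_{\min}(v) \;\propto\; -\log_2 g(v) \;=\; K(v) + \log_2 \Omega
```

So the minimal work to produce v would be the Kolmogorov complexity plus a correction given by the logarithm of the halting probability, as stated in words above.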
Generate the first 20 digits of pi, say: many, many people in that class will write programs that work, and they will not be the same program. What that means is that if you look at the evolution of the state space of the Turing machine, you'll find that Jacopo's program and Matteo's program were initially different, but as the two programs run along, they eventually end up in the exact same state, maybe earlier, maybe later. When that happens, by looking at the laptop you cannot tell: was this Jacopo's program or was this Matteo's? I've lost information, and there is a thermodynamic cost to that.

Recall that the Kolmogorov complexity of a string is unbounded, and that, as I told you, there are very, very few strings with small Kolmogorov complexity: the number with Kolmogorov complexity less than k is upper-bounded by 2 to the k. By Levin's coding theorem, you can prove that the minimal heat flow, the minimal work required to compute any particular output string, for any particular universal Turing machine, is in fact bounded. That comes from Levin's coding theorem. You can't compute that bound, of course, and it might be a very big bound, but it is bounded. So I think I will end it there for this morning, for the first half of things, and let's everybody take, I don't know, a 5-to-10-minute break at 11.

So, on Levin, I have a question; the last part was a little quick. Sorry, the last part was a little quick. Are you saying that the heat flow is uncomputable? I'm saying the upper bound is uncomputable. The upper bound is uncomputable? But essentially you know that this Omega is less than 1, so don't you have an upper bound also on log Z? Oh, on the precise bound: you can come up with a loose bound, but not the precise bound. We know... actually, I have plots back here; these are plots that people generated. This graph right here shows strings, represented as integers, on the x axis.
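To put a number on the Landauer cost mentioned above, the work you must pay whenever two computational states merge, here is a minimal sketch. The function name and interface are invented for illustration; the physics is just the standard Landauer bound of k_B·T·ln 2 per erased bit:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_work(temperature_kelvin: float, bits_erased: float) -> float:
    # Landauer's bound: merging two computational states (erasing one
    # bit of information about the predecessor) costs at least
    # k_B * T * ln(2) joules of work, dissipated as heat.
    return bits_erased * K_B * temperature_kelvin * math.log(2.0)

# At room temperature, erasing a single bit costs on the order of 3e-21 J.
room_temp_cost = landauer_work(300.0, 1.0)
```

The point of the figure being discussed is that every merge of two program trajectories in the Turing machine's state space carries at least this cost, even for idealized reversible implementations.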
With strings as integers on the x axis and Kolmogorov complexity on the y axis, you can see it's unbounded; it actually gets close to log x. What the result shows is that if, instead, we look at thermodynamic complexity, it never gets past a certain bound; and that bound, the precise limit, cannot be computed. You can come up with some value that's greater than it, but you cannot compute the strict upper bound given by Levin's coding theorem. Okay? Thermodynamics, in this sense, is different from computer science.

So, a comment, or a question: the heat generated by two different programs will be different, so one will know which is which? I'm not sure I understand the question. I can figure out which is which by just running those two programs and seeing... You were saying before that after we do the computation, you cannot distinguish from the state whether it was my program or Jacopo's program. Oh, I see; you might be able to. These are bounds. This is very carefully phrased: this right here is the minimal heat flow, for an idealized quasi-static process. What it's saying is that of your program and Jacopo's program, only one of them is going to be the one that actually achieves the minimal heat flow, and that's what this result concerns. Can we distinguish the two programs by the amount of heat they are generating? Yes. But the business about how you have to pay with work: this is basically an intuitive way of illustrating that at the point where your two programs converge, even in the idealized, thermodynamically reversible versions of those two programs, when you lose that information, both of them are going to require some work. You're going to have to compress your gas, so to speak, if you've got a gas-based Turing machine. There's actually a nice science fiction story by Ted Chiang about how you can do artificial intelligence using just pneumatic tubes and pistons and so on. But in any case, at the point where you can no longer distinguish your program from Jacopo's by looking at the state of the Turing machine, its head and its tape, both programs have to have applied some work. The amounts of work they dissipated earlier, while they were still distinguishable, could be different, or could be the same. Okay, other questions? You're not falling asleep either, good.

So, you said in one of your previous slides that the complexity of computing pi is less than just listing the digits. I wish to know, when you say complexity: if, for example, I use a Monte Carlo method to compute pi, are you taking into account the time taken by the computer? Okay, I'm not sure I fully understand your question, but here's a really cool result that people in computer science, I don't know why, haven't really explored, and I think this is part of what you're getting at. Kolmogorov complexity is only concerned with the length of the shortest program. That program might take from now until the year 2100 to complete, while another program, only one bit longer, might complete by this afternoon. Kolmogorov complexity says the first one is, in some sense, preferred to the second. There is a variant of Kolmogorov complexity called Levin complexity, the exact same Levin, and what it is, let's see if I can get this right on the fly, chances are not: the Levin complexity of a string is defined as the minimum, over all inputs P such that the Turing machine on input P computes the string, of the length of P plus the logarithm of the running time, the logarithm of the time it takes to halt on that particular P. So it's a natural way of addressing what you just said. And guess what: Levin complexity is computable; in fact, it's very easy to compute. Can people see why?
It's got to do with the same construction as the halting probability. To compute the Levin complexity, you just start writing down all programs and running them, and one of them is eventually going to produce your string: the one that just says "print my string." And we know that once we've tried all programs up to that size, going to longer programs only keeps increasing the length term. So we can always compute the Levin complexity of any string. I suspect, or it would be interesting to investigate whether, this has all kinds of implications in computer science, because as far as I can tell it's not really been explored much. And the thermodynamics of Levin complexity might be much more interesting than the thermodynamics of Kolmogorov complexity: dueling Russian mathematicians, so to speak. Okay, very good; let's take a break of five minutes and then start again.
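As a coda, the search for Levin complexity just described can be sketched as follows. This is a toy: `toy_run` is an invented stand-in for a universal machine (it only "computes" strings of ones), but the finite search over (program length, running time) pairs is the actual reason Levin complexity, unlike plain Kolmogorov complexity, is computable.

```python
import math

def toy_run(program: str, max_steps: int):
    # Invented stand-in for "run program p on a fixed universal machine
    # for at most max_steps steps"; returns (halted, output, steps_used).
    # Here a program of the form 0^k 1 "computes" '1'*k in k+1 steps.
    if program.count("1") == 1 and program.endswith("1"):
        k = len(program) - 1
        if k + 1 <= max_steps:
            return True, "1" * k, k + 1
    return False, None, max_steps

def levin_complexity(target: str, max_len: int, max_steps: int) -> float:
    # Kt(x) = min over programs p printing x of len(p) + log2(time(p)).
    # The "print x" program gives a finite upper bound, which caps the
    # lengths and running times that need to be searched.
    best = math.inf
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            p = format(i, f"0{n}b")
            halted, out, steps = toy_run(p, max_steps)
            if halted and out == target:
                best = min(best, n + math.log2(steps))
    return best
```

In the real setting one dovetails: run every program p for 2^(t - len(p)) steps for t = 1, 2, 3, ..., stopping once no remaining candidate can beat the best length-plus-log-time found so far.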