The many computationally equivalent definitions of a Turing machine. Here, I'm going to very quickly give a high-level sketch of one definition and then talk about some of the important properties of Turing machines. So intuitively speaking, a Turing machine, much like the DFA that Gaudi was talking about, as I emphasized earlier, has an input tape. We have a head that is positioned at one particular bit on the input tape. Unlike in a DFA, the head is not just consuming one symbol after another; over the course of the computation it can operate on the full tape. As I mentioned before, in any particular iteration, the head can move left or right, it can write a bit where it is, and then enter a new state. All of that happens in one iteration. There's also a special state, the analog of an accept state, loosely speaking, which is usually called the halt state of a Turing machine. And the computation is defined as the map from the initial string of bits on the tape to whatever the string of bits is when the Turing machine enters the halt state. It could just keep going, but whatever is on the tape at the moment it halts, that is what it has computed. And again, this is an abstraction of your actual laptops. You're running a program, and basically you've got a termination condition. The program keeps running until it hits that termination condition, that's the halt state, and then there's information in your memory that provides the output of that computation; here, that's part of what's on the tape. Note the very, very important thing: the machine might not ever enter the halt state. The computation might run forever. In that case, the input-output map is not defined.
For a particular input, the input is the string, and if that string causes the Turing machine to just run forever and never halt, then there is no output defined for that input. In mathematics, this is called a partial function: it doesn't have an output defined for all possible inputs. So anyway, as we mentioned before, there's the Church-Turing thesis, which is part of what makes these machines so profound, philosophically speaking. One way to start thinking about the significance of Turing machines, and of the fact that for certain inputs you never get a halt, so that they compute partial functions, is simply by comparing the sizes of infinite sets. Do people here know things like Cantor diagonalization, and how to tell whether two infinite sets are the same size? OK, so I'll go through a quick review. So you've all heard of Cantor, I assume, from the 19th century, one of the founders of modern set theory. He wanted to investigate what it means for two sets to have the same size, even if they're infinite. For example, can we say that the set of integers, in some sense, has the same size, the same cardinality, as the set of real numbers? How can we even talk about such things? He gave a very, very intuitive definition of what it means for two sets to have the same cardinality, one that applies whether the sets are finite, which are the ones we're normally thinking about, or infinite. And that's what's given here: two sets have the same cardinality if and only if there's a one-to-one, onto map between them.
So anyway, sorry for all of these technical snafus we've been having. It seems that the computational machines are all aligned against us; they do not want to be analyzed. But we shall persevere and overcome, in any case. So Cantor's brilliant idea, and it's so beautiful, such a simple notion, but it turns out to be so powerful, is this. Two sets, whether finite or infinite, are defined to have the same cardinality if you can construct, in any way whatsoever, a one-to-one, onto map between them. You can't argue with it; it's just a definition. It doesn't matter what the map is: if you can construct one, then they are defined to have the same cardinality. If they do not have the same cardinality, so you cannot form such a bijection (a bijection means a one-to-one, onto map), but there is a one-to-one map, an injection, from A into B, so an invertible map from A into B, then we say that B has larger cardinality than A. To give an example, here are two sets. These ones are dots; those ones are stars. Do they have the same cardinality? Sorry? Correct, you can't construct any bijection.
In fact, which one has the larger cardinality? The dots. First, they do not have the same cardinality: you cannot construct any bijection between them. But you can construct an invertible map from the stars into the set of dots, an injection. Therefore, the set of dots is bigger. So those are the definitions. The really cool thing, the super cool thing, is that you can apply this to infinite sets as well as to finite sets. To give some examples: the set of integers has the same cardinality as the set of odd integers. Do people see why that would be so? It's because I can construct a one-to-one, onto map taking every integer to some particular odd integer. Here are all the integers: minus 2, minus 1, 0, 1, 2, dot, dot, dot. I love my ellipses. And down here is the set of odd integers, with more of those fun things called ellipses. The crucial thing is that, as you can see by how I'm laying it out, I have basically just constructed, visually, the one-to-one, onto map between the set of integers and the set of odd integers. So that's the proof; I just gave you the proof. Similarly, the set of integers has the same cardinality as the Cartesian product of the integers with themselves, and because of that, essentially, it has the same cardinality as the set of all rationals. Do people see why? Because a rational number is defined by a pair of integers, and since the set of pairs of integers has the same cardinality as the integers, he in essence showed, according to his definition, that there are as many integers as there are rationals. Here is where the fun stuff starts. Remember, I just very quickly, insouciantly, that's the cool word, mentioned that there's this issue of whether the integers have the same cardinality as the reals.
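Backing up for a moment to the integers-versus-odd-integers pairing: that bijection can be written down in a couple of lines (a minimal sketch in Python, just for illustration). The map n to 2n + 1 sends each integer to a distinct odd integer and inverts exactly, which is all Cantor's definition asks for.

```python
def to_odd(n: int) -> int:
    """Bijection from the integers onto the odd integers: n -> 2n + 1."""
    return 2 * n + 1

def from_odd(m: int) -> int:
    """Inverse map: recover the integer paired with the odd integer m."""
    return (m - 1) // 2

# Each integer lands on a distinct odd integer, and the pairing inverts exactly,
# so the two sets have the same cardinality by Cantor's definition.
for n in range(-5, 6):
    assert to_odd(n) % 2 != 0
    assert from_odd(to_odd(n)) == n
```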
This has got to do with what's called the continuum hypothesis, work that earned a Fields Medal, some of the most profound mathematics people have done. I won't actually be getting into that, but that's kind of a teaser. And the answer is no: the rationals have the same cardinality as the integers, but the reals are larger. Here's a simple way to see it. You may have heard of what's called Cantor diagonalization, and that's what we'll work through. Assume there were such a bijection between the set of reals and the set of integers. That would mean that over here are the integers, increasing, which we could just list, and according to that bijection, here is the real number associated with one, 0.1243098 or whatever; here's the one associated with two; and so on and so forth. I'm just making up those numbers; the point is that there is such a list if there is such a bijection. Now look at the elements on the diagonal there; this is called a diagonalization proof. Notice the ones in red. Pay attention to the bouncing dot, so to speak. Look at what I'm going to do now: I'm going to construct a number, in green, 0.203 dot dot dot, by reading off those diagonal entries. Its digits are the original ones along the diagonal, plus one, mod 10. So one goes to two, two goes to three, and nine runs around the horn and goes back to zero, okay? Notice that this number, 0.203..., cannot exist anywhere in the list. It can't be the first entry, because I changed that entry's first digit from a one to a two. It can't be the second entry, because I changed what was originally a nine into a zero, and so on. So here's a real number I've just constructed that cannot be in the list, no matter what that list is. This is called, for obvious reasons, the diagonalization proof, and it is a proof that the reals have greater cardinality than the integers.
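The diagonal construction can be mimicked concretely (a minimal sketch; the finite rows of digits here stand in for the infinite decimal expansions on the slide, and the particular numbers are made up to echo the example):

```python
def diagonal_number(rows):
    """Cantor's diagonal trick: given a purported list of reals in (0, 1),
    each written as a row of decimal digits, build a new digit sequence that
    differs from row i in digit i by shifting that digit by 1 mod 10."""
    return [(rows[i][i] + 1) % 10 for i in range(len(rows))]

# A made-up "list of all reals", echoing the 0.1243..., 0.9... example:
rows = [
    [1, 2, 4, 3],
    [0, 9, 8, 7],
    [3, 3, 2, 1],
    [5, 5, 5, 9],
]
d = diagonal_number(rows)          # the digits of the green number 0.2030...
for i, row in enumerate(rows):
    assert d[i] != row[i]          # d disagrees with every row, so it is not listed
```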
And you may have heard about something called the continuum hypothesis; I was kind of alluding to it before. That is a question which took essentially a century to solve: is there any set whose cardinality is in between the integers and the reals, bigger than the integers but smaller than the reals? It turns out that you can say either yes or no, and mathematics is still going to be consistent whichever one you choose. That was figured out by a fellow called Paul Cohen about half a century ago, one of the most important results of 20th-century mathematics. Okay, so what's all that got to do with making arguments about Turing machines and so on and so forth? Well, let's work it through. Here's another example, which I haven't given yet. This little star there is called a Kleene star. The fellow's name was Kleene, a very smart computer scientist of the last century. The Kleene star next to any particular set means the set of all finite strings of elements from that set. So B star means all finite strings of bits. B star and the integers have the same cardinality. Do people see why? Well, I can encode any integer into bits. Yep, and going the other way; it's got to be a bijection. So we can just work it through; here I'm going to construct the bijection. Remember, it suffices to consider just the positive integers, because they have the same cardinality as all integers. So one maps to the bit string zero. Two maps to the bit string one. Three maps to zero, zero. Four maps to zero, one. Five maps to one, zero, and so on and so forth. You can see that if you do this kind of thing, every single finite bit string is going to appear down there for some particular number. Okay, so now recall that the time-equals-zero state of a TM's tape and its time-equals-tau state, if it halts at time tau, are both members of B star.
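Here is one way to write that bijection down explicitly (a sketch; this particular encoding, writing n + 1 in binary and dropping the leading 1, is just one convenient choice among many):

```python
def int_to_bits(n: int) -> str:
    """Map the positive integer n to a finite bit string: 1 -> '0', 2 -> '1',
    3 -> '00', 4 -> '01', 5 -> '10', ... (binary of n + 1, leading 1 dropped)."""
    return bin(n + 1)[3:]

def bits_to_int(s: str) -> int:
    """Inverse: prepend the dropped 1, read as binary, subtract 1."""
    return int('1' + s, 2) - 1

# The round trip is exact in both directions, so this is a bijection between
# the positive integers and the nonempty finite bit strings.
assert [int_to_bits(n) for n in range(1, 7)] == ['0', '1', '00', '01', '10', '11']
assert all(bits_to_int(int_to_bits(n)) == n for n in range(1, 200))
```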
Also recall that I said this is a partial function, because sometimes the machine won't ever halt. The set of such partial functions that can be computed by some Turing machine are called the partial recursive functions, or you may have heard them referred to as the computable functions. Under the Church-Turing thesis, those are the only ones we're able to actually compute. But remember this business about the cardinality of B star and Z: B star has the same cardinality as Z. We can define TMs; actually, in the interest of time, let me just zip right through that. Yes, okay. So recall the definition of a Turing machine, and this is why I'm going to be coming back to Cantor diagonalization in just a moment. It has a head with n internal states, one of which is a special halt state. Each iteration, these are the things that change: the head changes its state, it writes a new binary value at the bit it's on, and then the pointer moves left or right on the tape. So this is it; that's the specification of a Turing machine. Here are the things we just wrote down: there's an integer n, which is the number of states; there are the elements of the set of states which specify the initial state and the halting state; and then there are these maps, which are maps over finite spaces. You can have arbitrary values of n, so the size of this list of numbers is not fixed in advance, but it's pretty straightforward to see that there's a bijection between such specifications and the integers, okay? This is a very, very important concept; it is the basis of what are called universal Turing machines. The important point is that this means the set of all Turing machines has the same cardinality as the set of all integers. And remember, by the way, that Cantor showed us that the set of all integers is actually smaller than the set of all reals. Okay, that bijection is called an effective enumeration; that's just some language.
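One way to see that enumeration concretely is a sketch like the following (the `TMSpec` layout and field names are my own hypothetical choices, not anything from the lecture): serialize the finite list of numbers specifying a machine and read the bytes off as one integer. Any such injective encoding already shows there are no more Turing machines than integers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TMSpec:
    """A finite Turing machine specification, as on the slide: a number of
    states, a start state, a halt state, and a transition map over finite
    domains ((state, bit) -> (new_state, written_bit, move in {-1, +1}))."""
    n_states: int
    start: int
    halt: int
    delta: tuple  # tuple of ((state, bit), (new_state, bit, move)) entries

def encode(spec: TMSpec) -> int:
    """Injective map from TM specifications into the integers: write the spec
    canonically as bytes and read them off as one (possibly huge) integer."""
    payload = repr((spec.n_states, spec.start, spec.halt, spec.delta)).encode()
    return int.from_bytes(payload, "big")

# Two different machines get two different integer codes.
m1 = TMSpec(2, 0, 1, (((0, 0), (1, 1, 1)),))
m2 = TMSpec(2, 0, 1, (((0, 0), (1, 0, 1)),))
assert encode(m1) != encode(m2)
```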
So the Turing machines compute partial functions from the integers into the integers. How many of the functions in Z to Z can Turing machines compute? And I'm sort of leading you there by noting that the set of all Turing machines has only the cardinality of the integers. In other words, again kicking it up to this kind of profound philosophy, maybe or maybe not, it's controversial: does the set of all functions that even an augmented human brain, like us using laptops, can compute have the same cardinality as Z to Z, the set of all functions that Turing machines compute partial versions of? Here's a theorem, which I'm not going to prove. For any particular set A, consider the set of all functions from A into bits: you take each element of A and assign it a bit, and the set of all such functions is written B to the A (for B the set of bits, this is essentially the power set of A). It's a theorem, by Cantor's proof and Cantor's definition, that it has a larger cardinality than the set A. The corollary is that for any set A, the set of functions from A to the integers has larger cardinality than A. That therefore means that Z to the Z, the set of all functions from Z to Z, has larger cardinality than Z. Recall, though, that the set of all Turing machines has the same cardinality as Z. Putting it together, what this shows is that Turing machines cannot compute almost every function from the integers to the integers. These computers ain't worth expletive-deleted. They can't do anything as far as the set of all functions from tapes to tapes is concerned. So this is one of the first results, because, again, we keep coming back to who cares about this as mathematics, beautiful math, but nonetheless we keep coming back to this notion that maybe it's got some physical consequence.
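For reference, the theorem invoked above and its one-line diagonal proof can be written out (my notation, following the slide's B to the A):

```latex
% Cantor's theorem: for any set A, the set of maps B^A = {f : A -> {0,1}}
% has strictly larger cardinality than A.
\noindent\textbf{Theorem.} For every set $A$ there is no surjection
$g : A \to \{0,1\}^A$; hence $|\{0,1\}^A| > |A|$.

\noindent\textit{Proof sketch.} Given any $g$, define $d : A \to \{0,1\}$ by
$d(a) = 1 - g(a)(a)$. If $g$ were onto, then $d = g(a_0)$ for some $a_0 \in A$,
giving $d(a_0) = 1 - g(a_0)(a_0) = 1 - d(a_0)$, a contradiction.

\noindent\textbf{Corollary.} $|\mathbb{Z}^{\mathbb{Z}}|
\ \ge\ |\{0,1\}^{\mathbb{Z}}| \ >\ |\mathbb{Z}|$,
so almost every function from $\mathbb{Z}$ to $\mathbb{Z}$ is uncomputable.
```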
This will be telling us things about the human brain, and possibly even, depending on other philosophical issues, things about the actual universe: if the universe's dynamics, as we believe, can be computed by a Turing machine, then this means there's a huge limitation on what the actual evolutionary dynamics of universes can be. Or maybe the universe is more powerful than Turing machines. These are major, major controversies. Okay, so in the last 10 minutes let me try to get through the rest. First of all, let me pause. I'm kind of skipping through the slides, trying a broken-play run, as it's called in American football. Is what I've presented so far actually clear to people? Okay, cool. These slides, of course, will go on the Slack channel. So okay, the big windbag up on the screen has just proven, or at least he says so, and I'll go check up on him afterwards, because I never fully trust what my professors tell me. He says he's proven that the set of Turing machines has lower cardinality than the set of all maps from Z to Z, even though both involve taking bit strings to bit strings. Okay, well, I want to see such a thing. Show me: what is an uncomputable function, one that a Turing machine cannot compute, concretely? This is where what you may have heard of as the halting theorem comes in, one of the most important results in computer science; it actually gives you an answer. Okay, so let me just work through this carefully. The set of Turing machines, as I keep emphasizing, has the same cardinality as the set of integers, as does the set of all finite initial bit strings on a Turing machine tape. Since the set of Turing machines has the same cardinality as the integers, that means we can actually list the Turing machines.
Here's Turing machine one, Turing machine two, Turing machine three, and so on. And here's the initial tape, the initial pattern on the tape, the input to the Turing machine. It could be, like I was saying here, zero, one, zero, one, dot, dot, dot. And this notation, n of m, means the integer that Turing machine number n produces on the input m, with the value infinity if it never halts. For example, here, Turing machine one on input zero: this is whatever it produces. Turing machine two on input string zero: this is whatever it produces, if it halts, and so on and so forth, all the way along; that's what all these entries mean. That's why this one is three of zero one: that's Turing machine three when the input string is zero one. Okay, now I'm going to use this weird notation: I define plus-k to be k plus one for all finite values k, but if the Turing machine never halts, so the value is infinity, then plus of that value means zero. Plus-infinity I'm just calling zero; it's some kind of silly notation I made up late one night. Cantor again: Cantor diagonalization. The function on the diagonal, where I've now applied the plus operation all the way along, is not computed by any Turing machine, by a Cantor diagonalization argument. This all follows from the fact that Turing machines have the same cardinality as the integers, which we established by noting that we can just write down a finite specification of what you need to define a Turing machine. That's a finite list of numbers. Therefore there are no more Turing machines than there are integers, and therefore this, while not yet the halting theorem, is one example of a function that's uncomputable. Once you make a list of Turing machines and look at the outputs of those Turing machines, and this is very concrete, you can do this, take all Turing machines, put them in a list.
According to whatever criteria you want; whatever ordering you use, that's what's called an effective enumeration. And then, as you go through it, say: I'm going to define a function that, for input zero, outputs what the first Turing machine in my list gives on input zero, plus one. For input one, it outputs what the second Turing machine in my list gives on input one, plus one, and so on. The function you construct that way cannot be computed by any Turing machine. And it's well defined: I can write down its value for any possible input. Okay, so let's see. I will skip this stuff about a universal DFA, basically a universal deterministic finite automaton. I kind of made up this concept; I assume it exists somewhere in computer science, but I don't know. It would be a deterministic finite automaton that can emulate any other deterministic finite automaton. And you can prove, here is me proving it actually, and Gildian may know whether there are results along these lines in computer science, that there's no such thing as a universal deterministic finite automaton, one that can emulate any other finite automaton. On the other hand, there is a major, important theorem: there do exist an infinite number of universal Turing machines. These are Turing machines that can emulate any other Turing machine. What that means, U sub TM, that's a universal Turing machine, is one such that if you give it an input string uv, where v is a specification of an arbitrary Turing machine, the universal Turing machine takes that input, looks up in the list of all Turing machines what Turing machine v is, and then runs it on the actual input value u. It's basically like the compiler in your computer right now. That's what a universal Turing machine is.
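The "compiler" picture can be made concrete with a minimal sketch in Python (my own toy encoding of a machine as a transition table, not any standard one): one fixed routine that, handed any machine's table together with an input tape, emulates that machine step by step.

```python
def run_tm(delta, start, halt, tape, max_steps=10_000):
    """Emulate an arbitrary Turing machine: delta maps (state, bit) ->
    (new_state, written_bit, move) with move in {-1, +1}. Returns the
    visited portion of the tape on halting, or None if the step budget is
    exhausted (the emulated machine is a partial function and may never halt)."""
    cells = dict(enumerate(tape))      # sparse tape; unvisited cells read as 0
    state, pos = start, 0
    for _ in range(max_steps):
        if state == halt:
            lo, hi = min(cells), max(cells)
            return [cells.get(i, 0) for i in range(lo, hi + 1)]
        bit = cells.get(pos, 0)
        state, cells[pos], move = delta[(state, bit)]
        pos += move
    return None                        # budget ran out: possibly runs forever

# A machine that flips its first three bits and halts...
flip3 = {(s, b): (s + 1, 1 - b, +1) for s in range(3) for b in (0, 1)}
assert run_tm(flip3, start=0, halt=3, tape=[1, 0, 1]) == [0, 1, 0]

# ...and a machine that marches right forever: the emulator never sees it halt.
runner = {(0, b): (0, b, +1) for b in (0, 1)}
assert run_tm(runner, start=0, halt=1, tape=[0]) is None
```

The one routine `run_tm` plays the role of U: the pair (table, tape) is the string uv, and the emulator reproduces T_v(u).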
You can put into it the definition of any Turing machine and an input to that Turing machine, and it just emulates that Turing machine on that particular input and produces the output; that's called a universal Turing machine. Remember, I basically said to you that there's no such thing as a universal deterministic finite automaton, so it is very, very important that there are in fact an infinite number of universal Turing machines. These are things like the compiler on your laptop: they can emulate anything else. Okay, and this also relates to the fact that a universal Turing machine can implement any DFA, but no DFA can implement a universal Turing machine, because there's no such thing as a universal DFA. This is another way of seeing that Turing machines are, in an absolute sense, more computationally powerful than deterministic finite automata. Okay, so we're going to pick what's sometimes called a reference universal Turing machine, and I'm just going to write U of u, v for U implementing the Turing machine specified by v on the input u; I put that comma in there to make things easier to read. In other words, in this notation, the universal Turing machine applied to u comma v equals T sub v of u. Okay, this is just my notation. And again, recall that in general T sub v is going to be a partial function: it will not halt for some of its inputs. Here then is another example, and I'll try to end here today. I gave you one example of an uncomputable function using Cantor diagonalization. That, frankly, can seem quite contrived: okay, you've got one, but I would never be interested in that in the real world. Here is one that actually is very, very consequential.
The halting theorem. Here's a sketch of a proof; I won't try to go through it in the remaining minute or two, but again, these slides will be on the Slack channel so you can look it up. It says that there is no Turing machine, defined for all of its inputs, that can answer: does the universal Turing machine, running Turing machine v on input u, halt? In other words, if I'm interested in the question, does an arbitrary Turing machine, specified as v, halt when fed an input u? I want to know: does this thing ever halt, or is it undefined? This theorem says that you cannot write a program, no C code, no Python, no Ruby, no nothing, that can take any specification of a Turing machine and any input and say whether it will ever halt or is undefined. Can't do it. This is called the halting theorem. It's one of the cornerstones of computer science. It's an impossibility result, and I was referring earlier today to some impossibility results by Naoto Shorishi. There are some earlier ones; you can find Nature papers from several years ago, by a fellow named Cubitt, in the quantum domain, and things like this. They find instances of physical systems that basically implement the halting problem, things like a relaxing spin glass. So it's got direct physics consequences, and one of the things that Guji and I, and many other collaborators now, are exploring is the stochastic thermodynamics of computation: what are the thermodynamic analogs of all this? Because, as Guji was emphasizing, computer science is all about how, as you change the problem, the resources that are needed, the number of bits in your memory, the number of iterations, depend on the details of the problem. To do stochastic thermodynamics of computation, at a high level, what you do is say that another resource is the energy needed to actually perform the computation.
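Going back to the halting theorem for a moment: the self-reference idea behind its proof can be sketched in a few lines of Python (a hedged illustration, not a full proof; `make_adversary` and the "pessimist" oracle are my own illustrative names). Given any candidate halting-decider, you can always build an adversary program that does the opposite of whatever the decider predicts about it.

```python
def make_adversary(halts):
    """Given a purported total decider halts(f) -> bool ("does f() halt?"),
    build the classic adversary D: D loops forever iff the decider says D halts."""
    def D():
        if halts(D):
            while True:          # decider predicted "halts", so loop forever
                pass
        return "halted"          # decider predicted "loops", so halt at once
    return D

# Try the decider that always answers "never halts":
pessimist = lambda f: False
D = make_adversary(pessimist)
# The decider claims D loops, yet D() returns immediately: the prediction fails.
# (A decider answering True would fail symmetrically, by looping.)
assert pessimist(D) is False
assert D() == "halted"
```

The theorem is exactly the statement that every total decider, however clever, is wrong about its own adversary in one of these two ways.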
So just throw that into the mix and see what the consequences are. How does the energy needed to perform a computation depend on the computer-science definition of that particular computation? How can we marry these two? So, for example, ending on this note, there's an open question of trying to find a deep question; how's that for going meta? Try to find a question, somehow concrete enough that you can actually start to analyze it, that involves what the stochastic thermodynamic, or thermodynamic in general, analogs of the halting theorem are. I cannot build a machine that can say whether any other machine will halt. That other machine is burning energy as it runs. My machine is burning energy as it runs. Does this have any consequences? Computer science is littered with those kinds of questions, where you just sit there and look at them and say: oh, but if I'm doing this on a real physical system, there are going to be real physical analogs of all these theorems. What might they be? And you can get some really fun answers to some of those kinds of questions you might pose to yourself. Okay, so let's end it now. I hope everybody is good and burned out, because if not, then I didn't do a good job. And let's resume tomorrow with some more cool, fun stuff. Okay, oh, are there any questions? Yes. Sorry? Things like GPT-3, or these AIs that take language, where do they fit in these models? Are they like Turing machines? GPT-3 or BERT and so on: I don't think there's an obvious sense in which they are universal Turing machines. Everything is an example of a Turing machine; the question becomes interesting when you ask, is it universal?
And there are things like, as I mentioned earlier today, you can build a universal Turing machine out of a physical identification of the variables in a three-body problem, where the bodies have mutual gravity and are rotating around one another; you can actually map that into a universal Turing machine. You can map the simple hydrogen atom into a universal Turing machine. If you adopt the right representation, almost everything in the universe is computationally universal. But the difficulty is that, with those representations, you can't actually initialize the system properly. Now, in the case of things like BERT, GPT-3, AlphaGo, and all these other whiz-bang things, the question would be: is there some particular representation of the inputs to one of those under which its overall behavior implements an arbitrary universal Turing machine? So could I, in some sense, use GPT-3 to run an arbitrary computation because of its internal structure? That will depend on the details of GPT-3. Sometimes you can do that, just like with the three-body gravitating problem; sometimes you can, sometimes you can't. I think that answers the question. Okay. Any other questions? Okay. So just one announcement: we decided to move the exam of Rufo to Tuesday next week at 4 p.m., okay? Okay, thank you. So see you at quarter past two, here, for the second lecture, okay? Good afternoon. Thanks, David.