So tonight, as our introducer, I'm really delighted that we have Nassim Nicholas Taleb, who spent his... yeah? All right, I think you have a fan club here. Nassim spent his career as a trader, but many of you also know him as the author of the wonderful book The Black Swan. So I will let him do the honors now.

Thank you. I'm honored to be here and to introduce our friend Matthew. But before I do that, let me tell you a few things about randomness, given that it happens to be my specialty, probability. If you know probability, you can bully anyone in science except mathematicians, because mathematicians are only interested in deterministic things. (Is that the opposite of randomness, deterministic?) You can go bully people in linguistics, because they use probability, and you can of course find something, question them, and they freak out. You can bully people in genetics. You can bully people in finance; that's what happened there. Actually, finance was invaded by people coming from either statistical physics or probability theory. You can bully people in medicine. You say: what's that, the p-value? How did you compute it? They freak out. They completely freak out. And actually, every medical paper, every paper in science, is statistical; it is not deterministic. The only things that are not statistical are mathematics and the purely theoretical. So whenever a paper is written, research is done, they have their N, their number of experiments, and the question is how random or non-random the result is. Statistics is figuring out how non-random something is, or maybe is. So the boss of the paper, regardless of the discipline, is he or she who knows how to determine the random component of the result. It is a very important field. I watched the presentation before, and it was much closer to a good documentary. I don't know how he does his slides; I suggested he start a non-random business selling packages that make slides like these. It's a beautiful talk; you're going to be very impressed. So, to introduce Matthew: he comes from statistical physics. Visibly, he has a seven-page resume, and with a seven-page resume you're always multidisciplinary, and he is. He spent some time on Wall Street. Maybe not much; he's secretive about it, which means someone made him sign a paper saying, you know, something. And what else did he do? He's today at St. Olaf College in Minnesota, which is supposed to sit on top of a hill where he can look out and see the non-random sky above the random activity of Minneapolis and St. Paul. So this is how I... you're going to really enjoy this talk. Thank you for coming.

Thank you, Nassim. I want to say, I do come from Minnesota, and whenever you travel from Minnesota it's always warmer wherever you go, until I came to New York just now. So yeah, we're going to talk about randomness, and I want to get started with a little hands-on exercise. I'm going to do this kind of quickly; it's very simple. You have sheets of paper, and one of them has five boxes on it; it's very easy to recognize. All I want you to do is randomly, whatever that means to you, fill in each of those boxes with either a zero or a one. Do it randomly, whatever that means to you. It shouldn't take long; the more you think about it, the worse you're doing. Okay? Oh, also realize, some of you might have a five-digit zip code
made of zeros and ones. That doesn't count; don't put your five-digit zip code in here. Okay. Now, where's my excellent assistant, Gaby? How many people put a zero in the first box? Raise your hand. Gaby is going to count very quickly. Actually, Gaby, you count that side; I'll count this side. Okay, I got 34-ish; together we'll call that 74. Okay? How many people put a one in the first box? And nobody should be voting twice. 62... call it 66. So 66 ones versus, what did we have for the zeros, 74? Yeah, you guys are pretty random. That's kind of what you want to see. Let's do one more advanced step and look at the first two boxes. How many people put a zero-one in the first two boxes? A zero-one. I'll count mine: 17, and I had 18, so 35, right? Okay, a one-zero? Let's do this very quickly... yeah, I got 19. Whoa, look at that. Now, a one-one? I got 15 and 19. Okay, zero-zero: 19. All right. I've done this a lot, and you are the most random crowd I've ever seen. Congratulations. We're going to come back to this, so let's remember it. This is very interesting. Good work, collectively.

I want to talk a little bit about randomness historically. Where did it come from? Let's get back to the screen, where you'll recognize an astragalus, the knuckle bone of a goat or a sheep. What does this have to do with randomness? Well, it's not what it is; it's what it became over time. The astragalus was, as a matter of fact, used as a die. So for a very long time human beings have voluntarily encountered randomness through gambling; it has been an important part of human existence going back to at least 3000 BCE. So how have we thought about randomness? Unfortunately, for a long time, no one really thought very hard about it at all. Even Aristotle, who thought about a lot of things, including what he called events, of which there were several types: certain events (the sun will come up every day), probable events (which he actually tied to the validity of an argument), and events that were unknowable. And to Aristotle and those around him, games of chance were unknowable. Not unknowable in the sense of "oh my, that's really complicated to figure out," but unknowable in a philosophical sense. The reason is that what we think of as random, the ancients attributed to fate, to the gods. And things that were in the hands of the gods, you didn't think about; it wasn't even a viable topic of intellectual discussion, because it was something the gods knew. You could gamble, but the gods already knew how it was going to play out; you were just doing it for fun, and you weren't going to ask how it happened. This was the state of affairs for a very long time. In fact, for any real step forward in thinking about randomness, we had to wait almost 1,900 years, until 1654. And, again connected with gambling, there was something called the problem of points. The problem of points is a very simple scenario: two people are playing a game of chance; there's a pot that builds up as the game goes along; they have to end the game early. How do you divide the pot based on each player's likelihood of winning?
So this was actually a standard question of the day, bandied about among the leading mathematicians in Europe, including Fermat and Pascal, and they essentially solved the problem and created what we would call the modern theory of probability. From this point forward, things like Pascal's triangle came out of it, and probability as we know it evolved very naturally. Now, here's a very important point: probability is not randomness. Probability is to randomness as the surfer is to the wave. The surfer needs to know there's a wave, and needs to know what the wave does, but the surfer doesn't need to know what the wave is. Probability has the same relationship with randomness: it uses randomness, it needs to know something about how randomness behaves, but it never has to ask what randomness actually is. What we want to do is ask: what is randomness? It's something, right? It's everywhere. We see it in gambling; it's in science in all kinds of forms; it's in weather prediction. It even comes up in love: random encounters, and all the vagaries of love, have some randomness to them. We know it's something, so we can think about it, and mathematicians have thought about it very, very hard. Usually the way to start thinking about any complex problem is to break it down to the simplest form you possibly can, and when it comes to randomness, nothing is simpler than flipping a coin. A fair, 50-50 coin. You flip a coin, you get an outcome, a head or a tail. You flip a bunch of coins, you get more outcomes. Do this a lot, and you get some idea of what randomness means by looking at the output of a very simple process. Now, just to change notation: people use h's and t's a lot for heads and tails, but being more mathematical, we're going to use zeros and ones, which ties in with what you already did. We're going to see a lot of zeros and ones for a while. These combinations of zeros and ones have a name: a bit string. I'm going to talk about bit strings quite a bit, and "bit string" always refers to these strings of zeros and ones. So this is a way to start thinking about what randomness is. Let's stare, collectively, at some bit strings. There's a bit string; that's a big old bit string. Is it random? Well, we can run the same process we did with the sheets of paper. We can ask a very simple question: is it random, or, a little more precisely, was it the result of a random process, like a coin flip? We can simply ask how many zeros you see versus how many ones. If it's random, like you folks, you'd expect to see roughly as many zeros as ones. Then you take it to the next level and play the two-bit game: how many 01s do you see? How many 10s? How many 11s? How many 00s? If it's random, as with you folks, you'd expect approximately the same number of each. And you can go even further: you can ask about three-bit combinations, of which there happen to be eight, and you'd expect each to appear about an eighth of the time. Four bits: there are 16 of those, and you'd expect each to appear about a sixteenth of the time, and so on. This is actually a fairly practical, hands-on definition of the randomness of a finite bit string.
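In code, that test is just counting. Here's a minimal sketch in Python; my assumptions, not from the talk: it counts overlapping windows, and a built-in pseudo-random generator stands in for the coin flips.

```python
import random
from collections import Counter

def pattern_frequencies(bits: str, k: int) -> dict:
    """Frequency of each k-bit pattern among all overlapping windows."""
    n = len(bits) - k + 1
    counts = Counter(bits[i:i + k] for i in range(n))
    return {p: round(counts[p] / n, 4) for p in sorted(counts)}

bits = "".join(random.choice("01") for _ in range(100_000))
for k in (1, 2, 3):
    # each k-bit pattern should appear with frequency near 1/2**k
    print(k, pattern_frequencies(bits, k))
```

Each k-bit pattern should come out near 1/2^k. Notice the word "near"; it is about to cause trouble.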
So from here you can build a kind of test, a very simple test: a finite bit string like the ones we have here is random if it has this nice property, that all short patterns of the same length (you can't let the patterns get too long) appear with approximately the same frequency. And that's what we did in that quick little demo. Now, just for fun, can we take it to three bits? You guys are so random; I don't normally do this. Let's do the three-bit thing real quick and see how we turn out. Gaby, we're going to have to be good with the count here. Let's ask the zero-zero-zero question real quick: how many people started their bit string with zero-zero-zero? Six, okay; I had nine, so 15. Okay, zero-zero-one? I got 12. Zero-one-zero? Four; wow, I had 11. Okay, zero-one-one? I had 12, so 24. Why don't we just stop there. Actually, do I need somebody else? Okay, we're going to do one-zero-zero. How many people had one-zero-zero? And 15, so 27. We could keep going like this. What does the evidence say now? Are we still random? No? You don't think so? Here's a good question: how many people think we're still random, based on this data? How many people think we're not random? How many people are hedging? Yeah. So again, this is the problem with the definition. It's a soft definition: the "short patterns" part means you can only test patterns up to a certain length, but more importantly, the "approximately" part makes it very hard to be definitive about whether a finite bit string is random, whether it was created by a random process. Reasonable people can differ based on the data. These are just tests, exactly. You can refine them to make them a little more precise, but in the end you're still going to have to make a judgment call somewhere along the line. So that's the problem with the finiteness of a bit string. The solution, the mathematical solution, is to go to infinite bit strings. Now, there really aren't any infinite bit strings, but you can imagine a bit string of infinite length. And now you've got a tricky question: how do you talk about frequency in an infinite bit string? How would you talk about the frequency of, say, 00? It turns out you can do it pretty easily. Mathematically, when you work with the infinite, the trick is almost always to go back to the finite and move forward from there. Here, that means instead of taking the whole infinite bit string, you take a finite chunk of it and run the same process we just did, which is perfectly well defined. Take the first 2,000 bits, go in and count the 00 pairs, and suppose you get a frequency like 0.256, kind of close to a quarter. Then take a bigger chunk, count the 00 pairs again, and you might get a number even closer to a quarter. Take an even bigger chunk, do it one more time, and get a number closer still. And then, if you're good enough and you have some insight into what's going on, perhaps, just perhaps, you can take this all the way to the infinite limit and see that the pattern on the right converges to 0.25 exactly, precisely 1/4.
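You can watch that convergence in a little simulation. A sketch, with a seeded pseudo-random source standing in for the ideal infinite string (my stand-in, not the talk's):

```python
import random

random.seed(1)
bits = "".join(random.choice("01") for _ in range(2_000_000))

def frequency(pattern: str, prefix_len: int) -> float:
    """Frequency of `pattern` among overlapping windows of the first prefix_len bits."""
    chunk = bits[:prefix_len]
    n = len(chunk) - len(pattern) + 1
    return sum(chunk[i:i + len(pattern)] == pattern for i in range(n)) / n

for size in (2_000, 20_000, 200_000, 2_000_000):
    # the estimates should creep toward 0.25 as the prefix grows
    print(size, round(frequency("00", size), 4))
```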
If you can do that, and in some cases you can, you conclude that 00 has what we'll call a limiting frequency, and that frequency is 0.25. So you actually can talk about the frequency of a finite pattern in an infinite bit string. And once you can do that, you have a really nice definition of randomness, with the wiggle words "approximately" and "short" gone entirely: an infinite bit string is random if all finite patterns of the same length have exactly the same limiting frequency. No wiggle whatsoever; it's a precise definition. And that's one very good working definition of randomness. Now, what's interesting about this approach to randomness is that it connects to our notion of number in a very interesting way. That's where the next story begins, with the work of a lot of mathematicians, but particularly Émile Borel, a French mathematician who worked around the end of the 1800s. He was very interested in what the real numbers are, the numbers we work with all the time, and how to characterize them. So, the real numbers: here are some favorites. There's e, 3 and a third, the square root of 2, I think that's log 2 up there, and the ever-popular pi down there. Those are the types of numbers he was studying. Let's take pi for a moment and look at it a little more closely. Many of you know that pi's decimal expansion goes on forever and never repeats, no pattern whatsoever; here's a big finite chunk of it. That's written in base 10; write it in base 2, and suddenly we're back to bit strings. So this notion of a bit string can actually be applied to numbers, in particular to pi. Now, at this point in time we, human beings, have computed on the order of 8 trillion digits of pi. Not infinitely many, but a lot. They've been subjected to every distributional frequency test you could possibly imagine, and by all appearances pi looks pretty darn random. It appears to be perfectly distributed in the sense we've just been talking about: all the frequencies are exactly what they should be, up to 8 trillion digits. So this is an interesting property a number can have: its digits have this randomness to them. Borel built on this. He called a number normal if it has the property pi appears to have, that the digits are perfectly distributed, and not just in base 10, not just in base 2, but in any base you could possibly imagine; base 17, if that's your thing. If in every possible base the digits are perfectly distributed, we call that a normal number. So why did he pick the word "normal"? Well, let's look at it backwards, at non-normal numbers. If a number is normal, all the finite patterns of the same length have to have the same limiting frequency; in particular, every finite pattern had better appear, if that's going to be the case. So one way to start sorting numbers is to ask: do they have every pattern in their expansions? Whole numbers end in nothing but 0s, so they don't contain every pattern. Rational numbers have sort of the same problem: they start repeating blocks after a while, so you can't get every possible pattern, and they certainly can't be perfectly distributed. Then there are some weird, I call them concocted, made-up numbers. Here's one in base 2; you can imagine the analogue in base 10, but it's a little more interesting in base 2. You probably see the pattern.
It's a 0 and a 1, then two 0s and two 1s, then three 0s and three 1s, and so on. It doesn't contain a lot of patterns; for example, the pattern 101 never appears in that number. So it's an interesting number, but it's not normal. Okay, so is anything normal? That's a good question, and Borel answered it: which real numbers are normal? Essentially all of them. Being normal is the normal state of affairs; that's why he used the word. Okay, I know what you're thinking: I just pointed out infinitely many non-normal numbers, and now I'm asking you to accept that essentially all real numbers are normal. So here's a little bit about what "essentially all of them" means in a mathematical sense, as a cartoon picture. Suppose this were the real numbers. In there, put all the non-normal numbers; there are infinitely many, but imagine that region represents them. Now take those away, leaving the normal numbers, and it looks just like the real numbers. The non-normal numbers are less than dust; they're vanishingly small dust within the real numbers. There are so many normal numbers that you can't miss them: pick a real number, in some sense at random, and it is almost certainly normal. In other words, they're everywhere. You can't miss them; you can't help but trip across them. Here's an example; it's also a concocted number. Anyone see the pattern there? It's kind of hard to see; if I highlight it a little, you might. It's the digit 1, which happens to be 1 squared, followed by the digit 4, which happens to be 2 squared, followed by 9, which happens to be 3 squared; you get the idea, then 16, which is 4 squared, right up the line. It's all the squares, lined up. With some modest effort, modest meaning several months of your time studying very, very complicated papers, you can actually prove this number is perfectly distributed in base 10. It's a fairly significant result. It's not known whether it's perfectly distributed in any other base, so at best you can say: here's a number which is almost normal. So there's one: normal in base 10, but not normal in Borel's sense.
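Not a proof, of course, but you can poke at the base-10 claim empirically. A quick sketch (sizes picked arbitrarily) that concatenates the squares and counts digits:

```python
from collections import Counter

# 0.1 4 9 16 25 36 ... : the squares, concatenated as decimal digits
digits = "".join(str(n * n) for n in range(1, 200_001))

for prefix in (1_000, 100_000, len(digits)):
    counts = Counter(digits[:prefix])
    freqs = {d: round(counts[d] / prefix, 3) for d in sorted(counts)}
    # each digit should drift toward frequency 0.10 (slowly; the proof is the hard part)
    print(prefix, freqs)
```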
What about our interesting numbers, the ones we love so much: your log 2s, your square roots of 2, your e's, your pi's? Are those normal? It turns out nobody knows. It's wide open. No one has ever proven that any, shall we say, interesting real number is normal. In fact, the only real numbers that have been proven normal in the sense of every base are heavily concocted; they're not the kind of numbers you'd bring up in polite company, really obscurely constructed numbers with no use other than showing that such a thing can be constructed. So essentially all real numbers are normal, and presumably the numbers we know and love are among them, but you just can't prove it. It's kind of interesting. Could pi fail to be normal? For all we know, absolutely; I doubt it, but for all we know. We only have 8 trillion digits, which is nothing. It's sort of unfortunate. Well, let's look at pi in that context. I would say essentially every mathematician believes pi is normal, based on the data; it's a statement of faith. So suppose it is normal, even though there's an outside chance it's not. If it is normal, then as we said, every pattern of a given length has to occur with the right frequency. That also means every finite pattern had better occur infinitely often, or its frequency would eventually be zero. So in pi, every finite pattern occurs infinitely often. That's an esoteric statement; let's make it a little more practical. Take the last book you read, and imagine it sitting on a hard drive as a bit string. The last book you read occurs in pi infinitely often. In fact, every book ever written occurs in pi infinitely often. Actually, let's not stop there: every book that ever will be written is already in pi, infinitely often. And the same goes for essentially all the other real numbers, if pi is normal, which essentially everyone believes. So it's sort of weird. Yes, you could find it, if you should live so long; the question is how far you'd have to look for that bit string. But yes: every book, everything that can be represented as a bit string, every picture, every piece of music, everything anyone has ever said, is in pi infinitely often. I'm not telling you how to find it, but it's in there. Interesting takes on pi.

Now let's take a completely different perspective on randomness. This one has to do with the notion of compressibility, and it builds out of the work of two fantastic mathematicians: Claude Shannon, an American mathematician who worked on questions of communication, what we now call information theory, and Andrei Kolmogorov, a fantastic Soviet mathematician and probabilist who also thought very hard about what randomness is. They both arrived at the same perspective. Let me show it to you via an example. Here's a bit string which I think we'd generally agree is not random: it happens to be 2048 ones. I could just as easily represent it by telling you: hey, just print "1" 2048 times. Or, a little more formally, use the command "print" with the data "1" and "2048", and I have that bit string faithfully represented in a much more compact form. So here's something that's not random, and we say it's compressible, because there's a much more compact way of representing it. Now shift gears and look at this one, a little more complicated. You can try to compress it... that was it; that was me trying to compress it. Nothing happened; nothing gave. The best I can do here is probably just to say: print thyself. And there's no compression there; that takes up a little more space than what I started with. This string happens to be random, and it happens to be incompressible. That's the link between compressibility and randomness. Now, compressibility is something you deal with all the time when you work on a computer, which I'm guessing you do. Take an image like the Mona Lisa. That image on the screen occupies 1,000 by 1,000 or more pixels, a million pixels, plus the color information you throw in; call it about 300 megabytes of stuff to draw that picture on the screen. But the way it got to my screen wasn't as 300 megabytes; it was through something called JPEG compression, in this case a four-megabyte file sitting on my hard drive. Now, what's kind of interesting is that that file can't be compressed any further, because if it could be, guess what: JPEG would have compressed it further. That's as far as it goes. So in this context, it's random.
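True incompressibility, the shortest possible description, isn't something you can compute, so as a crude stand-in you can ask an off-the-shelf compressor. A minimal sketch using Python's zlib (my choice of tool, not the talk's):

```python
import random
import zlib

patterned = b"\x01" * 2048                                  # our "2048 ones"
random.seed(0)
noisy = bytes(random.getrandbits(8) for _ in range(2048))   # pseudo-random bytes

for name, data in (("patterned", patterned), ("random", noisy)):
    out = zlib.compress(data, 9)
    # the patterned string collapses to a handful of bytes;
    # the random one typically comes out no smaller, often a bit larger
    print(f"{name}: {len(data)} bytes -> {len(out)} bytes")
```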
What's kind of neat is that the compressed file and the full image have the exact same information, obviously; that's where you get the Mona Lisa from. But the random one has highly concentrated information, the same information in far fewer bits, versus the very dilute Mona Lisa in her full-blown glory. Now, one question that comes up is: what is the relationship between information and randomness? Not that long ago I held the somewhat naive view that information was the antithesis of randomness; information organizes things and takes the randomness away. Well, from this perspective, which was Shannon's perspective, they're the exact same thing. Pure randomness equals pure information. If you can compress things down as tightly as possible, make the information as concentrated as possible, that's the best form in which to transmit it; pure information is pure randomness. So that was Shannon's perspective. Now let's come back to this notion of an algorithm for a moment. Our compression of a bit string was a command plus data. The data we can think of as a bit string, and the command we can think of as an algorithm ("command" and "algorithm" are synonyms here). But what is an algorithm? It's actually a subtle notion; it's a process. It turns out people thought very hard about what algorithms are. In particular, two individuals, Alan Turing and Kurt Gödel, thought very, very hard about it. Alan Turing many of you might know as the main character of a fairly well-known movie, The Imitation Game, which you might have seen recently. He was actually in the news today: England has now passed the "Turing law," ratifying a pardon for gay men convicted under its old indecency laws, and the pardon is named after Alan Turing. Kurt Gödel, a totally enigmatic individual, is famous for Gödel's incompleteness theorems, among other things. They both thought very hard about what an algorithm is, and they came to almost identical conclusions: you can boil the notion of an algorithm down to building blocks, atomic units, Legos if you will, out of which every algorithm can be constructed. There are several names for this, but usually we use the Turing machine. It's a fundamental tenet of computation and computer science that every algorithm, whatever an algorithm is, can be built as a Turing machine, out of these very simple building blocks. And every Turing machine, because it's so well defined, can be nicely written as a bit string. Everything can be written as a bit string, and in particular a Turing machine can. In fact, there's an algorithm that will take an arbitrary bit string and tell you whether it encodes an algorithm or not. That algorithm is itself a bit string, so there's a bit string that will take a second bit string and tell you whether the second bit string is an algorithm. In fact, that bit string could look at itself and go: hey, I'm an algorithm. It's one of the beautiful self-referential features of the theory of computation. But we're mostly interested in the idea that algorithms are Turing machines. So in the compression we constructed, we have an algorithm and a bit string together. The algorithm is just a Turing machine, and a Turing machine is just a bit string. And what do you think you'd get if you put two bit strings together? Absolutely: a bit-string baby.
Actually, you'd just get one bit string; one more bit string. So what's really neat here is that the compression, this act of compressing, is itself the same type of entity as the thing you're trying to compress: it's a bit string. That gives us a beautiful definition of compression. The compression of a bit string, we could informally say, is the shortest algorithm plus data that will produce it; but now we can simply say it's the shortest bit string that will produce it. We're comparing a bit string to a bit string. And then randomness becomes very easy to define: a random bit string is one where you don't get anything shorter when you try to compress it. It's a very satisfying definition of randomness. So there you have it: two alternative definitions of random. What is random? Is it this idea of perfect distribution, or this idea of incompressibility? Take our favorite number, pi. It turns out they don't even agree on pi. Pi is the poster child of randomness from the perfect-distribution perspective; almost everyone believes pi is random (there may be some dissenters out there, but it seems entirely random). On the other hand, from the compressibility perspective, there are all kinds of very short algorithms that will produce 8 trillion digits of pi. From that perspective, pi is incredibly compressible, so it's not random. These two definitions don't even agree on a number as fundamental as pi. Now, Mark Kac, a wonderful Polish-American mathematician, a probabilist, thought very hard about randomness for a period of his life, and he had a great quote that I'm going to share with you: "From a purely operational viewpoint, however, the concept of randomness is so elusive as to cease to be viable." I'm sorry to say this is a bit of a bait-and-switch talk: there is no universally accepted definition of randomness. There are competing definitions that agree in some ways and disagree in others. At this point in time, we don't know how to properly define it. But it's something, right? Something is happening here; there is randomness out there. So let's forget trying to define it perfectly, settle for good enough, and think about using randomness. By using randomness, I mean intentionally injecting randomness into a problem as a means of solving it. All right? Historically, that whole enterprise began a fairly long time ago. Lord Kelvin, the great Lord Kelvin for whom the temperature scale is named, wrote an interesting paper, "Nineteenth-Century Clouds over the Dynamical Theory of Heat and Light," in which he tried to summarize the state of physics as it headed into the 1900s, into the 20th century. In it he addressed, among other things, some questions in thermodynamics involving the motion of molecules, particularly something called the equipartition of energy, which describes how the energy of motion is shared among molecules. He had some theories about equipartition he was trying to put forward, and some mathematics suggesting they would explain the fundamental ideas of thermodynamics, but he got really stymied by the math. Then he had a great idea: I'm dealing with random positions of molecules, so how about if I just simulate a bunch of random molecules? And that's exactly what he did. He injected randomness. Here's a passage from this difficult-to-read paper that shows where he started thinking about it.
"This was done by taking 100 cards, numbered 0 through 99, to represent distances from the middle point, and then by a toss of a coin determining on which side of the middle point it was to be, plus or minus for a head or tail, frequently changed to avoid the possibility of error by bias; and then the draw of one of the 100 numbers was taken after a very thorough shuffling of the cards." So he randomly positioned molecules around and then did the physics on them, to see how his equipartition of energy was doing. He was intentionally injecting randomness to solve the problem. It turns out nobody remembers this, and it's obscure for two big reasons. One, he was flat wrong about equipartition of energy; other people had other theories that turned out to be correct, and no one really wants to read through 85 pages of obscure steps toward a theory that is wrong in principle. So it just died on the vine. And secondly, it was just difficult. He had a computer; his name was Mr. Anderson. Poor Mr. Anderson: you can imagine him in some British basement, flipping coins, shuffling cards, jotting numbers down, and sending them up to Lord Kelvin, and it was highly inefficient. It just wasn't the right time for randomness. It needed better ingredients: a better tool and a better question. You might guess the better tool, the well-known electronic computer; and the right questions, as it turned out, came out of the Manhattan Project at Los Alamos, the development of nuclear weapons. Oh, and it actually needed the right person, too. Kelvin might have been the preeminent physicist of his time, but he wasn't the right person. He wasn't John von Neumann. It required a singularly brilliant mind to launch this. I could say a lot about von Neumann, but here's how I'd summarize him: for the vast majority of great mathematicians, you can quantify their greatness by listing the number of different fields to which they made substantial contributions. That wouldn't work for von Neumann; you'd have to count the fields he created to accurately represent his contribution. Computer science among them. Game theory. He created a mathematical framework for quantum mechanics. He created operator algebras. The list goes on and on. He only lived to be 54, and he spent ten years doing classified work at Los Alamos, just to put it in context. So von Neumann was a brilliant mind, and he really moved this forward. So what did he do? He was working on a question not unlike Kelvin's, something related to neutron scattering, the probability of a chain reaction. This was after World War II; they were working on the hydrogen bomb. Von Neumann had tackled and solved every seemingly insurmountable numerical calculation up to that point, but here he was stuck. The reason was that he had some very precise rules, coming out of quantum mechanics as it turns out, but also randomness, also coming out of quantum mechanics, and he just couldn't move it forward. Even the great von Neumann. But von Neumann had a very good friend: Stan Ulam, a Polish-American mathematician who at the time was convalescing from surgery, and who, to pass the time, played Canfield solitaire, a type of solitaire, over and over and over again. And like any good mathematician would, he eventually asked the question: what's the probability of going out at Canfield solitaire? He started fiddling around with it.
He realized he had these precise rules, but he had this randomness from the shuffling, and he was stuck; he realized he was never going to work it out by hand. But Ulam had actually worked with the ENIAC, the computer used by Los Alamos, and he had this little idea: I could probably write a program to play solitaire; the rules are pretty simple. He didn't know quite how to do it, but he had the idea of faking the shuffling of the deck. If you could do that, you could simulate the whole thing in a computer, play thousands and thousands of games, count how many times you go out, and empirically estimate the probability. Great idea. It was a flight of fancy, because Ulam was fully aware that nobody was going to take the one computer being used to develop the hydrogen bomb and write the first computer solitaire game on it. It just wasn't going to happen. But he saw the connection between Canfield solitaire and neutron scattering, and seeing that connection is the essence of good mathematics. They both have precise rules; they both have randomness; you could simulate both of them in a computer. He told von Neumann this, explained the whole thing, and von Neumann probably listened very carefully and then said, in von Neumann's way: yes, we shall do that. And they did. Now, here's the catch: how do you fake randomness in a computer? Computers are deterministic. Today there's a whole industry built around generating pseudo-random numbers, and every computer does it all the time, but no one had done it at that point. Von Neumann had a way of doing it, and I want to share it with you, because it gives a little insight into von Neumann's thinking. It's called von Neumann's middle square, and here's what he proposed, and what they actually did. Take, say, a four-digit number and square it; that gives you an eight-digit number. Drop the outside digits and keep the middle four: the middle of the square. Then just keep doing that, over and over, taking those middle chunks. Repeat this often enough and you get blocks of four digits, and you can ask: hey, are these random? They applied the same sorts of tests we just did right here, hardly anything more sophisticated: they counted frequencies. And the output looked random enough, or, as a friend of mine likes to say, good enough for government work. In this case, government work was building a thermonuclear device, and it worked.
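A minimal sketch of that middle-square generator, assuming the four-digit version (the seed here is my arbitrary choice):

```python
def middle_square(seed: int, ndigits: int = 4):
    """Von Neumann's middle square: square the number, keep its middle digits."""
    x = seed
    while True:
        sq = str(x * x).zfill(2 * ndigits)   # pad to 8 digits for a 4-digit number
        start = (len(sq) - ndigits) // 2
        x = int(sq[start:start + ndigits])
        yield x

gen = middle_square(1234)
print([next(gen) for _ in range(10)])        # 5227, 3215, 3362, 3030, 1809, ...
```

Run it long enough and you see the method's famous weakness: many seeds quickly fall into short cycles or collapse to zero and stay there, something von Neumann knew about and accepted for the job at hand.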
This was the beginning of what's now called the Monte Carlo method. The name is a nod to the history of randomness in gambling; it also came from Stan Ulam, who had an uncle with a bit of a gambling problem at Monte Carlo, and they thought it a very appropriate title for the method. And the moment they did it, boom, everyone recognized how powerful it was. As we sit here, people are running Monte Carlo calculations on all kinds of problems out there; it's an incredibly powerful tool, used all the time. Really, all you do is flip coins and watch what happens. That's the beauty of it; that's why you can use it. You just flip coins and watch what happens. Very, very simple. But I want to show you a slightly enhanced version. Who uses the Monte Carlo method? All kinds of people. But it turns out there's a beautiful enhancement, beautiful in the sense that it's elegant, simple, and very natural: the Metropolis algorithm. That's what we're going to finish with, and we've got another little exercise to do with it in a second. But let me introduce it first. It comes out of a paper well known to some of us: "Equation of State Calculations by Fast Computing Machines," from 1953. It had five co-authors, including two husband-and-wife teams, which reflects the incredibly important role women played in the early days of Los Alamos, particularly around computing. The authors are an interesting bunch of characters. Nicholas Metropolis was the computing guru at Los Alamos; anybody who used the computer put his name on their paper. So he got his name on this paper, and he got the whole algorithm named after him. He really just happened to be there; as far as anyone can tell, he did nothing of substance. But it's the Metropolis algorithm. The Tellers, Augusta and Edward Teller: many of you might know Edward Teller's name. He was very prominent in the 1950s, a fervent anti-communist, an intense supporter of developing the hydrogen bomb, and many people think he was the model for Dr. Strangelove. He was well known in his time, and a fantastic physicist as well. The Rosenbluths: I apologize for not having a picture of Arianna Rosenbluth, which itself points to the almost invisible role women played at Los Alamos, however incredibly important they were; there are no extant photos of Arianna that I can find. The Rosenbluths were really the biggest brains behind the Metropolis algorithm. Arianna in particular was responsible for translating the blackboard calculations onto the MANIAC, the computer they were using at the time. She was the one who made these things actually happen, which was a substantial undertaking; it would be a substantial undertaking in any era, let alone that one. So what did they do? I'll show it by example again, using what's called the Ising model of a ferromagnet, an iron magnet. It's a very simple model. It pictures ferromagnetism as a collection of little spins, arrows that either point up or down, and you put a bunch of them together. In this case, there are pretty much as many up as down: this would be un-magnetized. We can also say it's disordered, and it has high energy; here, energy simply counts how many opposite neighboring pairs there are in the configuration. On the other hand, if the spins mostly point in the same direction, it's magnetized, and it has low energy. Put the two side by side and you see magnetized versus un-magnetized, ordered versus disordered, and we've also introduced this idea of energy. Once you introduce energy, nature tends to interpret it as probability. Nature likes low energy; that's why balls drop. Nature likes putting things in low-energy configurations, and the way it does so, quite often, is through probability: anytime you assign something low energy, nature treats it as high probability. Okay. Now, in this particular case, if order, magnetization, is preferred to disorder, as in the model I've constructed, it raises an interesting question: why isn't iron magnetized? The answer is arithmetic. Disorder vastly outnumbers order. It's kind of like Borel's idea of normal: scrambled digits, disorder, is the normal state of affairs. So even if any one ordered arrangement is preferred to any one disordered arrangement, there's just so much disorder out there that order doesn't have a chance.
This is roughly the situation at, say, room temperature. We call that high temperature, and at high temperature the difference in preference between order and disorder is very, very small. So even though an ordered arrangement is preferred to a disordered one, there's too much disorder. But here's the trick: if the temperature decreases substantially, the magnitude of that preference grows massively. You can't fight the arithmetic, you can't change how much disorder there is versus order, but if order is now preferred strongly enough, it can emerge out of the disorder, even though it's numerically still vastly outnumbered. And this is actually what happens with iron: at very, very low temperatures, iron becomes magnetized, even though disorder is still winning numerically. Okay, that's sort of interesting, and we're going to tie it back to the Monte Carlo method in a second. This whole process is governed by what's called the Boltzmann distribution, which ties together the energy-probability-temperature combination. It's named after Ludwig Boltzmann. I think of it as nature trying to show us what is most probable. For example, the Boltzmann distribution governs the organization of the molecules of air in this room. Now, there is some probability that all the molecules of air in this room will congregate in the far right corner, back over there. It would be unfortunate if that happened. It has extremely low probability, because it would take a tremendous amount of energy to get all those molecules organized back into that corner of the room. The configurations we see right now, thank goodness, are very low-energy configurations, and there are a lot of them. These are the ones we see; probabilistically, this is what happens. Low energy wins. The unfortunate arrangement is off the scale, with a probability so small that, effectively, it's never going to happen. Boltzmann describes all that. Well, put Boltzmann together with the Monte Carlo method, and that's the Metropolis algorithm. It's a very simple idea. You're still flipping coins, but now you can think of the invisible hand of Boltzmann, if I can use that phrase, weighting the coin. Nature wants the coin to land a particular way, so that nature's preferences play out. That's what Boltzmann describes, and that's what the Metropolis algorithm implements: it's just Monte Carlo, flipping coins, but not just any coins, coins weighted by nature's preference. Let me give you a quick demo of it in action. Here's a 400 by 400 Ising model, so there are 160,000 little spins in there, black and white. The red bar on the right is temperature, and I'm just going to start running at a reasonably high temperature. I'm just doing Monte Carlo, and I just get noise; all the little spins just flip around. But as the temperature lowers, suddenly they start aligning with their neighbors more and more, and you get this nice low-energy configuration. Raise the temperature back up, and all that order goes away; we're back to high energy, total disorder. And just for fun, it's pretty cool, lower it again and you get back to order right here. Physicists love this, because you can study complicated systems in a computational laboratory. You can mimic what nature does, and you can do it computationally efficiently. You can drive things toward low energy in a very natural way. So it's great: the biggest hit in statistical physics you could possibly imagine.
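Under the hood, the demo's inner loop looks something like this minimal sketch: toy sizes, a cooling schedule I made up, and the standard nearest-neighbor Ising energy with wrap-around edges, where a proposed spin flip is accepted with the Boltzmann weight exp(-dE/T).

```python
import math
import random

N = 50                                                # N x N grid of spins (+1 or -1)
spins = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

def metropolis_sweep(T: float) -> None:
    """Propose N*N single-spin flips, each accepted by the Metropolis rule."""
    for _ in range(N * N):
        i, j = random.randrange(N), random.randrange(N)
        nbrs = (spins[(i + 1) % N][j] + spins[(i - 1) % N][j]
                + spins[i][(j + 1) % N] + spins[i][(j - 1) % N])
        dE = 2 * spins[i][j] * nbrs                   # energy change if we flip (i, j)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            spins[i][j] *= -1                         # accept: flip the spin

for T in (5.0, 3.0, 2.0, 1.5, 1.0):                   # lower the "red bar" step by step
    for _ in range(30):
        metropolis_sweep(T)
    m = sum(map(sum, spins)) / N**2
    # near 0 when hot; typically drifts toward +1 or -1 as it cools
    print(f"T={T}: average magnetization {m:+.3f}")
```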
The minute the paper came out, everyone jumped on it; like Monte Carlo, it was in everybody's toolbox right away. It was no secret: Monte Carlo and Metropolis were very, very powerful tools. But Metropolis in particular never left the physics world, or the natural sciences in general. Yet there are other things you can do with it, a lot in fact, and the trick is that you have to play around with this notion of energy. You have to fool Mother Nature about what energy is. Let me give you an example. Bell Labs, 1983, thirty years later: a group led by Scott Kirkpatrick and others was working on problems of combinatorial optimization, which is a fancy phrase for a very simple scenario. Here's an example of a combinatorial optimization problem, the traveling salesperson problem. This is where you're going to have to get busy in a second. In the traveling salesperson problem you have a bunch of cities, Honolulu, San Francisco, Ashtabula, some others, and you want to minimize something: in this case, the cost of a tour. For now, think of cost as distance. We go around and, there you are, that looks like the minimal-distance tour. Here's what I want everyone to do. Take out your sheets; you'll see something that looks like this, but I'm going to give you a slightly more complicated cost. Between any two cities, you read the cost off this table. The cost from A to B is 3; the cost from A to C is 4. If you want to go from A to B, it costs you 3; from B to C, whatever the table says. I want you to build a tour that goes through all these cities exactly once and returns home, and I want to see who gets the smallest value. Let's do that real quick; I'll do one myself so you can watch. I might go from A to C, then D, then H... let me just do one and add up the costs: A to B is 3, B to E is 4, E to G is 1, G to F is 2, F to H is 1, H to D is 4, D to C is 1, and C back to A is 4. That's 3, 7, 8, 10, 11, 15, 16, 20. I got 20. If you can't beat 20, you're in trouble. Remember, you've got to get back home again; you can't just stop. This is your classic itinerary: around every city exactly once and back home. Fourteen? Excellent. Fourteen? You guys are good. Sixteen, yeah. Fourteen? Very good, we're getting some fourteens. Thirteen? Do I hear thirteen? I've never actually seen a thirteen. Not that I don't believe you; I've just never seen one. Good job. Okay, hold on to those calculations. Let's look at how the gang at Bell Labs solved this problem. They looked at it as a very complicated problem, and you can get a feel for why. You could be sort of dumb about this and just list every possible tour. I don't think you did that, right? There are 720 different tours here, but you could; if you had nothing to do for a weekend, you could list them all. (Fourteen? I've got to see that thirteen. Just hold on to it.) But that wouldn't work so well if I increased the number of cities to ten, or twenty; you can't do it exhaustively, and the interesting problems are a little larger. Here's what the Kirkpatrick gang thought: you know, this sounds like I'm trying to minimize something.
The Metropolis algorithm minimizes energy; what if I could trick it into minimizing cost instead? What if I just fooled Mother Nature? Would it even notice? It turns out the answer is no. And they had a very natural way of doing it. They looked at the analogy once again: you have these random configurations of spins; compare those to random tours of the cities. You can run the Metropolis algorithm and just start flipping coins and flipping spins, and you get all these high-energy orientations; if you did the same thing over on tours, just randomly jiggling the tour, you would just get a bunch of lousy tours. But when you lower the temperature, the Ising model locks into a low-energy configuration; would you lock into a low-cost solution over on the traveling salesperson problem? The answer is yes. And it's essentially off-the-shelf Metropolis: just identify the corresponding components between the Ising model and the traveling salesperson setup, and there's really nothing left to do. So they gave it a name, of course: simulated annealing. Annealing, for those of you who have annealed, is the process of lowering a temperature at just the right rate so you land where you want to land. They were concerned about exactly how to lower the temperature, but it works just fine. So here is my computer-aided approach to solving this. Let's just run it... there it goes. We'll talk about your thirteens afterwards; I'd like to see if one actually works out. The best I've ever seen is 14, which doesn't mean anything other than that it's the best I've ever seen. And this scales really well. Suppose you had the capitals of the 48 contiguous United States; a traveling salesperson would have to glide through all of these. You start solving this and it does the same thing: it lowers the temperature and starts converging toward a nice low-energy, low-cost solution, and you can see it settles in pretty quickly. You get this final solution: Jefferson City, Little Rock... there it is. You could probably draw that yourself, or something pretty close; but imagine replacing distance by, say, airfare and trying to do the same thing by hand. It's insurmountable. Or what if you had something like a thousand cities? The Metropolis algorithm, simulated annealing, scales incredibly well. It will solve this one; it takes more time to draw it than to solve it. It settles into a really nice tour of a thousand randomly located cities almost immediately, then resolves a couple of little crossings, and I think that would be pretty hard to draw by hand. Perhaps you could come up with something close, but again, imagine replacing distance by any other cost; the algorithm doesn't care. So this idea of mapping the Ising model, these ferromagnets, onto things like traveling salespeople was the hard part. Everything else was already done, because the Metropolis algorithm always seeks low energy.
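Here's a minimal sketch of that recipe on a made-up instance; my assumptions, not theirs: random cities, straight-line distance as the cost, a segment-reversal move, and an arbitrary cooling schedule.

```python
import math
import random

random.seed(2)
cities = [(random.random(), random.random()) for _ in range(50)]

def tour_cost(tour: list) -> float:
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

tour = list(range(len(cities)))
cost = tour_cost(tour)
T = 1.0
while T > 1e-3:                                  # the annealing schedule
    for _ in range(200):
        i, j = sorted(random.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse a segment
        delta = tour_cost(candidate) - cost
        if delta <= 0 or random.random() < math.exp(-delta / T):   # Metropolis step
            tour, cost = candidate, cost + delta
    T *= 0.95
print(f"final tour cost: {cost:.3f}")
```

The only problem-specific pieces are the cost function and the move; swap in airfare for distance and nothing else changes, which is exactly the point.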
Let me give you another application. This one is kind of whimsical, and I like it. It has to do with cryptography, and here's the idea: you have a message, and I'm going to use messages that have only lowercase characters and a space, so only 27 distinct characters. Here's a message: it was the best of times, it was the worst of times. The encryption method here is relatively straightforward: map every one of those 27 characters to a unique different character. It's pretty simple. When you do that, you get the cipher, the encrypted message. And, unlike the cryptoquips some of you do, the spaces aren't preserved here; you lose the spaces. So the challenge is to do this backwards: from the garbage cipher, get the message back, or equivalently, figure out what the encryption was. And the key challenge is that you've got to quantify something: you've got to quantify message-ness. Put a number on every string that says this one is more likely to be a message than that one. An easy way to do that is to look at two-letter frequencies. So you go catalog all the two-letter frequencies: how often does "ab" occur, and so on. I actually went to A Tale of Two Cities and counted all the two-letter frequencies, including the space, so there are 27 characters in play: the 26 letters of the alphabet plus the space. What do you think the most frequently occurring two-letter sequence is? "e" followed by a space. Second? Space is big here: a space followed by "t." Third? "th," not surprising, all the "the" words. Then you start getting "he," and it goes on and on; there are 27 squared of these, and you get obscure ones if you keep going: "rv" occurs about two in a thousand times. Here's what's nice: once you have these, for every candidate message you just calculate the total frequency score of its two-letter occurrences, and that gives you a gauge of how message-like it is. The top one here would be high probability, and the bottom one low probability, because the top one has more high-probability two-letter pairs in it. Again, that's all you need for the Metropolis algorithm, and again, it maps right on top of everything else.
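A minimal sketch of that search, assuming a long reference text (in the talk, A Tale of Two Cities) already reduced to lowercase letters and spaces, and a cipher that uses only those 27 characters; the proposal move and iteration count are my choices.

```python
import math
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "        # the 27 characters in play

def bigram_logs(reference: str) -> dict:
    """Log-frequencies of all two-character pairs, with a floor for unseen pairs."""
    pairs = Counter(reference[i:i + 2] for i in range(len(reference) - 1))
    total = sum(pairs.values())
    return {a + b: math.log((pairs[a + b] + 1) / total)
            for a in ALPHABET for b in ALPHABET}

def score(text: str, logs: dict) -> float:
    """Message-ness: total log-frequency of the text's two-character pairs."""
    return sum(logs[text[i:i + 2]] for i in range(len(text) - 1))

def crack(cipher: str, logs: dict, iters: int = 20_000) -> str:
    key = list(ALPHABET)                        # candidate decryption: a permutation
    random.shuffle(key)
    decode = lambda: cipher.translate(str.maketrans(ALPHABET, "".join(key)))
    current = score(decode(), logs)
    for _ in range(iters):
        i, j = random.sample(range(27), 2)      # propose: swap two letters of the key
        key[i], key[j] = key[j], key[i]
        proposed = score(decode(), logs)
        if proposed >= current or random.random() < math.exp(proposed - current):
            current = proposed                  # accept the swap (Metropolis rule)
        else:
            key[i], key[j] = key[j], key[i]     # reject: swap back
    return decode()
```

Energy here is just negative message-ness; the Metropolis machinery neither knows nor cares.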
So, here's an example, another person from the world of literature. This is not Dickens; it's another passage you might recognize, and it's been heavily encrypted. I'm going to start running the Metropolis algorithm on it; raise your hand when you think you know what it is. I think many of you will recognize it; it's another pretty famous person in the world of literature. Here are a thousand iterations of the Metropolis algorithm. Anyone have a theory? Don't shout it out, just raise your hand if you think you know. There's another couple thousand. Another couple thousand. One more. Okay, just hold that. One more. A famous man of literature: Bob Dylan. Recent winner of the Nobel Prize in Literature. My personal hero. And nicely encrypted. Let me just leave you with some words of wisdom from our Nobel laureate, fitting when you're doing this sort of thing. Remember: "The highway is for gamblers, better use your sense." And, whenever possible, "take what you have gathered from coincidence." Thank you very much.

All right, we have time for some questions. You can raise your hand, and I will bring the microphone to you, if you could stand up. Here we go. Well, first I want to ask the person who raised their hand: were you right? Oh, yes. Way to go.

How is it possible to compress a seemingly random sequence without cheating and embedding the answer? Seemingly random means you can't compress it; you gave the example that pi can be compressed. If you believe pi is random, well, it's not random in the compression sense. There is no definitive statement that something is random; that was one of the takeaways. Reasonable people can differ. Pi, as it turns out, is not random from that perspective. So what, then, is the trick for compressing pi? There are a lot of very simple algorithms that produce the digits of pi; you can show in a high-school algebra class how to get the digits of pi from an algorithm no bigger than this. So you can compress the infinite string of digits of pi into something you can write down in a space about that big. That's compression.

I was just wondering what the time complexity of the Metropolis algorithm is, if you know it. It scales linearly in the size of the problem, which is actually one of the very nice things about it. Yeah.

A little more philosophically: there's a paper that came out recently on how you can encrypt things and obfuscate what you want to send to people in a message. Do you think you should have a Second Amendment right to encrypt your own data, as a weapon? My personal opinion is that you have a right to encrypt your own data.

We're going to do two more questions, and you can always come up afterwards and talk to Matthew. I don't remember what the project is called, but they try to detect non-random radio signals from faraway civilizations. What do you think about this project; is some kind of success possible? Well, they apply this sort of process to those signals to discern whether or not they are random. If they are random, there's nothing there; if they aren't random, there might be something, though there are other things that could be causing it. If I understand your question, that is exactly why people are still pursuing an operational definition of randomness. Most definitions of randomness aren't particularly operational, in the sense that they're a little too esoteric to use, like the infinite-bit-string idea. That's why people are still seeking operational definitions: there are applications like that where it would be nice to know.

All right, last question in the back. Everything you've been talking about, algorithms and von Neumann and all this, pertains to conventional computing. Do you have any applications of what you've been saying to quantum computing? It's a new game, of course, and quantum computing is pretty much still in its infancy; it's not really attacking any real problems that I know of, though there's always a claim that we're just about there. Quantum computing is, in some sense, a massively parallel computer, and a lot of these algorithms, particularly the Metropolis algorithm, have natural parallelizations. It would be interesting; I'd have to start from scratch and think through what the quantum version of the Metropolis algorithm would be, because it is a new paradigm, but it does fit the massively parallelizable nature of these algorithms, at least. I think that's a great question. I'd be interested to see, in 20 years, whether quantum computing is a real tool or still one of those "you could do it, if only we had something worked out that we don't have worked out yet" things. It's a great question. Thank you very much.

All right, one last round of applause for our speaker, Matthew Richey.