Am I doing the microphone the right way? If you can't hear me, put up your hand. All right. Wow, that was a really good talk, and now I feel like that lady who went on the Martha Stewart show to talk about how she makes fried chicken. All right, well, anyway. So two words, or two longer phrases, are anagrams if they have exactly the same letters, possibly in some different order. Like soapstone and teaspoons: both have two O's, an N, an E, an A, a T, two S's, and a P, but in different orders. And after I found that out, I found out that you can actually buy soapstone teaspoons, so it's the perfect gift for the anagram lover in your life. And if you're into anagrams, you know that you just see them everywhere. The first edition of this talk started with like three minutes of, oh, and here are my favorite anagrams, page after page after page, and then I tested it on my kids and went, huh, this talk is 18 minutes long. So here are some Philadelphia street signs: Locust is an anagram of Clouts. Irving is an anagram of Virgin. Arch Street, if you take the ST on the sign there, is an anagram of Starch. Pine Street, same deal with the ST, is an anagram of Instep. And I'm gonna try to hold myself to just those. Anyway. Finding anagrams is an awesome computer application, and people ask strange questions about how you do it, but there's one good way to do it, and this is how. You're gonna convert each word that might be an anagram to what's called a canonical form, which just means you put it into some form such that two words have the same canonical form if and only if they're anagrams. The easiest way to do that is to sort the letters into alphabetical order, because if two words are anagrams, they have the same letters, and when you sort the letters into alphabetical order, you get the same thing.
So here we have the word sanction and the word contains, and when you sort the letters into alphabetical order, you get acinnost for both. But if you have some other word that isn't an anagram, even one that looks kind of similar like continua, you get acinnotu, and that's not the same, so it's not an anagram. And it's really easy to write a program that goes through your word list, your dictionary or whatever, calculates the canonical form for each word, and then hashes them into a hash or a dictionary or whatever you call it, and when two words go in the same hash bucket, they're anagrams of each other, and then they get printed out. And in 1991 I didn't know Perl yet, so I did it in awk. I actually rewrote the awk code for this talk just to remind myself what it was like, and I was gonna show it, but wow, awk sucks: you have to write your own sort function. Anyway. Excuse me. All right, so let's see. I had the word list from Webster's Second International Dictionary. I didn't know at the time, but that had been laboriously entered by Dennis Ritchie for very much the same reason. And I had some other miscellaneous dictionaries, and I fed them to my program, and the output was every single anagram in this 250,000-odd-word word list, and it sucked. This is too small to read; hold on, let's make this bigger. So the first entry on the list is that aal, A-A-L, is an anagram of ala, A-L-A. Well, you know, you don't need the computer to tell you stuff like that. And the computer is like, oh, and eat is an anagram of tea, isn't that awesome? And if you keep looking, there's aam is an anagram of ama, whatever that is, and then zootype is an anagram of ozotype, as if we cared. All right, well, clearly this isn't working. What if I look at the really long ones? Maybe the long ones will be a little bit more awesome.
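The canonical-form trick is only a few lines of code. Here's a minimal Python sketch of the idea (my reconstruction, not the speaker's original awk; the little word list is just a stand-in for a real dictionary file):

```python
from collections import defaultdict

def canonical(word):
    """Two words are anagrams iff their letters, sorted, are equal."""
    return "".join(sorted(word))

def anagram_classes(words):
    """Bucket words by canonical form; buckets with 2+ members are anagram sets."""
    buckets = defaultdict(list)
    for w in words:
        buckets[canonical(w)].append(w)
    return [group for group in buckets.values() if len(group) > 1]

print(canonical("sanction"))  # acinnost
print(canonical("continua"))  # acinnotu
print(anagram_classes(["sanction", "contains", "continua",
                       "soapstone", "teaspoons"]))
# [['sanction', 'contains'], ['soapstone', 'teaspoons']]
```

Running this over a real word list is one pass and one hash lookup per word, which is why the approach scales to a quarter-million words without trouble.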
And the longest ones, I sorted them by length, and they were just as bad in a kind of different way. The one I want to focus on here is cholecystoduodenostomy. There are at least two things wrong with this. One of them is: what the heck is a cholecystoduodenostomy? It turns out cholecysto- is the gallbladder, duodeno- is the duodenum, which is the bit at the bottom end of the stomach, and -ostomy is a surgical procedure to make a hole from one to the other; stoma is Greek for a mouth or opening. So it's for when these things should be connected but they're not, and you make a hole between them. And the anagram of cholecystoduodenostomy is duodenocholecystostomy, which is when you make a hole from the duodenum to the gallbladder. So, A, these words are words that nobody knows. That's a problem. And B, the anagram is boring. And if you look at the other long ones, they're all boring in the same way: chromophotolithograph, photochromolithograph. There are 40-some thousand, 44,000 entries in this file. There must be something good in there, so how are we gonna find it? And cholecystoduodenostomy gave me the idea of, okay, here's what's wrong with this: it's got too few chunks, because you chop it into three chunks and you just switch the gallbladder and the duodenum. And maybe anagrams with more chunks are more interesting. That idea has a lot to recommend it. For example, if we say that a word pair's score is how many chunks you have to cut one word into before you can rearrange the chunks to make the other one, then short words will always have a low score, because you can't cut a five-letter word into more than five chunks. And mathematicians will tell you, well, every word is an anagram of itself, and you say, yes, but that only scores one, and they have to say, oh, well, all right. So how do you calculate the number of chunks? This is not immediately obvious.
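To make the three-chunk claim concrete, here's a quick Python check: cut the word at its two Greek-root boundaries, swap the gallbladder and the duodenum, and you get the other word. (The chunk boundaries here follow the etymology described above; that's an assumption about where the talk's slide drew them.)

```python
def rearrange(chunks, order):
    """Glue chunks back together in a new order."""
    return "".join(chunks[i] for i in order)

chunks = ("cholecysto", "duodeno", "stomy")
word = "".join(chunks)                # cholecystoduodenostomy
other = rearrange(chunks, (1, 0, 2))  # duodenocholecystostomy

# Sanity check: the two words use exactly the same letters.
assert sorted(word) == sorted(other)
print(word, "->", other)
```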
And this example, this is kind of a weird example because these are words that people aren't really familiar with, but for this discussion they're really excellent words, for reasons that'll become clearer later. Acrididae turns out to be the scientific name for grasshoppers, so now you know that. And Cidaridae is a family of sea urchins, and we've got a picture of one there; it's spiny, like you'd expect. And here, these nine-letter words, I have them mapping to each other with these arrows, showing that you can rearrange one into the other with eight chunks. So this should score eight? Well, no, no, no, because there might be a way to do it with fewer chunks. Like, you can see they both end in -idae, I-D-A-E, and maybe that should be a whole chunk. And in fact, if you analyze this carefully, you'll find out you can do it with five chunks: you map the -idae to the -idae, you map the other I-D to the other I-D, and then the A, the C, and the R each map over, and that's the last three. All right, so how do you find that? Well, the -idae kind of jumps out at you, but it turns out there are actually three different ways to do it with five chunks, so it's really not that obvious. And I could stand here and claim you can do it with four chunks, and who's gonna gainsay that? Come on, Plotkin? I don't think so. All right. So how do you calculate the minimum number of chunks? In 1991, I wrote a program to do this, and it's really simple: it just tries every possible mapping between the letters. And you first think, well, okay, these are nine-letter words, so there are nine-factorial possible mappings. No, no, no, we don't have to do that, because we know the C has to go to the C, because there's only one of each, and the E has to go to the E, and the R has to go to the R, and then there are only two places the A can go, because there are only two A's in Cidaridae.
And so it says, all right, there are two places the A could go, then it recurses; it maps the C to the C, and it maps the R to the R, and then it says, okay, there are two places the I could go, and it recurses again. It only has to recurse nine times, and at only three of those points is there a two-way choice, so there are just 2 × 2 × 2 = 8 mappings it has to check, and it quickly comes out with the fact that the fewest number of chunks is five, because there were only eight mappings to consider. Some of them are harder than that: to handle cholecystoduodenostomy, there are 7,680 possible mappings, and in 1991 that took a while, like a few seconds, and to do the whole dictionary of a quarter million words took, you know, two or three hours or something. But the results were totally, totally worth it, because I found the single best anagram in English, and I'm gonna tell you what it is. One of the words is cinematographer, which is as familiar and well-known as any 15-letter word could possibly be. And what does cinematographer anagram to? It anagrams to megachiropteran, which isn't that familiar, but it's a giant bat! It's a giant bat! Death from above! Ah! All right, so I was completely satisfied, I considered the project a success, and I put it aside for 25 years. Oh, here's other stuff we're gonna skip, because Erty tells me I've got 64 seconds left. So 25 years go by, and I thought, you know, I used this kind of brute-force algorithm to find the best chunkings. I asked on Stack Exchange, is there a better algorithm for this? And the answer is there is. You can turn a pair of anagram words into a graph structure, depending on how the letters overlap. And here we can see, in cinematographer there's an RA and an ER, and in megachiropteran there's an RA and an ER. And to get the fewest number of chunks, you'd like to keep these letters together.
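The brute-force scorer described above can be sketched like this in Python (a reconstruction of the idea, not the 1991 code): recurse over the letters of one word, only trying positions in the other word that hold the same letter, and in each completed mapping count a new chunk wherever adjacent letters don't land on adjacent positions.

```python
def min_chunks(a, b):
    """Minimum number of chunks to cut `a` into so the chunks can be
    rearranged to spell `b`.  Assumes a and b are anagrams."""
    n = len(a)
    best = [n]  # n singleton chunks always works

    def recurse(i, mapping, used):
        if i == n:
            # A new chunk starts wherever adjacent letters of `a`
            # don't map to adjacent positions of `b`.
            chunks = 1 + sum(mapping[k] != mapping[k - 1] + 1
                             for k in range(1, n))
            best[0] = min(best[0], chunks)
            return
        for j in range(n):
            if not used[j] and b[j] == a[i]:  # only matching letters
                used[j] = True
                recurse(i + 1, mapping + [j], used)
                used[j] = False

    recurse(0, [], [False] * n)
    return best[0]

print(min_chunks("acrididae", "cidaridae"))  # 5
```

Because each letter can only map to a copy of itself, the search visits the product of the factorials of the letter multiplicities, 2! x 2! x 2! = 8 mappings here, rather than 9! = 362,880.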
You'd like to map the RA to the RA so that instead of going to separate places and making two chunks, they go to the same place and make one chunk, and you'd like to do that with the ER also. But because of the way they overlap in megachiropteran, you can't do both at once. So you build a graph where the nodes represent these pairs of letters that you would like to keep together, and two nodes are connected by an edge if they're incompatible, if you can't do both at once. And then once you've built this graph, you find the maximum independent set, the largest set of nodes with no edges between them, and that tells you the minimum chunking. All right, I'm at 10 minutes now. Erty, can I get 60 more seconds? All right. Maximum independent set is NP-hard, and if you were at my talk last year, you know that means nobody knows a good algorithm for it. So people will tell you, oh, well, it's NP-hard, just gotta give up. No, no, no. Because NP-hard has so many loopholes, and one of the loopholes is that if your instance of the problem is actually easy, then it's easy. Like, if you're doing this for soapstone and teaspoons, they only have one overlap, so the graph you build has only one node, and finding the maximum independent set of a graph with one node just isn't that difficult. It's got one node. And this is a pretty typical situation. So I built this thing out, and even in the hard cases, it's not all that hard. Here are all the complicated overlaps in Acrididae and Cidaridae. There's a whole bunch of them; the graph has eight nodes, and it turns out there are three different maximum independent sets that have four nodes each. And even if you do this in the most naive possible way, there are only 255 possible subsets to examine, and examining 255 subsets takes approximately zero time. Here's cholecystoduodenostomy. It looks a little worse, but it's still manageable.
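Here's a Python sketch of that graph construction (my own rendering of the idea, so the details are assumptions): nodes are matching adjacent-letter pairs you'd like to keep together, edges join nodes whose implied letter mappings clash, and the minimum chunk count comes out as the word length minus the size of the maximum independent set, found here by naive subset enumeration.

```python
from itertools import combinations

def build_graph(a, b):
    """Nodes: pairs (i, j) with a[i:i+2] == b[j:j+2], i.e. adjacent
    letters of `a` that could stay together when mapped into `b`."""
    nodes = [(i, j)
             for i in range(len(a) - 1)
             for j in range(len(b) - 1)
             if a[i:i + 2] == b[j:j + 2]]

    def compatible(n1, n2):
        # Each node pins two letter positions of `a` to positions of
        # `b`; merge the pins and reject clashes and repeats.
        pins = {}
        for (i, j) in (n1, n2):
            for src, dst in ((i, j), (i + 1, j + 1)):
                if pins.get(src, dst) != dst:
                    return False
                pins[src] = dst
        return len(set(pins.values())) == len(pins)  # injective

    edges = {(p, q) for p, q in combinations(nodes, 2)
             if not compatible(p, q)}
    return nodes, edges

def min_chunks_mis(a, b):
    """len(a) minus the maximum independent set size, trying subsets
    from largest to smallest (fine for these tiny graphs)."""
    nodes, edges = build_graph(a, b)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all((p, q) not in edges
                   for p, q in combinations(subset, 2)):
                return len(a) - size
    return len(a)

print(min_chunks_mis("acrididae", "cidaridae"))  # 5
print(min_chunks_mis("soapstone", "teaspoons"))  # 8
```

For Acrididae and Cidaridae this builds the eight-node graph from the slide and finds a four-node independent set, giving 9 minus 4, five chunks; for soapstone and teaspoons the graph is the single shared ON, so the answer pops out immediately.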
Anyway, I implemented the thing, I ran it on my dictionary, and it spit out all the anagrams, ordered by number of chunks, in three and a half seconds. Woo! Yeah! So then, all right, one slide left, okay? Erty is the timekeeper. So I am a data hoarder. No, I guess I'm proud of it, actually. I save everything, and I still had the 1992 source code that did the scoring the naive way. It did take me a couple hours to hunt it up; I had lost it, but I found it again. And I ran it, because I wanted to know: what is the actual improvement here from using the better algorithm, as compared to using 25 years' worth of better hardware? And the old code, which used to take two and a half, three hours, took four seconds. So the moral of the story is, sometimes you need a clever algorithm, and sometimes you don't. Thank you all. Thank you.