Hi, everyone. I'm Alex and I'm really excited to talk to you today about two of probably my favorite things in the world, Ruby and Harry Potter. The title of the talk is Ruby-us Hagrid: Writing Harry Potter with Ruby. This whole talk is based around a kind of crazy idea. Can we use Ruby, just the regular Ruby language that we know and love, to write a brand new Harry Potter story automatically? This immediately raises some other questions. Probably the first one is why on earth would we want to do that? Then what would that actually look like? What can we achieve if we use Ruby to write a Harry Potter story? Then the big one obviously is how on earth do we actually do this? Let's start with the why, why we might want to do this. I realize I'm probably talking to two slightly different audiences in the room right now. First you have people like me, the true blue Harry Potter fans. For all of us, this is pretty straightforward. Just imagine a nice big beautiful pile of brand new Harry Potter books, which means we can just stay happily in the wizarding world forever. There's also going to be a lot of you who have never caught the Harry Potter bug and are a bit baffled by it all. For all of you, instead, I'd recommend visualizing a big beautiful pile of money because that is what awaits you if you can satiate the rabid hunger of people like me in category A. By the way, if you're not quite sure which category you fit into, there's a little test I've come up with. Just look at this picture and then based on your reaction you can sort yourself into the appropriate category. Those are some of the whys. There are maybe some slightly more serious reasons why hopefully this is interesting. It's going to give us a nice intro to natural language processing, NLP, which is really a hot topic right now. It's also, I think, going to reveal a lot of the simplicity and beauty and elegance of Ruby. We can actually do this with very little Ruby code.
Also, I hope it's going to reveal something more general about tackling seemingly hard problems. I'll talk a little bit about that at the end. Now what can we achieve? What will this actually look like when it's done? I'm going to give a bit of a spoiler and show the output of the final program that we'll write. Let's give it a read. "Neville, Seamus and Dean were muttering but did not speak when Harry had told Fudge a few weeks ago that Malfoy was crying, actually crying tears streaming down the sides of their heads. They revealed a spell to make your blood jet, said Harry, anger rising once more." It's definitely not Pulitzer Prize winning stuff, but it more or less makes sense. It certainly has the style of a Harry Potter story. Hopefully you'll be doubly impressed when you see how little code actually goes into making this. Now the big one is how we actually go about doing this. On the face of it, this does seem like quite a hard problem: to look at even one sentence from that extract that I just read and think about how we begin writing Ruby code to create a sentence like this. There are a couple of key ideas that I want to introduce to help get us started. One is that we want to tell this story word by word. At any point, we just want to be focused on generating the next word in the story. The second key idea is that probably pretty much all of us in this room have a great source of inspiration for this problem in our pockets or in our bags. Those are our smartphones. Why are smartphones a good way to get started on this problem? Pretty much every modern smartphone has one of these, a predictive keyboard. Usually we use a predictive keyboard as a typing aid to write things faster. What's interesting is we can actually use this to generate sentences basically unsupervised. Here's a video I recorded on my phone. Basically what I'm doing is hammering this middle suggestion button.
You can see what happens is it starts to generate an English sentence, a sentence that sounds like it's been written by a human. What else is interesting about this is it's not supposed to just be imitating any human. It's supposed to be imitating me because, and some of you might know this already, your predictive keyboard adapts to you over time. It learns your style and tries to imitate that. How does your phone do this? Let's take an example. This is what I get as my suggestions when I type birthday into my phone. The first suggestion is party, followed by cake, and then we have wishes as the third suggestion. Somewhere in the memory of my phone, it knows that out of all the times I've written birthday, let's say that I've used the word party 30 times to follow on from birthday, and cake a little less, 20 times, let's say. By knowing what words I've used in the past, it can suggest what words I'm likely to want to use in this instance when I've written birthday. Why is this relevant to our problem? We can take that exact same idea and start doing the same thing with the way that J.K. Rowling uses language in the Harry Potter series. For example, the word golden appears about 200 times in the Harry Potter books. These are the top continuations that come after golden. The word egg is the word that follows golden most often. That comes 13 times after golden, and then snitch is the next most common, and so on. A couple of bits of terminology I'll keep using throughout this talk. This initial word that we use to generate the suggestions, we'll call this the headword, and these suggestions we'll call continuations. The third key idea is that we want to break the way we tackle this problem into two phases. First, we want to learn the style of J.K. Rowling, learn the style of the Harry Potter books. The second step is we want to then use everything we've learned to generate new stories. Let's first look at the learning stage. The learning stage is super simple.
All we want to do is basically look at every single word that J.K. Rowling uses in the Harry Potter books and collect these stats. For each headword, what are the continuations that come after it and how many times does each continuation appear? We do it for golden, as we saw, but then also for goldfish and gold, and for all the words in the books. That's about 20,000 unique words altogether. How would we represent this in Ruby? You can maybe guess from the way this is laid out, but a nice way to represent it is just as a simple Ruby hash. Something like this is what we want to end up with. Let's look at how we actually do this in code. The very first thing that we're going to need is some copies of the books in machine-readable format. Just text format is completely fine. I forgot to mention, by the way, I've put some notes and all the slides and everything online, and you can find links to these kinds of text files there. I'll show that link again at the end. We start with those text files, and then we want to start by doing some cleaning. This is something called tokenization, which basically means getting rid of special characters. We'll lowercase everything, and we'll turn everything into a symbol as well. That's going to be a lot more memory-efficient to work with. Taking a sentence like this, we'll end up with some output like this. Once we've tokenized everything, then we're ready to actually build up our hash of the headwords and the continuations. This is a really nice example, I think, of how elegant and simple Ruby can make this. This is all the code that we need to do this. Let's have a look at what's going on here. We start off by using this nice built-in method, each_cons, which is short for each consecutive. That's basically going to take, in this case because we've passed the argument two, each consecutive pair of words. We'll start with the cat, then we'll do cat sat, sat on, and so on.
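As a rough sketch of the cleaning step and of what each_cons yields, something like the following would do. The tokenize helper, its regular expression, and the full example sentence are assumptions made for illustration (the talk only tells us the sentence starts with "the cat sat on"); the speaker's actual code is on the slides.

```ruby
# Hypothetical helper, not the speaker's code: lowercase the text,
# drop everything except letters, apostrophes and whitespace,
# then turn each word into a symbol.
def tokenize(text)
  text.downcase
      .gsub(/[^a-z'\s]/, "")
      .split
      .map(&:to_sym)
end

# Assumed example sentence.
tokens = tokenize("The cat sat on the mat.")
# tokens is now [:the, :cat, :sat, :on, :the, :mat]

# each_cons(2) yields every consecutive pair of tokens.
pairs = tokens.each_cons(2).to_a
# pairs begins [:the, :cat], [:cat, :sat], [:sat, :on]
```

Each pair is one headword followed by one continuation, which is exactly the shape the counting step needs.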
Then for each headword, we're going to say, okay, if we haven't encountered this headword before, let's just initialize a new hash inside. To go along with our headword, we'd start with a new hash, which will have a default count of zero. Then we'll just say, okay, for this combination of headword and continuation, let's increment the count by one. That's all that's going on here. On the first iteration, you can see we'll get the and cat, and we'll say, okay, cat follows the word the one time. Then on the next iteration, sat followed cat one time. We'll continue iterating through all the words in this example sentence, and we'll eventually end up with something like this. We do exactly the same procedure, but instead of this example sentence, we do it on that corpus of all the Harry Potter books. That's our learning phase done. That's all we have to do to learn the style of the Harry Potter books and J.K. Rowling. Now we need to figure out how we use that to generate new stories, and there are a few different approaches. Let's start with the simplest, which is called the greedy algorithm. Why is it called the greedy algorithm? Well, it's because in each case, it takes the biggest, juiciest option. What that means is it just goes for the most likely continuation, the one that's appeared most often. In our golden example, remember, we said egg appeared more than any other word after golden, 13 times, so we would just always pick the word egg after the word golden. Okay. Then once we pick egg, well, after egg, the word and is the most frequent continuation, so we'd pick that one next, and we just continue on like that until we have a story. This again is really nice and easy to implement in Ruby. You can see here what we're doing is we're taking all of our continuations, and we've got the word and the count, how many times it appears. Then we're just using the max_by method to say, just give me the continuation with the highest count. That's all we're doing. Really nice and easy.
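Here is a minimal sketch of the learning step and the greedy pick as described above. The method names build_stats and greedy_continuation and the tiny token list are assumptions for illustration, not the code from the slides.

```ruby
# Learning phase: count how often each continuation follows each headword.
def build_stats(tokens)
  stats = {}
  tokens.each_cons(2) do |head, continuation|
    # First time we see this headword: start an inner hash whose counts default to 0.
    stats[head] ||= Hash.new(0)
    stats[head][continuation] += 1
  end
  stats
end

# Greedy algorithm: always take the continuation with the highest count.
# Returns nil when the headword has never been followed by anything.
def greedy_continuation(stats, head)
  continuations = stats[head]
  return nil if continuations.nil?
  continuations.max_by { |_word, count| count }.first
end

# Assumed toy corpus, standing in for the tokenized books.
tokens = %i[the cat sat on the mat and the cat slept]
stats = build_stats(tokens)
# stats[:the] is { cat: 2, mat: 1 }

next_word = greedy_continuation(stats, :the)
# next_word is :cat, because cat follows the more often than mat does
```

Running build_stats over the tokenized books instead of the toy list would give the full headword hash the talk describes.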
If you've really been paying attention, then you'll have spotted one other problem, which is, well, what do we do about our very first word in the story? Because when we're doing our first word, we don't have any previous word to continue from, so we're going to have to do something a little different to start our story. There's a bunch of different approaches, but in this case, we can just start with a completely random word. Any random word in our vocabulary. That's what's happening here when I'm using the sample method: just pick a random word to start us off. Then in this case, I'm going to make a 50-word story, so I'm just going to repeatedly apply the greedy algorithm, and then word by word hopefully build up a story. How does this work in practice? I ran this for the first time, and this is what I ended up with. "Oh, no, said Harry. A few seconds later, they were all the door and the door and the door and the door." This is not a great start, but maybe I just got unlucky, right? Remember, we start with a random word, so maybe this is just a really bad choice. Let me try it again. This is my second attempt. "Surreptitiously, several of the door and the door and the door and the door and the door and the door." The good news is, we won't struggle to find a title for our new Harry Potter story. The bad news is obviously pretty much everything that we've been doing. Obviously, something's gone horribly wrong here. What's going on? Let's look and walk through what's actually happening here as we're running this. Let's say we start with the word several. The word that appears most often after the word several is the word of. The word that appears most often after of is the. Then after the, the most common continuation is door, and after door is and. All good so far. The problem is that the most common word that comes after and is the. You can see what happens is we get stuck in a loop. You might be wondering, does this always happen? Sadly, the answer is yes.
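The loop is easy to reproduce with a small sketch. The generate_greedy method and the counts in the stats hash below are made up for illustration; only the ordering of the most frequent continuations (several, of, the, door, and, the) comes from the walkthrough above.

```ruby
# Repeatedly apply the greedy choice to build a story of the given length.
def generate_greedy(stats, start_word, length)
  story = [start_word]
  (length - 1).times do
    continuations = stats[story.last]
    break if continuations.nil? # dead end: nothing ever followed this word
    story << continuations.max_by { |_word, count| count }.first
  end
  story.join(" ")
end

# Hypothetical counts, chosen so the top continuation matches the talk.
stats = {
  several: { of: 3 },
  of:      { the: 5, course: 1 },
  the:     { door: 4, boy: 2 },
  door:    { and: 3, opened: 1 },
  and:     { the: 6, then: 1 }
}

story = generate_greedy(stats, :several, 10)
# story is "several of the door and the door and the door"

# In the talk the first word is random rather than fixed, along the lines of:
random_start = stats.keys.sample
```

Because the greedy choice is deterministic, once the story revisits a word it has already used, it has to repeat everything that followed that word last time.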
Weirdly enough, the word that gives us the longest story without going into a loop is actually conference. I can't make these things up sometimes. This is what we get if we use our greedy algorithm with the start word conference. This is the best we can do, 20 words. Obviously, we can rule out our greedy algorithm and say this simply doesn't work. Let's try a completely different approach to generation. Let's go to the opposite end of the scale and let's just get really random with something called the uniform random algorithm. It's a fancy name for something that's extremely simple. Basically, what we do is we just look at our potential continuations and we just pick any one randomly with equal probability. We just draw one out of a hat. In this case, if we have three continuations, we pick one of the three with equal probability, a one third probability of picking any of them. In reality, we'll probably have a lot of continuations. In this case, we have 117 potential continuations after golden, and we just pick one of them randomly. Again, really nice and easy to do in Ruby. We can just again use the sample method and that will just pick one of our continuations at random. How does this work in practice? Here's an example. "Debris from boys or a company him bodily from Ron yelled the waters. Harry laughing together, soon father would then bleated the smelly cloud." It's probably better than the greedy algorithm. But unless you're really into avant-garde Harry Potter fiction, this is probably a little bit weird. The other thing I would say as well is, apart from the names, this doesn't really seem like Harry Potter. If you took the names out, you'd never guess this was a Harry Potter story. Why is that? Why isn't this working so well? There's a lot of reasons, but let's look at one example word here. Let's look at the word house.
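A sketch of the uniform random pick, under the assumption that the stats hash has the headword-to-counts shape built earlier. The method name, the optional random: argument (there only to make runs repeatable) and the counts other than egg's 13 are illustrative.

```ruby
# Uniform random algorithm: ignore the counts and pick any continuation
# with equal probability.
def uniform_continuation(stats, head, random: Random.new)
  continuations = stats[head]
  return nil if continuations.nil?
  continuations.keys.sample(random: random)
end

# Hypothetical counts; only egg appearing 13 times comes from the talk.
stats = { golden: { egg: 13, snitch: 9, plates: 1 } }

rng = Random.new(42) # fixed seed so the sketch is repeatable
draws = Array.new(300) { uniform_continuation(stats, :golden, random: rng) }
# Each of the three continuations turns up roughly a third of the time,
# even though plates only follows golden once in this made-up data.
```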
After the word house in the Harry Potter series, the word elf appears over 100 times, but as just one of about 200 potential continuations after house, it has a one in 200 chance of being picked. By the way, a house elf is like our friend Dobby from earlier, if you didn't know. The word prices does appear too. The phrase house prices appears in Harry Potter, but it only appears once, and yet it has exactly the same chance of being picked as house elf. Obviously a program that's as likely to talk about house prices as house elves isn't really doing a very good job of imitating Harry Potter. That gives us some clues about how we can improve this. Our final and best algorithm is what we call the weighted random algorithm. What we do here is we just solve this problem. We look at the situation here, and the word house appears about 700 times in the series altogether, and 100 of those times it's followed by elf. Logically, it feels like elf should have a one in seven chance of being picked. This is exactly what this algorithm does. We just rescale the probabilities so they match how many times the word appears. Just to reinforce that point, you might end up with a situation like this, where the most frequent continuation would have a higher probability, let's say a one half probability of being picked, and then one third, one sixth and so on. Again, surprisingly easy to implement in Ruby. This one's maybe slightly more difficult to understand, but the intuition here is you can think of this like a raffle where each word gets a number of entries equal to the number of times it appears. In our house elf example, elf gets 100 entries to the raffle, whereas prices only gets one entry to the raffle. That's what this times here is doing. You get one ticket for every time you appear in the series, and then again, we just draw from that raffle using sample. So this is the weighted random algorithm. Let's see.
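The raffle can be sketched in a couple of lines. The method name and the seeded random: argument are assumptions for illustration; the elf and prices counts follow the figures quoted above.

```ruby
# Weighted random algorithm: build a "raffle" in which every continuation
# gets one ticket per appearance, then draw a single ticket.
def weighted_continuation(stats, head, random: Random.new)
  continuations = stats[head]
  return nil if continuations.nil?
  raffle = continuations.flat_map { |word, count| [word] * count }
  raffle.sample(random: random)
end

# elf gets 100 tickets, prices gets 1 (other continuations omitted).
stats = { house: { elf: 100, prices: 1 } }

rng = Random.new(7) # fixed seed so the sketch is repeatable
draws = Array.new(1_000) { weighted_continuation(stats, :house, random: rng) }
elf_share = draws.count(:elf) / draws.length.to_f
# elf_share comes out close to 100/101, so house prices is now a rare event
```

Building the raffle array costs memory proportional to the total count, which is fine at this scale; for a much larger corpus one could instead draw a random number up to the total and walk the counts, but that is a different design from the one described in the talk.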
This does spring forward as though they have lighted it to the grid. You can see that in the light of the torch in Harry's pumpkin thug. So it's still pretty weird, but it's starting to feel more like a real Harry Potter story. How do we make that last jump and improve this to get to the story I showed at the beginning? There's one last big idea here on generation, which is to revisit our predictive keyboard, because there's actually something more interesting going on there. If I just start typing a single word, I get fairly generic suggestions. But if I type in something like fish and, then my phone knows to give me different suggestions. So obviously, my phone isn't just looking at the previous word, it's actually looking at more of the history of what I've typed. So the last key idea is we can improve our algorithm a little bit by looking at more than just the previous word. What that means is, rather than building a hash like this with every word and its continuations, now we're going to want to build a different hash. When we do it this way, we're only thinking about two words at a time, one headword and one continuation. This is what we call a bigram model, bi meaning two. So instead, we're going to want to look at every unique pair of words in the book series and then all of the continuations of that pair. In this case we're thinking about three words at a time, so this is a trigram model. We could actually extend this to a four-gram model, a five-gram model, and so on. Now obviously, because we're looking at unique pairs of words, there are a lot more of them, right? I think we said there are about 20,000 unique words, and we have about 300,000 unique pairs of words. So the hash is a lot bigger to build, but it will still compute in about a second on a modern machine. And again, you know, Ruby does an amazing job of allowing us to do this without needing very many changes.
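A minimal sketch of the trigram version, assuming the same hash-of-counts structure as before but keyed by a pair of words. The method names and the toy token list are illustrative, not the code from the slides.

```ruby
# Learning phase for a trigram model: the key is now a PAIR of words.
def build_trigram_stats(tokens)
  stats = {}
  # The splat collects the first two words of each triple into an array.
  tokens.each_cons(3) do |*head, continuation|
    stats[head] ||= Hash.new(0)
    stats[head][continuation] += 1
  end
  stats
end

# Generation is unchanged except that we look up the last two words.
def generate_trigram(stats, start_pair, length, random: Random.new)
  story = start_pair.dup
  while story.length < length
    continuations = stats[story.last(2)]
    break if continuations.nil? # this pair was never followed by anything
    raffle = continuations.flat_map { |word, count| [word] * count }
    story << raffle.sample(random: random)
  end
  story.join(" ")
end

# Assumed toy corpus, standing in for the tokenized books.
tokens = %i[the cat sat on the mat the cat ran]
stats = build_trigram_stats(tokens)
# stats[[:the, :cat]] is { sat: 1, ran: 1 }

story = generate_trigram(stats, [:on, :the], 3)
# story is "on the mat", since mat is the only word seen after "on the"
```

Ruby compares arrays by content when they are used as hash keys, so a fresh two-word array such as story.last(2) finds the entry stored during learning.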
All we've really done here is add a splat operator, and that basically just says, rather than taking a single headword, we now collect a pair of words here. And in this case, we now pass three to each_cons, since this is a trigram model. Everything in the generation stage is basically the same, of course, so we build the story up word by word, just looking at the last two words instead of the last one. Okay, so this is the output of the trigram model. Normally, when Dudley Kemp is voiced very loudly before, Demento is said to have been voiced deadly, and we have a lot of time, all this mess is much more than the word that's carried over again, but it's not going to do that. That's what we carry on. And by the way, we can go to five-gram models, six-gram models. The problem is, once we go higher and higher, we end up getting closer and closer to what's actually in the books, so that you're basically just reproducing the text of the books. So there's a trade-off to think about in how high you go. All right, so stepping back for a second, we've seen that this problem, which at the start maybe seemed to some of you like quite a complicated idea that would be quite difficult, we've actually managed to do in about 20 lines of pretty short, pretty simple Ruby code. And I wanted to take this opportunity to finish up by talking about some broader lessons we can draw from this. Obviously, this is a really interesting, inspiring project, but how do you actually apply this to your everyday work with code? On the surface, this talk is about Harry Potter. Another way to look at it is that it was about NLP, and it was also about hard problems, right? How do you take a problem that seems really hard on the surface, and then with a tiny bit of work breaking it down, you're like, oh, that wasn't so hard. So I think hard problems are really important to all of us.
So if you're, you know, a new programmer, but even if you're a veteran programmer, I think it's really important to reflect on how to solve hard problems so you can pass those skills on to the next generation. Now I think there are three lessons that I certainly learned, or had reinforced, in making this talk, and I just want to share them with you. So these are three tips, I guess, for tackling hard problems that we saw in this talk. The first is to understand how to break down the problem. The second is to really pay attention to failures when tackling hard problems, and the third is to find the right metaphor for your problem. So let's start with the first one. Understand how to break down the problem. You tackle a problem like this one bite at a time, and I think this is a really nice philosophy to apply to any hard problem. But often the tricky thing is working out, what is the bite-sized piece of this problem? What are the fundamental building blocks to use? As we saw in this talk, for the storytelling problem, the bite size was a single word, right? We generate a story word by word. Now sometimes for a hard problem, the bite size will be quite obvious. So let's say we're making a chess game. Well, the unit of work is probably a single move, and we play the game one move at a time. But sometimes it's going to be less obvious. So let's say we're writing something like a routing algorithm for Hogwarts. Again, this can seem like quite a hard problem. How do we break it down? Well, ultimately a routing algorithm is just a series of decisions about turns, right? Go left, go right, go straight. So we can think of one bite of this problem as a single turning decision.
And sometimes it will be really not obvious at all, especially if you haven't encountered the problem before. So let's say we're doing face detection or face recognition. Again, it seems like a really hard problem on the surface. How would one ever even begin? Well, if any of you have ever looked into or done face detection or recognition, then you'll know that the units in this case are what are called facial landmarks, so these little red dots. So it's all about placing these little facial landmarks and then translating that into, you know, a recognition or detection signal. So the first tip is really, you know, when you're tackling a new hard problem, ask someone familiar with the problem, what are the building blocks for solving this problem? How do we tackle this one step at a time? How do we break it down? The second lesson to take is that we should really pay attention to our failures. So obviously we had a bit of a false start when we were telling our stories. Now we could have just said, oh, you know, that doesn't work, let's throw it away and try something different. But what I really had to do is look in detail at the failure and try to figure out, well, why did that fail? You know, map out what was happening at every step and work out where the problem was, because that was then what yielded the solution, what yielded the next step. So, you know, it may sound obvious, but anytime you're tackling, especially, a really hard problem, just really pay attention to those failures. And ask, you know, ask someone who knows about the problem domain again, why didn't this work? What specifically about my approach was flawed? And then the last lesson to draw is that finding a good metaphor for a problem is actually really, really valuable. So when I was wrapping my head around this, I found this metaphor of the predictive keyboard to be really, really helpful in understanding how to go about this.
Now, it's really easy to come up with bad metaphors, but coming up with good metaphors is much more tricky. And I think a good metaphor has a couple of qualities. So the first is that it keeps and captures all the essential parts of the problem. So you can strip out things that aren't, you know, that aren't relevant and you can modify things, but there are certain essential parts of the problem that you just have to keep. And the second thing is that a good metaphor allows you to sort of play around with the metaphor and then learn something about the original problem, right? So it's not just a way of understanding the problem, but it's like you can actually make progress on solving the problem by playing around with the metaphor. So let's take another example of this that I really, really like. So let's say I'm Dumbledore and I'm trying to schedule the Hogwarts classes. Now I have to schedule the classes so that there aren't any clashes. So if, you know, a student is taking two classes, they shouldn't be scheduled at the same time, but I do want my timetable to be efficient. So if I can schedule two classes at the same time, I want to do it. So for example, here's a bad example of the scheduling. So in this case, hopefully you can see the colors. I've scheduled Arithmancy and Ancient Runes at the same time, with the same color. But, you know, Hermione is taking both of them. So this is a failed attempt at scheduling. So I can definitely look at this in the tabular form and it works okay. But there's a sort of nice metaphor, a nice different way of thinking about this, which is graph coloring, which some of you might know about. So basically what I do is I just draw out the classes as dots on a piece of paper or whatever. And then if the same student is taking two classes, I just draw a line between the dots. So for example, remember we said Hermione is taking Ancient Runes and Arithmancy.
So these two have a line between them. And then basically my challenge is I've got to color in the dots so that at the two ends of a line, I don't end up with the same color ever. Okay. So for example, this is an example of a valid coloring. And what's interesting about this is if I can come up with a valid coloring, I've also come up with a valid timetable. So it may take some thinking about, but if you reason this through, you'll realize these are actually equivalent problems. If you can do the coloring with the lines in between them, then you've eliminated all of the conflicts and you have a valid timetable. Now, I absolutely love this metaphor because, yeah, sure, we can look at things in the tabular form. But, you know, having this metaphor to draw on, we can actually start playing around with this. We can get a piece of paper and a pen and some coloring pencils and start making progress on this. And it keeps all the essential parts of the original problem. So everything I learn by playing around with that metaphor, I can apply to that original hard problem. So again, find somebody who knows about this problem domain and ask them, what's a good metaphor for this problem? And as I said, you know, favor anything where it's something you can physically play around with, you know, with pen and paper or a physical object or something like that. It's really, really helpful. Okay, so, yeah, those are the kind of key lessons about hard problems that I think, or certainly that I learned, from doing this talk. So, yeah, just thank you for listening. I've put the slides and notes online at this address. And I think we're cutting it quite short for time, so I won't take questions now, but feel free to come up to me afterwards if you have any. All right, thanks very much. Thank you.