So the blurb kind of makes it look like I'm giving an overview or something, but I'm only talking about two things, really, because there's not much time. I'm talking about t-SNE, which is a way of visualising relationships in data, and about recurrent neural networks. t-SNE stands for t-distributed stochastic neighbour embedding, which is a way of saying you're looking at a high-dimensional space, you're casting a blanket through that space — which is your two-dimensional picture — and you're projecting the points in the high-dimensional space onto that picture. And that gives you a map of how things relate.

Here's an example. If you take an image and reduce it to 20 by 20 pixels — or any number of pixels, or even the whole thing — and treat each pixel as a value, you get a vector. If you reduce it to 20 pixels square, that's 400 numbers, which gives you a 400-dimensional space, and all of the images get cast somewhere in this space. Images that are similar in their composition end up close together, and images that are different end up far apart. t-SNE then casts that into a two-dimensional picture.

So here's an example of a t-SNE map. What do we have? Well, up here these pictures are close together. You have to look up in the corner — you can see they're actually from the same studio, with the same lights in the background. Here's a whole lot of pictures that are grouped together because they're museum-y things photographed against that sort of background. There's another pile of round things down here. Now, this is just me doing the simplest thing I could possibly do, two nights ago. Here's a row of fashion items: there are a lot of fashion photos, and they're all clustered together, not because they're pictures of the same thing, but because of the conventions of that photography — they were taken the same way at that time. Get a bit further along and you get some of these funny things. Down here there's a style of portrait — all these people are photographed in the same style. Near there I found a whole lot of horses. Here they go. It's not that the horses are the same; it's that the way people photograph horses is the same — the composition is the same. Here's some people who look like horses. There are some trees that look like a horse.

You could use something like that for exploring an image collection, but I'm not talking about that today. I'm talking about — let me go there. So that's just, like I was saying, reducing the images to greyscale and using the pixel values as the features. That concentrates on the composition and not on things like recognising faces. If you used more sophisticated features — counts of faces or something — you'd get a different clustering of the images, and it might be worth doing, but I quite like just using the raw values because you get that compositional insight.

The way you'd use this for text is you'd extract features from your documents, cast them onto the t-SNE map, and then see how the texts end up. Now, the example I'm using uses character trigrams, which I'll just describe briefly: you count how often each sequence of characters turns up. In the sentence "the cat sat on the mat", the three characters T, H, E occur twice, and A, T, space occurs three times. It's a very pretty little sentence.
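(As a rough illustration of the idea just described — not the actual code behind the talk — here is a minimal sketch of counting character trigrams and projecting the count vectors with t-SNE, using scikit-learn. The toy documents and parameters are purely illustrative; the same approach works for the image example if you flatten the greyscale pixel values into a vector instead of trigram counts.)

```python
# Sketch: character trigram counts per document -> high-dimensional vectors -> 2-D t-SNE map.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

docs = [
    "the cat sat on the mat",            # toy documents; in the talk these were
    "shipping arrivals and departures",  # Papers Past newspaper articles
    "casualty lists from the front",
]

# Each document becomes a vector of trigram counts ("the" occurs twice in the first one).
trigram_counts = CountVectorizer(analyzer="char", ngram_range=(3, 3)).fit_transform(docs)

# t-SNE projects the high-dimensional count vectors onto a 2-D map,
# keeping documents with similar trigram profiles close together.
xy = TSNE(n_components=2, perplexity=2, init="random").fit_transform(trigram_counts.toarray())

for doc, (x, y) in zip(docs, xy):
    print(f"({x:7.2f}, {y:7.2f})  {doc}")
```

With a real collection you would also want to normalise the counts for document length before projecting, otherwise long articles dominate.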
And just those counts become a vector, which is a location in that high-dimensional space, which you can then reduce to two dimensions using t-SNE.

Now, this is a project I did with the National Library, with Jay. There's lots of money washing around for World War One stuff this year, of course — you'll get sick of it by the end of tomorrow. So: taking newspaper articles, looking at the character trigrams in them, and mapping them out. They're coloured according to the year they come from — I don't know if you can see it at the back — the darker ones are 1913, the green goes through the war years, and it ends up at red. Each of these boxes is a bit where I've zoomed in. I don't have a nice zooming interface, so I can't show you in real time — I was going to try to do that, but I wasn't able to. So each of these is just me looking around: the shipping charts up the top, and the moon and tide charts, which are kind of similar, being lists of numbers. Lists of people are another strong cluster, whether they're casualties or graduates from universities and stuff. And then somewhere there's a whole lot of pieces about trench warfare, which are just short — two sentences — but because they use the same phrases over and over again they end up with similar character n-gram counts, so they're all clustered together. If I could zoom in there — if I had that thing working — and click on those to see what they were about, you'd see they were all little short stories about trench warfare. And then there's another bit about warfare with troop movements — you know, thirty miles from Belfast or something like that.

Now, that's just looking at articles. The Papers Past corpus has articles and advertisements — those are the two types of thing you've got. Here the red dots are articles and the black dots are advertisements, and they separate quite nicely, but there are a few black dots in the red and a few red dots in the black. That could either be an ad that looks like an article to this analysis, or it could be a misclassification for some reason. So I looked at five of them, down there in that yellow rectangle — you can't read it — and they are actually articles that were misclassified as ads. So that's another way you can use t-SNE: to verify some of your metadata. It shows up outliers, and kind of shows the way in which they're outliers. And can you read this one? I don't know. Here's another zoom from somewhere in here — I don't know where, because I lost it — just showing a cluster of articles that are all health related, and they happen to be recommending one product or another: really advertisements dressed up as editorials. So that's the first thing — how long did that take?

Now I'm going to talk a bit about recurrent neural networks, which is what I do a lot of. Language modeling — I've got a demo here too. I don't know if you can read this, because it's quite small. This is a recurrent neural network reading Dickens, learning to predict the next character of the stream as it goes along. Every now and then it generates a line based on its model's predictions. It's trying to learn to predict what Dickens is writing, and at the top it was nonsense.
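(For the curious, here is a rough sketch of the kind of model being demoed — a plain character-level recurrent network trained to predict the next character, with the cross-entropy loss that the demo displays. The file name dickens.txt, the layer sizes, and the training loop are all just assumptions for illustration, not the demo's actual code.)

```python
# Sketch of a character-level recurrent language model (PyTorch).
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, n_chars, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(n_chars, hidden)           # "what letter turned up"
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)  # carries the hidden state forward
        self.out = nn.Linear(hidden, n_chars)                 # scores for the next character

    def forward(self, chars, state=None):
        h, state = self.rnn(self.embed(chars), state)
        return self.out(h), state

text = open("dickens.txt").read()                  # hypothetical training file
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([[stoi[c] for c in text[:10000]]])

model = CharRNN(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()                    # the cross-entropy number shown in the demo

for step in range(100):
    logits, _ = model(ids[:, :-1])                 # predict character t+1 from characters up to t
    loss = loss_fn(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 10 == 0:
        print(step, loss.item())
```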
And as it learns it's getting more and more Dickensian. That number over there, the cyan number, is the cross-entropy against the evaluation set — an indication of how well it's matching Dickens. So it's getting lower and lower, and it'll get down to... this is only a small model, but it would get down to about two or something, which isn't all that good. Anyway, I'll leave it going for a while. As far as language modeling goes, that's actually quite good — it's better than the state of the art was a wee while ago.

A recurrent neural network is a neural network where the previous step's hidden state is passed on to the next step. If you don't understand the squiggly line, that's okay — I'm not going to talk about that. But it's kind of like: the input is just saying what letter turned up, there are weights that go into the grey circles, which are the hidden state, and then the pink ones are the output. And it just remembers as far back as it needs to, basically, to make a prediction for the next character. It's by far the best way of modeling language at the moment. This is a better picture. So a character-level language model predicts the next character based on what it's seen. You can also do a word-level language model, which is better, but it's less robust against noise and strange characters.

Now, I tried using these recurrent neural networks on the Papers Past corpus, but it didn't work because the OCR was bad — there was just too much noise. And this is going off on a tangent again; it's slightly different. Who saw my talk last year? I'm trying not to talk about the same stuff. Not many of you — okay. Last year I talked about detecting authorship on Whale Oil using recurrent neural networks. I developed a technique where the recurrent neural network language model models all the authors' language at once and has a different output for each author, and by the relative accuracy of those predictions it predicts who the author is. I entered this in the international author identification competition, and I won by quite a margin. So this is actually the best way of identifying authors in the world at the moment. That's something you could use on your text archive if you have material you don't know who wrote, and you have examples from the people you think might have written it — this kind of machine could tell you.

Another thing you can do with a language model is make the OCR better. Now, I'm not trying to pick on the people who digitised Papers Past, because they were using what was good at the time, and they were working from old microfilm and stuff like that. But a good language model shouldn't make these kinds of mistakes. It shouldn't think that "let it feel" is a good way to start a sentence. I mean, if you see this, you should never think that that's a good word to put out. If you're looking at an article from 1913, you wouldn't put the copyright symbol in — it wasn't current. You wouldn't use the word euros. In fact, Papers Past has the euro symbol in some of the OCR, and that symbol was invented in 1996. The OCR machine should know that it would never find a euro symbol there. It wouldn't find many unusual symbols at all — this is hot metal printing, they've got a tiny character set, and they're not going to have anything outside it.
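(To make the "a good language model shouldn't make these kinds of mistakes" point concrete, here is a hedged sketch of scoring an OCR'd line character by character with a trained model like the toy CharRNN above and flagging anything it finds wildly improbable — a euro sign in a 1913 article, say. The function name, the threshold, and the reset-on-unknown behaviour are assumptions for illustration, not anything Papers Past or the speaker actually runs. The same per-character scores are what drive the authorship result mentioned above: score a text under each author's output and see whose model is least surprised.)

```python
# Sketch: flag characters an OCR'd line that a character language model finds implausible.
import math
import torch

def flag_improbable(model, stoi, line, threshold_bits=10.0):
    """Return (position, char, bits of surprise) for characters the model finds unlikely."""
    flagged, state = [], None
    for pos in range(1, len(line)):
        prev, ch = line[pos - 1], line[pos]
        if ch not in stoi:
            # A character the model's alphabet doesn't even contain (a euro sign, say)
            # is suspicious by definition in a hot-metal-printed 1913 newspaper.
            flagged.append((pos, ch, float("inf")))
            continue
        if prev not in stoi:
            state = None                      # can't condition on an unknown character
            continue
        logits, state = model(torch.tensor([[stoi[prev]]]), state)
        logprobs = torch.log_softmax(logits[0, -1], dim=-1)
        surprise = -logprobs[stoi[ch]].item() / math.log(2)   # nats -> bits
        if surprise > threshold_bits:
            flagged.append((pos, ch, surprise))
    return flagged
```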
Language models in OCR, until recently, were always kind of baked in and nobody altered them. But here, after five minutes, this is modeling Dickens better than most OCR language models would. So if you were trying to do OCR of Dickens — which you wouldn't need to, because it's already been done — but if you used this language model, you'd do better than a commercial OCR system, supposing you had the other parts of it done. So a recurrent neural network language model would be very useful for doing OCR of New Zealand texts, because you could use the language of the period, you could use the place names, you could use Maori words, or whatever. Quite quickly you'd make a language model that's much better than anything that's been used before.

So it picks up the language straight away, in the sense that a lot of them don't pick up the language?

It would pick up the patterns. I don't know if this one is doing it yet — it isn't really quite there — but it ends up picking up, you know, expecting a noun here and expecting a verb there. It picks up syntax.

Okay, and that's all I had to say. Not much. And Catalyst — I work for them part-time now, and they paid for me to come here. Questions? Thanks.

I really enjoyed that, Douglas. It always blows my mind when you talk. If you are applying a recurrent neural network language model to a body of texts in a particular context — I don't know, early nineteenth-century texts or whatever — how large does the corpus, the reference corpus, need to be in order to get meaningful results? Does that make sense?

It depends what you mean by meaningful. Like this one here: it's done about two and a half million — two and a half thousand k — of Dickens, and that's chapters all muddled together, so it's not just one book, it's doing a whole lot. If it only had that much and trained over and over on it, and it was a bigger net, the cross-entropy would get down to maybe 1.8 or something like that with this kind of model, which is good for a character-level model. If you had a word-level model, you'd need more text and you'd get better results — so that's if you've got huge amounts of text — but a character-level one tends to do better with a small corpus. So, like, a million words would be enough for anything at character level.

Is that the best measure? Well, the other measures kind of equate to the same thing — perplexity and probability and stuff — but it's the one that suits me anyway.

I wonder if we could... oh, it works. Could we pick up the comment you made about the Maori language and using recurrent neural networks, together with the question about amount of text, character level compared to word level, in a Maori setting? Can you give us some advice on that?

If you don't have much text, characters are better — a few hundred thousand characters is good, you know. With words you need more. If you have heaps and heaps and heaps, words will be better. I think maybe with Maori, characters might actually do better than with English, because it's more regular and there aren't so many characters — English has terrible spelling and Maori hasn't.

Is there any benefit in doing a character-level recurrent technique first and then following it with a word-level process?

Maybe. Another way to do it would be to have a word-level thing and then fall back to character level when it finds a word it doesn't know — kind of combine them like that.
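(One way that combination might look, purely as a sketch of the idea just mentioned and not anything demonstrated in the talk: score each word with the word-level model when it's in vocabulary, and spell anything unknown out with the character-level model. Both word_logprob and char_logprob below are hypothetical stand-ins for trained models.)

```python
# Sketch: word-level language model score with a character-level fallback for unknown words.
import math

def sentence_logprob(words, vocab, word_logprob, char_logprob):
    """Score a sentence, using the word model where it can and characters otherwise."""
    total = 0.0
    for i, w in enumerate(words):
        if w in vocab:
            # In-vocabulary: ask the word-level model directly.
            total += word_logprob(words[:i], w)
        else:
            # Out-of-vocabulary (a rare place name, a Maori word, an OCR error):
            # spell the word out and let the character model score it letter by letter.
            context = " ".join(words[:i]) + " "
            for j, ch in enumerate(w + " "):
                total += char_logprob(context + w[:j], ch)
    return total

# Toy usage with dummy uniform models, just to show the shape of the call:
vocab = {"the", "cat", "sat"}
uniform_word = lambda ctx, w: math.log(1 / 1000)   # pretend 1000-word vocabulary
uniform_char = lambda ctx, c: math.log(1 / 30)     # pretend 30-character alphabet
print(sentence_logprob("the cat sat on te mata".split(), vocab, uniform_word, uniform_char))
```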
It would be easy to do a character one straight off, and if it doesn't work, then you try something else.

I'm just thinking of the Papers Past example: you'd have the OCR correction at the character level. Some of the characters in a word may be correct, but others are just completely wrong. I'm just interested in which way is better.

I don't really know, because I'm not an OCR person, but I'd love to have a go.

Hi — could you train it to recognise common misprints? A "c" and an "l" is sometimes supposed to be a "d" when it's not quite printed right. Could it learn something like that?

Right, so you mean that character combination sometimes comes up accidentally as "c", "l" when it should be a "d" in a word? That's the kind of thing it would recognise. If a "d" was more likely there, it should be saying this is more likely a "d" than a "c", "l". It's kind of in dialogue with the optical part, which is suggesting something, and it says you're talking rubbish when you suggest "c", "l".

Could you actually just OCR with that, like off the images? You showed your contrasty thing to start with, with the images. We use — as the person responsible for giving you that text, sorry — a lot of that is ABBYY FineReader, which is what our vendor uses, and it's from Russia or wherever. It doesn't really know Maori that well, so you give it the specific things or whatever. Could your neural magic actually do OCR, like off an image, and make text? Is that something it could do?

Not... I don't really understand what they're doing, to be honest. It's crazy. They do use recurrent neural networks for handwriting recognition, for example, where they excel, and I don't know if they're using them for OCR directly from the image, but they should be trying it.

That brings us to the end of the session time. So again, time to move around if you want to. Thank you very much, Douglas, for that. I think we've got some very fascinating people in the audience.