By the way, with respect to the paper I will present at the next meetup, there is a common motive between these two papers, and actually not just one common motive. The second common motive I will reveal next month, but this month I will tell you the first: there have been a lot of papers lately showing substantial progress in machine learning, like the 2011 deep learning paper on visual recognition by Yann LeCun and his team, which basically demonstrated a whole new class of methods for image processing. And the general thought is that we are reaching the moment where learning is not that difficult, if you know what to learn, how to represent it, and you have the right amount of data, obviously. Because as known from the work of, I think, Minsky in the 60s, there was this chart showing how much you can learn with methods of different sophistication, and the basic inference was that if you have enough data, and your learning algorithm, even if it is not very sophisticated, is able to represent the function space, you will get the right results anyway. So if you have something a bit richer than plain regression, say polynomial features of sufficient degree, even regression with polynomial features can learn something, if you have enough data. The other interesting recent thing about deep learning is the match against a Go player; that's exactly next month's paper. And as I said, you may not believe it, but there is a common motive here, which I will discuss maybe at the end of this meetup, or next month.

So this paper is solely about human-level concept learning through probabilistic program induction. By human-level concept learning, we mean that from a single example we can extrapolate what class we are trying to detect, produce another exemplar of the same class, and recognize whether another character is at least visually similar to the class, whether it represents the same concept. This is something completely different from classical machine learning, where you expose yourself to a big training set, hundreds of images or more, in the hope of recognizing even a not-so-changed instance of the same thing. And the best example is how humans do it. So here you see, for example, a Segway. Maybe it's the first Segway you've seen in your life, but you can recognize the wheels. You can see that there is a stick in the middle, and a steering bar on top. So you can already disassemble it. And that is what the authors believe is critical for human-level concept learning: that, by induction, the object has to be constructed from smaller parts, and from parts that you already know. You can recognize the wheel itself. You can recognize the steering stick; it's not a wheel. You can recognize the suspension. So you will recognize the basic things this machine is made of, even if it's the first time you've seen a Segway.

So the idea of this learning algorithm is to bias the algorithm towards things that are constructed in a certain way. In this example, characters are usually drawn by hand, or even if they are printed, they still resemble a particular script they could have been written in. So maybe we can use this knowledge and ask: how would it be to paint this character? And here is a more concrete picture of how humans would learn it. On the bottom, you have a character in a font that doesn't belong to a handwritten script, but it is similar to handwritten characters, so we can use the same method.
Then on the top, we have different handwritten characters. And if you think about it, you can mentally disassemble all of them into basic elements. Take this character: it's very complex, but you can disassemble it into a circle, a hook, and a last twisted line. If you disassemble it into three dashes, three lines that were drawn separately, then after seeing just one character you are able to reproduce it; you can paint it this way. And that's what makes us so wonderful at writing and painting.

So how would we make a computer recognize the way a character was drawn? First, we can say that we have basic elements, which are small lines. We can have a turn, we can have a hook or a wave, as you prefer, and we can have a line down and a line up. Each of them, if it's used in a character, is reasonably unique. Maybe not line down versus line up, because it's sometimes difficult to distinguish which way it was drawn, but if after this line there is a hook or something, we can basically guess that first we draw from the top left, then go down, and end with a hook. Then we have these basic motifs, and from these basic motifs we assemble the longer lines, basically those that are drawn without lifting the pencil off the paper, for as long as we can do it in a single stroke. A single stroke is basically a composition of these motifs, connected sequentially. So we have, say, this epsilon-like shape as a single stroke.

And if you have these, you can also immediately constrain the locations. That's important for every learning algorithm: the most important thing is to restrict the search space to something reasonable, so that we don't spend forever learning details that do not generalize. So that's the first way of limiting the search space. The second is how we treat two-stroke characters. Say we have the epsilon shape and a hook, and we know that we cannot draw them together, because our parsing reached a limit: the line ends. We have three line ends here; we know it. So we have two strokes, and we try to assemble them together. The relative orientation between these two strokes tells us what the character is; not the absolute position, but the relative one. And this allows us to distinguish characters.

For each element, we can draw the hook, the turn, or a straight line with slight noise; they call it motor noise in this paper. Basically, there will be noise in the angle, or the line will be slightly curved. And thus, for the same character, you are able to reproduce this noise in the components as well, so you can already produce exemplars that look like the same character but are slightly different. And the way you distinguish between characters is that they use different motifs in a different order.

It may seem like a very complex way of learning. They basically show in this example that if you have a plain black-and-white image without any bias, then, assuming a certain width of the pencil, you can easily parse it in multiple ways as different lines. So for example, the parse scored minus 593 is one of them: it's one line, and then it traces back over the same line. No problem with that. We just consider all possible parses as different strokes, try to match each of these strokes, and then ask how similar the result is to our stored patterns. So we basically try parsing the image as a sequence of strokes, match each sequence of strokes first to the basic motifs and then to the stored patterns, and check which one is more similar.
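To make the generative story above concrete, here is a toy sketch in Python (not the authors' BPL code): a character is a small program, strokes made of primitive motifs placed by relative positions, and every rendering adds motor noise, so each exemplar comes out slightly different. The primitive shapes and the noise model here are hypothetical stand-ins.

```python
import random

# Hypothetical primitive motifs as short polylines of control points.
PRIMITIVES = {
    "line_down": [(0.0, 0.0), (0.0, -1.0)],
    "hook":      [(0.0, 0.0), (0.0, -1.0), (0.3, -1.2)],
    "turn":      [(0.0, 0.0), (0.5, -0.5), (1.0, 0.0)],
}

def sample_character(max_strokes=3, max_motifs=3):
    """Sample a character 'program': a few strokes, each a sequence of
    motifs, plus a relative start position for every stroke after the first."""
    strokes = []
    for s in range(random.randint(1, max_strokes)):
        motifs = [random.choice(list(PRIMITIVES))
                  for _ in range(random.randint(1, max_motifs))]
        rel_pos = (0.0, 0.0) if s == 0 else (random.uniform(-1, 1),
                                             random.uniform(-1, 1))
        strokes.append({"motifs": motifs, "rel_pos": rel_pos})
    return strokes

def render(character, noise=0.05):
    """Run the program: chain motif control points into pen trajectories,
    jittering each point (motor noise) so every rendering is a new exemplar."""
    trajectories = []
    for stroke in character:
        x, y = stroke["rel_pos"]
        points = []
        for motif in stroke["motifs"]:
            for dx, dy in PRIMITIVES[motif]:
                points.append((x + dx + random.gauss(0, noise),
                               y + dy + random.gauss(0, noise)))
            x, y = points[-1]  # next motif starts where the last one ended
        trajectories.append(points)
    return trajectories

program = sample_character()
exemplar_a = render(program)  # same concept...
exemplar_b = render(program)  # ...two slightly different exemplars
```

The key point is that both calls to `render` share the same underlying program, so they are exemplars of the same concept; that is exactly what the one-shot tasks exploit.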
So it turns out minus 593 is actually not very similar to any pattern that we stored, but minus 505 can be parsed in two different ways, depending on how fuzzy it is.

Why minus? Is it some kind of distance measure?

Yes, it works as a distance measure; these scores are log-probabilities of the parses, which is why they are negative. And the difference between these two parses is significant enough to tell them apart, though I suppose sometimes it may lead to conflation or confusion. But about the accuracy, we will talk in a minute. First, note that we assume we know the components, and if the space of possible parses with the given components is limited, we have a very limited search space, and then it's very easy to distinguish between the alternatives, because we don't conflate things that do not matter. Our search space goes down from something like 100-by-100 images to just a few patterns with a few relative positions. That's it: literally just a few bits for each of these characters.

And they also tried to show what the best way to do this is. Initially they tried very sophisticated hierarchical deep learning; of course there was a deep convolutional network, the classic way of doing image analysis. But without limiting the search space, the error rate, although it keeps going down (here, lower is better; the error rate is basically the penalty), never goes as far down as when you hire people. People are the violet ones. Why? Because people know what should be there, so they limit the search space very nicely. We do not really see every pixel; not every pixel matters. If you remember a character, you don't really remember where there was a small dot or a small blemish, in most cases; unless it was a very stressful situation, you will remember the general motif. So you actually remember a very limited amount of information, because you have preconceptions, prejudices about what should be there and how it was made.

And by limiting the search space this way, BPL, which is their solution, Bayesian program learning, basically Bayesian inference over the way these characters were drawn, can, probably with some tweaks, reach the same accuracy as humans. The thing is that when you start adding occlusions, when you start adding interruptions into these motifs, it becomes less accurate, only as accurate as the deep learning models. That's probably because their program learning doesn't infer that this could be a broken line, that there is an error inside. That's something you can actually observe in reading handwriting; I don't know if it has ever happened to you, but sometimes if part of a character is obscured, you will read a different character, because you will infer something else. So this problem is common to people as well. The way to avoid it is to add a model of how a character can be obscured: maybe it's just a small interruption, or maybe some other object can occlude the character. But this is the thing that Bayesian program learning doesn't know, so it cannot infer where the disruption came from.

So I would say the main motive here is the dramatic reduction of the search space by assuming how these characters were made, really adding knowledge to the learning method about how much simpler characters are than arbitrary images.
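Returning to those negative parse scores: here is a hypothetical sketch of how competing parses of one image can be compared. Each parse is a sequence of motif and relation guesses, each with a probability from some upstream matcher; the parse's score is the sum of log-probabilities, hence negative. In a real system it would sum many pixel-level terms, which is how you get magnitudes like minus 505. The numbers and the `parse_score` function here are illustrative, not the paper's code.

```python
import math

def parse_score(parse):
    """parse: list of (p_motif, p_relation) pairs from some upstream
    matcher. The total log-probability is the score; higher (less
    negative) wins."""
    return sum(math.log(p_motif) + math.log(p_relation)
               for p_motif, p_relation in parse)

# Two hypothetical parses of the same image:
one_stroke_parse = [(0.9, 0.8), (0.2, 0.5)]    # second motif fits badly
two_stroke_parse = [(0.9, 0.8), (0.85, 0.7)]   # both strokes fit well

best = max([one_stroke_parse, two_stroke_parse], key=parse_score)
print(parse_score(one_stroke_parse), parse_score(two_stroke_parse))
```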
Because on the right side of this benchmark, we definitely see how a generic image recognition application works, and on the left side, we see how much better the method is that assumes it has to be a drawn character, even if it's not; it just has to be similar to a drawn character, yeah? So that would be it. I hope that was dynamic enough and that I captured your imagination.

And do they say anything about how to get this concept information, for example the hooks and the lines and that kind of stuff, or is it just given?

So I assume that's application-specific. My understanding is that, in this case, they basically first use a convolutional network just to distinguish how the individual strokes are drawn. It just distinguishes: oh, this is a stroke going in this direction or the other direction. And then they merge this information into full lines. That's described in a little detail in the paper, but they have a GitHub link to the code.

What I found very interesting, I think, is the ability to define the primitives, so to say, and to create a model of them in the same sort of framework you're using; that seems super critical to be able to do. And I'm trying to think how you could look at this from the perspective of the way humans learn: I guess at some point, either based on experience or based on teaching, you are told that these are the primitives you want to keep looking at. And then: OK, I see this primitive here, I see a hook, I see a turn.

So there is this Deep Dream paper that was referred to by Martin, and if I may refer to it: basically, in that paper they show that a basic recognizer, basically a one-layer network, just recognizes one thing. But when you arrange it into deeper layers, each level recognizes a slightly different function, and you can see that different neurons in hidden layers recognize different motifs or functions. So we can assume that, from the learning perspective, there is some program or some method to be learned, and our neural networks are just a particularly flexible mechanism for representing almost any program, so they can adapt to the task: they can recognize visual images by decomposition into basic functions at different levels. They can also do this. So basically here, the first level of the machine is just a five-by-five convolutional network. I assume in humans there is something that works like a convolutional network; it's called, I think, visual memory. Basically, you focus on a single spot and recognize a single spot, but the objects that you have seen and recognized stay in your memory, and you can refer to them even if you close your eyes. The first very simple example: you close your eyes and you still see the room.

Sorry, for the left-hand side, this classification task, what was the actual task? Was it recognizing the letters, or what was it?

So on the left side, it was one-shot classification, among 20 alternatives.

What are the classes?

They used, I think, about 10,000 characters from different human scripts and alphabets, including, say, Latin; it was quite huge. I assume that 10,000 cannot cover the full set of traditional Chinese characters, but maybe Chinese is not the easiest example here, because you have really many strokes, yeah? And the order is not clear.

Are there any examples other than this character recognition?

So their method is specific to character recognition.

These kinds of characters?

So the ones that you draw with a relatively small number of strokes. So not traditional Chinese; maybe simplified Chinese, yeah?
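Here is a minimal sketch (assuming PyTorch) of the first stage as described above: a tiny network with five-by-five convolutions that looks at a small patch of the image and guesses the local stroke direction. This is only an illustration of the idea; the paper's actual preprocessing pipeline may differ.

```python
import torch
import torch.nn as nn

class StrokeDirectionNet(nn.Module):
    """Guess the local stroke direction in a small binary image patch."""
    def __init__(self, n_directions=8):   # e.g. 8 compass directions (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),  # 5x5 kernels
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classify = nn.Linear(16, n_directions)

    def forward(self, patch):              # patch: (batch, 1, H, W)
        h = self.features(patch).flatten(1)
        return self.classify(h)            # logits over stroke directions

net = StrokeDirectionNet()
logits = net(torch.rand(4, 1, 15, 15))     # four random 15x15 patches
print(logits.shape)                        # torch.Size([4, 8])
```

Chaining these local direction guesses along the ink until a line end is hit would give the single strokes that the motif matching then works on.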
But how do you go from the generative model to getting a class? Like, OK, this is the letter A, or this is the letter Z, or this is the Greek letter alpha.

So first you have this very small convolutional network that just guesses in which direction you drew the stroke. Then you connect this into a single stroke, until it ends. And when you have a stroke, you use these motifs: turn, hook, straight line. You decompose the...

Yeah, you decompose it.

It's like letters of an alphabet, in a way, yeah? And then you have a sequence of these letters, basically. It's like word recognition; it's very simple. It's sequence recognition: you recognize a sequence of the same kinds of things, yeah? Also, you know that the alpha is drawn with the blue pen, you can see it here. So basically, you can see these two curves together as a single stroke, which means turn, turn. That's very simple to find in the database, yeah? And you have a database of all the known letters and how they're composed.

But it wouldn't be exactly the same, right? The model you get from your example has some noisy parts...

That's the thing. This database has 10,000 different characters, but it also has multiple exemplars for each.

Oh, so your input is also examples of the actual...

Yeah, so for example here, on the lower part, you have the same letter and different examples of it.

So this is in your input, in that sense; the known characters...

That is pre-training, in a sense. But you should be able to read the motif from a single example.

But you also have examples of the actual letters, right? Like the ones you're trying to guess.

You have a single example of a letter, yes. And then you can recognize, say, 20 other examples of the same letter.

Slight variations of it, yeah?

Yes. I mean, these are not so slight, because normally it's considered a not-so-easy image recognition task to recognize all the rotations, and here you have distortions, you have motor noise, everything; as long as it's still the same character in the sense that a human would...

So you have a single example of every...

Yeah, it's 10,000 different...

...letters of an alphabet, or an alphabet of 10,000 letters. Okay, makes sense. So what you're trying to create is basically a generalized understanding of how this alphabet works, or what the primitives of this alphabet are. It's like learning telegraphy.

Well, the primitives in a way are built in, because the hooks, the circles are built in, yeah?

So are these automatically learned by the network, or are they programmed in?

That's actually not entirely clear from the paper.

Okay.

But they basically embed this knowledge. It's motifs first, and then... So it's basically pattern matching at first, and then they use the same patterns to... if the same pattern matches, or the same sequence of patterns matches...

Then it looks a little bit overfitted to this kind of problem, to me. So I don't know how to generalize it. It's a nice idea, though: you have general character recognition across hundreds of characters, which is usually considered among the most difficult tasks.

Well, I'm not sure why it would be overfitting, because normally, when you face the task of reading, you assume that you have a particular alphabet or a particular language, and you are just interested in the sequence of characters coming out, yeah? Here, you actually do not assume a particular alphabet, because there are multiple alphabets, and you just want the character sequence coming out of it, like Unicode codes. So it's actually, you know, more robust than what humans usually do.
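The evaluation protocol described in this exchange, 20-way one-shot classification, is simple to state in code. A sketch under assumed inputs: `similarity` stands in for whatever score the model produces (for BPL it would come from comparing inferred drawing programs; here a trivial pixel overlap is used just to make the example run).

```python
import numpy as np

def one_shot_classify(test_image, support_set, similarity):
    """support_set: {class_label: single_example_image}. Return the label
    whose one stored example the test image resembles most."""
    return max(support_set,
               key=lambda label: similarity(test_image, support_set[label]))

def pixel_overlap(a, b):
    """Trivial stand-in similarity: fraction of matching pixels."""
    return float((a == b).mean())

# 20 classes, one 28x28 binary example each (random placeholders):
rng = np.random.default_rng(0)
support = {c: rng.integers(0, 2, (28, 28)) for c in "abcdefghijklmnopqrst"}
test = support["g"].copy()
print(one_shot_classify(test, support, pixel_overlap))  # -> 'g'
```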
This cannot recognize, for example, a sigma among Greek letters, unless it's in...

So you can only recognize something that you have already seen in the training set?

No, I mean, you can define sigma with these primitives: you would have line left, diagonal right, diagonal left...

So there are more primitives?

Yeah, there are more.

Okay. Suppose you have 20 primitives, and you allow up to, like, three strokes, and they can all join each other in different relations. You've got an explosively large search space (a rough count is sketched below).

So they also discuss traditional Chinese, where you easily have characters of 20 strokes, 30 strokes. Maybe a harder example. But so it is for humans: learning an alphabet of 10,000 characters is hard. I thought I could write Chinese, that it's just some horizontal strokes and vertical strokes, but in fact Chinese people say, well, there are various boxes in there, there are different elements of it. So really, the primitives at the top level would have to be the Chinese radicals rather than just vertical and horizontal strokes. In order to make it work for Chinese and to reduce the size of their search space, they would have to put those fundamental building blocks in there.

Yeah, and that's what they actually do here: they make the motifs, the radicals, of the characters in the other languages.

So this is something I didn't really get about the paper. They have this nice Segway example, but it's easier if you... the components specified are the handlebars, a stick, and some wheels.

Yeah.

If you already know that, then it's like... So that's given to them. Yeah, exactly. Until the end, I kept thinking that's what they guessed from their alphabet, which, I think, would have been pretty cool, right? To be able to take an alphabet and say, hey, looking at this alphabet of, you know, whatever, a thousand characters, it seems that these are the primitives being used, and then use that to do the rest of the thing. That would have been super cool.

Yeah, exactly. But that's not what they do. If they had just taken whatever number of elements and relationships, they could have searched through 10 to the 9 combinations and picked out the ones that matched the 10,000 characters as closely as possible, pixel by pixel.

Yeah, but that's not how people write.

No, I agree. It's interesting, but it's a bit like cheating, right?

So then the thing is, of course, they chose their problem so that they can have... this is the publication bias of Science or Nature: they chose their domain to have a great solution that surpasses current efforts by a large margin. That's standard. But besides that, the fact that they are able to take a single character, disassemble it into parts, and replay it, or recognize it in another rendering, I would say that is impressive, even if they assume some preliminary knowledge, yeah? So at the point of recognition, they already have the models of the known letters; then you have this new test character, and you construct the model again.
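The rough count promised above, with purely assumed numbers: 20 primitives, strokes of up to three motifs, characters of up to three strokes, and nine coarse relative positions between consecutive strokes.

```python
# Back-of-the-envelope size of the unconstrained search space, with
# assumed numbers (20 primitives, <=3 motifs per stroke, <=3 strokes,
# 9 coarse relations between consecutive strokes).
P, MAX_MOTIFS, MAX_STROKES, RELATIONS = 20, 3, 3, 9

strokes = sum(P ** m for m in range(1, MAX_MOTIFS + 1))   # 20+400+8000 = 8420
characters = sum(strokes ** s * RELATIONS ** (s - 1)
                 for s in range(1, MAX_STROKES + 1))
print(strokes, characters)  # 8420 strokes, ~5e13 possible characters
```

Which is why constraining parses with line ends and known motifs matters so much: even under these small assumed numbers, the raw space is astronomically larger than the roughly 10,000 characters being modeled.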
At some point, you have to compare the model of the input and the standard ones... So I assume that the other characters are there to decrease confusion. You know that if your character is as likely as a new character, or if several of them are equally likely, then the relative probability that this is the correct guess is small.

Because that's how they compare the two models. So, the models of those you've already seen and the new letter: how do you compare the different models? The different programs, I guess, in this case?

Take this example: we have different elements, different strokes. We compare the strokes, and if they are dissimilar... you know, you always have a measure of similarity here, at every level of the hierarchy. You have a similarity to, say, a hook: if you parsed a hook or a turn, you say this is a turn with this accuracy. So you have a conditional probability for almost every small element, and you can compute a probability at every level from your model.

Right, but there's one model, right? So now you have the new model of the new character, and then, against all the 10,000 you've seen earlier, how do you find the closest or the best...

I think they're basically comparing against all of them.

So the other question is: how do they justify this embedding of information with respect to human behavior? Like, what is the analog in human behavior to having this embedded inside?

They would say that inductive learning is the analog. So they would say that we have learned that, for example, if the laptop is open and there is a presentation, somebody had to prepare the presentation. Probably one of us has been preparing it; here it's not true, I just took the images from the paper. But, you know, in the same way, if you have a chair, you infer that it's made of plastic and steel, for example, or aluminum, because these are the materials that match the usual composition of a chair. Probably aluminum, just by the weight. It doesn't have to be, but basically, humans have so much preconceived knowledge. From birth, you learn so many things you can use to limit the possibilities here, yeah?

But the paper is claiming one-shot learning when, in fact, there's only one shot if you give it a little learning ahead of time.

No, no, no. It's one-shot in a sense. They would say: teach it on 90% of the characters, and it will achieve the same if you then give it just one example of a new one. Do you see what I mean? So you give it one example of each new character; that's the one shot. Yes, but you also give it some alphabets first, so it knows how different characters can be. Because otherwise it would be like a small child that sees the letter A and then sees the letter U, which is kind of similar, and says it's an A, because it doesn't know any other letter, so the letter must be A. By induction, it's quite probable that it is A, because there is no other letter, yeah? So that's the typical induction thing.

And I'm guessing that there's also: if it's a character, it's drawn. If it's drawn, it's strokes. It's most likely to be few strokes, because that's more convenient for people. So it's a program; somebody drew it. I don't care how it looks now; I know that it was drawn, and I know that it has to be a sequence of strokes. And if it's a sequence of strokes, I know how to reproduce it. Does that make sense?

So the thing that I'm not convinced about is that the features are very well selected for this kind of task, which is OK. But if you give it these features, if you hand-select these features, then it seems like an easier solution.
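On the "relative probability" point in this exchange: one way to make it concrete is to normalize the per-class log-scores, so that a near-tie between two stored characters shows up as low confidence for the top guess. The numbers and the softmax normalization here are illustrative assumptions, not taken from the paper.

```python
import math

def relative_probabilities(log_scores):
    """log_scores: {label: log-score of the test character against that
    stored model}. Softmax-normalize (shifting by the max for numerical
    stability) to get relative probabilities over the candidates."""
    m = max(log_scores.values())
    exp = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

# A clear winner: 'alpha' gets ~95% of the relative probability.
print(relative_probabilities({"alpha": -505.0, "a": -508.0, "d": -593.0}))

# A near-tie: the same top score now carries only ~52%, signaling confusion.
print(relative_probabilities({"alpha": -505.0, "a": -505.1, "d": -593.0}))
```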
But I think... So they also describe, in less detail, how they preprocess the data. But I understand that they also... So I also concentrated on this: not on the preprocessing, on finding the motifs in the characters, for example, but on making sure that you recognize the final character, because that's the part that has the critical influence on the final effect. If you make it slightly less accurate at the stage of learning motifs, it will not change the overall performance much, because you still recognize the sequence of motifs. Maybe we can continue with this question, how to infer these primitives, and write a paper about it.

And underpants.

Like what?

Underpants gnomes. First we gather the underpants. You gather all the underpants. Then question mark. No?

OK, so... Martin, for comparison: you have an OCR engine, you want to detect... Or say you have a library of ancient medieval scripts. You give it one example of each letter, and you expect it to read the books, basically.

I think that's a bit too demanding, because the calligraphy style was a bit too elaborate and had too many variants, so you would need to give it a few examples for each letter, and there are also ligatures in the script. But the fact that you could do it would be a great achievement; except that here it recognizes just one letter, which is much easier. They just want to match human performance.

OK, maybe that's where they're overselling themselves, in that matching human performance is kind of a given here, because that's what you're priming it to do. Superseding human performance seems like the next step.

Yeah, but that's next month. That's about superseding humans. I hope by that time they have already won against the grandmaster in Korea.

But for the Go thing, I can see that, from first principles, it's a natural progression through Atari to Go. They're not adding super extra knowledge along the way. But here they seem to be priming this thing to win.

I wonder if this could read CAPTCHAs, the CAPTCHA images. That would be nice. Could you seed this with the strokes from fonts, so that it just recognizes all fonts ever? Like, instead of some human strokes, do it for fonts that are distorted in the CAPTCHA, because that would solve the CAPTCHA sort of problem.

That's sort of what they're doing. So the nice illustration that you can see here: it can learn from some fonts, like this lower example; from a single character in this font, it can draw it. The thing that most of the time should be doable with their method, but is not guaranteed, is varying width of the line; I don't think they trained their machine for this, so that may cause problems. So this is yet another level of what humans know: this is the same line, but it can vary in width, and their program still doesn't know that. Yeah.

Kannada.

Kannada? So, 48 characters.

Kannada, yeah. I marked the one with the blue there, B3.

Yeah, B3. But as I said, they use the database of 10,000 characters, which probably includes Kannada, but also many others. So there is basically this database of characters from different scripts that they mention in the paper. They also give the link to it; anybody can download it. They also give the link to their BPL implementation, so you can just download the engine and use it, which is awesome. That's how research should be done, by the way.
Everything linked to GitHub and reproducible. There are pages of supplementary information. I tried to read parts of the code, but it's just too much. Thank you. Thank you for the great questions. Now I'm not sure that I understand the question myself. No, I think you understood enough that you could explain it to us and make us confused.