It's a great honor to be at EuroPython. This is definitely my favorite conference, and I hope you have fun in my presentation today and learn something about the weird neural network models we are going to cover. OK, so first, a short introduction. My name is Nemanja. I come from Serbia, from the University of Novi Sad. Novi Sad is the second largest city in Serbia, and I'm a third-year PhD student there. I'm also a teaching assistant at the Faculty of Sciences. My research topic is neural network robustness, which has been a hot topic recently. Basically, I'm trying to make the neural network models we have today more robust, or less prone to error, in difficult situations or when someone is actively trying to make our models go wrong. You can find me at these addresses. This is my email, and I also have a blog where I occasionally write about fun machine learning related things. I recently wrote an article. Is there anyone from Google here? OK, then I can talk about it, I guess. I wrote an article on how you can SSH into Google Colab notebooks if you don't want to use the Jupyter notebooks. I guess that's not breaking their terms of service, but if it is, well, I'm glad they're not here. You can also find me on GitHub, where I have lots of projects you can look at: side projects, some games, and things like that. On most social media you can find me at this handle, which is eight letters, because that was the limit when I made my first email address, so it kind of stuck. OK, so this presentation, just to go over it briefly and give you a chance to run away if you don't like it: it's going to be about a very weird and, let's say, unreasonably well-working neural network model, because I cannot explain the exact reason why it works. It has to do with classification based on missing features.
So we'll talk about image classification in general, but it can be used in any convolutional model. What it tries to do is mimic something we all do every day, which is deduction, which other neural network models still don't know how to do. By deduction I simply mean: if you know what all the possible values are, and you know what something isn't, you can deduce what it is. That's what we're trying to do here. It helps in certain scenarios, so we'll talk a little bit about occlusion, when the object you're classifying is behind another object you are not interested in, for example. The general way this presentation is going to go: I'll talk about some implementation details, then show you some code, and things like that. The full source code is available, but more about that later. Another word of warning: this is all very experimental, and I cannot claim with certainty that everything I'm telling you is true; it's true to the best of my knowledge. We sent a proof-of-concept academic paper to a neural network journal at a university in Prague that specializes in, let's say, weird neural network models, among other things, and I'm looking forward to their comments. I'm also looking forward to your comments. You can say this doesn't make sense at all, and I will agree with you. And of course, you know that machine learning models, especially deep neural network models, are very hard to interpret: they may work, but we don't know why, and that is the case today. So what I'm basically saying is: don't believe me, and question everything I say. I'm looking forward to your comments; it would mean a lot. OK, so let's go briefly over how convolutional neural networks work, in case you are not that familiar with them. Once you understand how they work, they are really not that complicated.
So if we look at the picture at the bottom of the slide, you can see several layers in a convolutional neural network, where the first few layers are called convolutional layers. The operation of convolution is not complicated, and it is not tied only to deep learning; convolutional operations on images have been around for some time now. You can use them, for example, for edge detection. But what made them really work well with neural network models is this: before the whole neural network thing exploded a few years ago, you had to handcraft these convolutional filters, or kernels. As you can see on the image of the dog, there are these small squares; these are called convolutional kernels or filters. The operation of convolution is basically taking this filter, sliding it over the image, and looking for matches, multiplying some matrices, basically. So before neural networks, you had to make your own filters, but with neural networks, you can learn them. These filters or kernels, whatever you want to call them, capture low-level and high-level features of the images we are classifying. So basically, in convolutional networks, you're using the convolutional layers for extracting features from an image. For example, on images of dogs, you can say the features are: if it has ears, if it has eyes, if it has a wagging tail, then you can say it's a dog. These are some high-level features, for example. After the convolutional layers come the fully connected layers, which are the traditional neural network layers containing the weights and biases for actually training your classification algorithm.
So to sum it up, you use features from an image to classify it. You can say: if something has eyes, ears, and a wagging tail, then it's a dog and not a cat, because most cats don't have wagging tails. OK, so what about missing-features classification? My idea with this neural network model and algorithm was: what happens if we try to classify something based on the features that are not in the image? So instead of describing a dog by the features we listed before, you can say: a dog doesn't have headlights, if you're also classifying cars. Here is a motivating example from the MNIST data set, which you've probably heard of; it's a data set of handwritten digits. We have a digit 5 here and two very high-level features: a circle-like feature, and a corner-line feature that goes from left to right and then down. Now imagine you couldn't see the 5, and I tell you we are classifying an image of a digit, and I'm 100% sure it doesn't have these two features. Because it doesn't have the circle-like feature, we can safely assume it's not a 0, because a 0 is basically a circle. We can also say it's not a 6, 8, or 9, because those are the digits that have the circle-like feature. And, as written on the slide, it's not a 1, 2, 3, 4, or 7, because all of those digits have the corner-line feature. What we're left with is that we are looking at the number 5, even though we didn't use the features of the digit 5 to classify it. This is the main point. If there are some questions, please. All clear here? Awesome. OK, so now, why would you do this? If you know what the features of the digit 5 are, why would you go the other way around? The main reason, going back to adversarial learning and occlusion: what happens if we have partial inputs?
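This elimination argument can be sketched in a few lines of plain Python. The feature assignments below are made up for illustration, they are not taken from the actual model:

```python
# Hypothetical high-level features per MNIST digit (illustrative only).
digit_features = {
    0: {"circle"},
    1: {"corner_line"},
    2: {"corner_line"},
    3: {"corner_line"},
    4: {"corner_line"},
    5: set(),            # has neither the circle nor the corner-line feature
    6: {"circle"},
    7: {"corner_line"},
    8: {"circle"},
    9: {"circle"},
}

def classify_by_missing(missing_features):
    """Keep only the digits that contain none of the missing features."""
    return [digit for digit, feats in digit_features.items()
            if not feats & missing_features]

print(classify_by_missing({"circle", "corner_line"}))  # → [5]
```

Knowing only that the circle and corner-line features are absent, elimination leaves exactly the digit 5.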
For example, the digit in our example is damaged somehow, half of the pixels are missing, or one part of the image is corrupted or blurry, things like that. The classifier I'm going to show you, which works with missing features, works much better on these damaged images than a classic neural network classifier. So how do we implement this? We've mostly gone through it already, but let's go more in-depth. We implement it by negating, or inverting, the output of the last convolutional layer. At that point in the network, we have all the features and their positions in the image. So if we invert that, from what features are there, we get what features are missing and where they are missing. We'll go into more detail. There are several steps. We need to extract the features from the images somehow, so we know which features are there and which are not. During neural network training, we take one digit, in our MNIST example, and push it through the convolutional layers to get which features are present in the image. From that, we can immediately deduce which features are not there: the feature set is a finite set, it has a fixed number of elements, and if you know which elements are there, you immediately know which are not. It's very easy. After we invert or negate this vector, and we'll talk about what we need to do there, we can train the rest of the network normally. OK, so let's go through the steps. The first step is to get the features. We could handcraft these features, you know, just draw them, but that's difficult and boring, because when you change the data set, you have to do it all over again.
What we can do, and what my algorithm is doing, is simply training the network normally for a number of epochs, let's say 10, and then just taking the weights from the convolutional layers. Very simple. It's basically transfer learning: you're taking a snapshot of your model and applying it to your new model, which you change somehow. It's automatic, so you don't have to do the boring stuff, and it's much faster and easier. OK, for step two, we need to talk a little bit about activation functions. If you are familiar with neural networks, and I'm guessing you are, activation functions are what you apply to a neuron's activation, the weighted sum of the inputs plus the bias. We need to be careful here, because we want to invert or negate the output of the last convolutional layer, so we need to be aware of which activation function we are using in that layer. The transformation of this positional feature vector will depend largely on the activation function in our last convolutional layer. Simple example: the sigmoid function outputs a number between 0 and 1. We can say 0 means the feature is not there, and 1 means the feature is there. So if we want to invert or negate this vector, we simply apply the formula 1 minus x to each element. If a feature was present and the value was 1, it becomes 0, and vice versa. Very simple. That's really nice, but it's 2019, and we shouldn't be using sigmoids in our neural network models anymore. So what if we want to use the popular choice, the rectified linear unit, ReLU? You need to be aware, and I'm speaking from experience because this was one of the, let's say, not bugs but gotchas in implementing this model, that the ReLU activation function's output is difficult to negate, because it can go to infinity. So you cannot just say 1 minus something.
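As a minimal sketch of the sigmoid case, the negation is just an elementwise 1 minus x (the numbers here are my own illustration):

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Pre-activations for two features: one strongly present, one strongly absent.
features = [sigmoid(4.0), sigmoid(-4.0)]   # roughly [0.98, 0.02]

# Elementwise 1 - x turns feature presence into feature absence.
missing = [1.0 - x for x in features]      # roughly [0.02, 0.98]
```

A feature that was almost certainly present (close to 1) becomes almost certainly missing (close to 0), and vice versa.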
It's very difficult to know where the upper bound is. There are some solutions. If you're using PyTorch, there is a hard-upper-limit version of the ReLU function, I don't know if you've heard about it, called ReLU6. It goes not from 0 to infinity, but just from 0 to 6. So when you get this vector, you can just say 6 minus x, and then you get the missing features. Don't ask me why it goes to 6; I'm not really sure. I actually implemented a variant where it just goes to 1, but it works largely the same. You can use leaky ReLU. You could use the new activation function Swish; judging by the graph of the function, it could work, but I haven't tried it, so if you do, let me know if it works. You can use the hyperbolic tangent function, but beware: it goes from minus 1 to 1, so the formula will be a little bit different. It won't be 1 minus x, it will be just minus x: if the value was minus 1, it becomes 1, and if it was 1, it becomes minus 1. It's very difficult to make activation functions exciting, but sorry, you have to bear with me. So let's see some code. This is the negative learning network implemented in PyTorch. In PyTorch, it's very easy to make weird neural network models, because you have full control over what happens: you basically write the function for the forward pass through the network. In the first two lines, we have just some normal convolutional and max pooling layers with some dropout, nothing too spectacular. In the third row, you see the x.view: we flatten the vector, so we have 320 positional features, because the positions are important. And then, in the class I implemented, I have this net type field, and if it's set to negative, I just negate the vector.
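A minimal sketch of the ReLU6 case in PyTorch, with values of my own choosing:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0, 10.0])

# ReLU6 clamps its output to the range [0, 6], so unlike plain ReLU
# there is a known upper bound on every activation.
activated = F.relu6(x)     # tensor([0., 0., 3., 6.])

# With a finite upper bound, negating the feature vector is simply 6 - x.
missing = 6.0 - activated  # tensor([6., 6., 3., 0.])
```

A fully present feature (6) becomes fully missing (0), exactly as 1 minus x does for the sigmoid.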
If it's set to negative ReLU, I do 1 minus x, which we talked about on the previous slide. OK, so an interesting thing, just going back for one second: if you try to negate the ReLU activation function with the 1 minus x formula, it will work, even though it shouldn't. This was, as I said, one of the gotchas, because ReLU goes from 0 to infinity. If a value is 0, the feature is not there, and 1 minus 0 is 1, so after negation the feature is there. But for some large value, say 1,000, you do 1 minus 1,000 and get minus 999. And because I had ReLU activation functions after the convolutional layer, which ignore all negative values, it just worked. So you can probably get away with it: this code will work, even though it shouldn't. You can see that if we are using the net type negative ReLU, we use the function ones_like(x), which makes a tensor of the same dimensions as x filled with ones, and we add the negative of the original vector, which is the same as doing 1 minus x. The rest of the network is completely normal. OK, so we've covered two steps: we got the features, and we now know how to extract the missing features from an image. The network is almost ready to be trained, but we have, let's say, a little issue. It's actually a big issue. When we modified our forward pass through the network by activating the negation part, we also affected the convolutional layers during training, which we didn't want. Remember, we have pre-trained convolutional layers, and now they are seeing some really weird patterns. And because the convolutional layers are also learned as part of training the neural network, the filters will get corrupted: they will no longer be the features from the digits data set, or whatever data set we're using.
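Putting the pieces together, a forward pass along these lines might look as follows. This is my reconstruction from the description, using the layer sizes of the standard PyTorch MNIST example (which give the 320 flattened features mentioned above); the `net_type` field matches the talk, everything else is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NegativeNet(nn.Module):
    """Small MNIST-style convnet with an optional feature-negation step."""

    def __init__(self, net_type="negative_relu"):
        super().__init__()
        self.net_type = net_type
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        # Ordinary conv + max-pool layers with dropout.
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # Flatten to 320 positional features (positions matter).
        x = x.view(-1, 320)
        if self.net_type == "negative_relu":
            # ones_like(x) - x is elementwise 1 - x: present features
            # become missing ones and vice versa.
            x = torch.ones_like(x) - x
        # The fully connected classifier head is completely normal.
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

out = NegativeNet()(torch.randn(1, 1, 28, 28))  # one fake 28x28 digit
```

The only unusual line is the `ones_like` negation; everything else is the stock example network.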
The negation will affect these convolutional layers in a very weird way. I don't have a visualization at hand, but it looked like junk, basically; it didn't look like features from the image anymore. The simple solution is to freeze the convolutional layers, which is very easy in PyTorch. When you freeze them, they will no longer be modified during training; you just use them as they are. Optionally, we can also reset the other layers, the ones that will contain all the weights for the negative network. It's an optional step, but it helps with convergence: if you don't reset them, the network will still reach the same accuracy, just over a larger number of epochs. Here is the code. In the first two lines, you can see we are re-initializing the fully connected layers. The hidden doesn't mean I'm hiding something from you; it's just a constant, I think it's 50. The freeze-convolutional-layers part is also simple: you just go through your model's convolutional layers, we have only two here, conv1 and conv2, and say that their weights don't require gradients, so autograd won't touch them. Another gotcha, and ask me how I know: you need to re-initialize the optimizer if you are changing your layers. If you don't do this step, it will still attempt to modify the old layers and throw an error. OK, so we have completed our model, and now we need to test it out. For testing, we introduced what we called PMNIST, the partial MNIST data set, which is very simple. The original MNIST data set had 60,000 training samples; we didn't touch those. It also had, I think, 10,000 validation samples. We extended those 10,000 validation set images with 40,000 new images, made in a very simple way, just to test things out.
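The freeze-and-reset step might be sketched like this. Again, it is reconstructed from the description, not the original slide; `prepare_negative_model` and `HIDDEN` are my own names:

```python
import torch.nn as nn
import torch.optim as optim

HIDDEN = 50  # size of the hidden fully connected layer (the slide's constant)

def prepare_negative_model(model, lr=0.01):
    """Freeze the conv layers, re-init the FC layers, rebuild the optimizer."""
    # Optional: re-initialize the fully connected layers. This only helps
    # convergence; final accuracy is the same either way.
    model.fc1 = nn.Linear(320, HIDDEN)
    model.fc2 = nn.Linear(HIDDEN, 10)

    # Freeze the pre-trained convolutional layers so training on the
    # negated signal cannot corrupt the learned filters.
    for conv in (model.conv1, model.conv2):
        for param in conv.parameters():
            param.requires_grad = False

    # Gotcha: re-create the optimizer after swapping layers. Otherwise it
    # still holds references to the old parameters and training errors out.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return optim.SGD(trainable, lr=lr)
```

The function returns a fresh optimizer over only the trainable (fully connected) parameters.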
You can see it in the image at the bottom. All the way to the left is the complete validation set image. Then we have what I think we call the vertical cut, because vertically 50% of the image is missing. Then we have the horizontal cut, where the left side of the image is missing. We have the diagonal cut, because we were running out of ideas, where you just cut some squares from the image. And we also have what we call the triple cut, because we just removed three small squares from the image. Very simple to make. Just an additional remark, wait, time, OK. It would probably be easier to just train on partial samples: if you want a neural network that can classify half images, you just train your network on half images. But this is just, let's say, a proof of concept, and we want to emphasize that you are not always going to have such easy ways to get partial input sets. For example, think of traffic signs. You want to classify traffic signs, and what if a tree is in front of the sign, so you only have a partial view of it? A human will see, OK, it's not red, so it's probably not a stop sign, and will immediately be able to tell not to stop. We're trying to mimic this. Another remark is that on the unmodified validation set, we still have great accuracy, just as much as or even a little more than the traditional model. So this method doesn't break your network when the input is still whole, in one piece; it helps with the partial inputs and even a little bit with the whole inputs. And PyTorch makes implementing weird models really a treat; if you're not using PyTorch, you should really try it. There is a talk, I think, about TensorFlow 2.0, which has some of the new dynamic functional features; I'm going to go to that one after this.
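The cuts are simple masks. Here is a sketch of how the vertical, horizontal, and triple cuts could be produced; which half each name removes is my reading of the slide, and the exact positions and sizes of the squares are my guesses, not the paper's:

```python
import torch

def vertical_cut(img):
    """Remove the bottom half of an (H, W) image by zeroing it out."""
    out = img.clone()
    out[img.shape[0] // 2 :, :] = 0
    return out

def horizontal_cut(img):
    """Remove the left half of an (H, W) image by zeroing it out."""
    out = img.clone()
    out[:, : img.shape[1] // 2] = 0
    return out

def triple_cut(img, size=7):
    """Zero out three small squares at fixed (row, col) positions."""
    out = img.clone()
    for r, c in [(2, 2), (10, 14), (18, 6)]:
        out[r : r + size, c : c + size] = 0
    return out

img = torch.ones(28, 28)  # stand-in for a 28x28 MNIST digit
```

Applying these to the 10,000 original validation images yields the extra 40,000 partial samples.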
Okay, so some results. This is not a very big network; it's basically just the example network from the PyTorch repository as a baseline. I think it has four layers. So these are the results. You can see our five validation data sets, our accuracies, and the column Delta, which is how our model improved upon the original model. On the unmodified validation set, our model improved just a little bit, some 0.31%. But on all the other, let's say, partial inputs, we see improvements, with a very simple modification that you can make very easily. For the vertical cut, for some reason, and it's very difficult to interpret what's going on, we are seeing 9%, which is a big increase. We have 10,000 images, so 9.05% means that our new network now classifies 905 more images correctly compared to the previous network. OK, just briefly to go over the future work, we want to try this on different, sorry. [Audience: It's just, on your numbers, that 3 is actually quite good when you look at it.] Yes, but this model is not a state-of-the-art model. [Audience: I get it, but I mean, it's a good improvement.] Yeah, it's good, but it's smaller than the rest. [Audience: But it improved.] Yes, of course, that's the point. We already experimented with the CIFAR data sets, CIFAR-10 and CIFAR-100, and we are seeing similar results there. We want to try different architectures. I can tell you now that deeper models with higher-level features, so more convolutional layers, yield better results: the higher the level of the features, the better the results we can get for negative classification. We want to try adversarial networks. We already played a little bit with DeepFool, which can, you know, modify the inputs based on the output of your neural network.
We want to see how it affects our network. And at this very easy to remember link, you can find the complete PyTorch implementation, which is basically the same implementation as in the paper we sent. Okay, so, five minutes early. Thank you so much for your attention. I hope you had fun. [Host: Thank you, Nemanja. So, any questions? Don't be shy.] [Audience: Yeah, that was a very interesting topic, thank you very much. One thing that kind of struck me is that it's quite similar to the idea of, you know, the twenty-questions approach to figuring something out.] Oh, yes. You mean the game? [Audience: Yeah, exactly. Is there any way to try to use that methodology to kind of...] That's a really good idea, yes. Thank you. That's the principle of deduction, so it would probably work really well here if you can model it somehow. That's a great suggestion. [Host: Okay, anyone else? Don't be shy. Okay, so then I guess, thank you again.] Thank you.