Welcome to our introduction to deep learning with Toybi. Deep learning, often lumped together with machine learning, is a hype word we hear in the media all the time. It's nearly as bad as blockchain: it's a solution for everything. Today we'll get a sneak peek into the internals of this mystical black box everyone is talking about, and Toybi will show us why people who know what machine learning is really about have to facepalm so often when they read the news. So please welcome Toybi with a big round of applause.

All right, good morning and welcome to "Introduction to Deep Learning". The title already tells you what this talk is about: I want to give you an introduction to how deep learning works, what happens inside this black box. But first of all, who am I? I'm Toybi. It's a German nickname, and it has nothing to do with toys or bees. You might have heard my voice before, because I host the Nusschale podcast, where I explain scientific topics in under ten minutes. I'll have to use a little more time today, and you'll also get fancy animations, which hopefully will help.

In my day job I'm a research scientist at an institute for computer vision. I analyze microscopy images of bone marrow blood cells and try to find ways to teach the computer to understand what it sees, namely to differentiate between certain cells, or, first of all, to find cells in an image at all, which is a more complex task than it might sound.

Let me start with the introduction to deep learning. We all know how to code, and we code in a very simple way: we have some input for our algorithm, then we have an algorithm that says "do this, do that, if this then that", and in that way we generate some output. This is not how machine learning works. Machine learning assumes you have some input, you also have some output, and what you additionally have is a statistical model. The statistical model is flexible.
It has certain parameters, which it can learn from the distribution of inputs and outputs you give it for training. So you basically teach the statistical model to generate the desired output from the given input.

Let me give you a really simple example of how this might work. Let's say we have two kinds of animals, unicorns and rabbits, and we want an algorithm that tells us whether the animal we have right now as input is a rabbit or a unicorn. We could write a classical algorithm to do that, but we can also do it with machine learning. The first thing we need is some input. I choose two features that are able to tell me whether an animal is a rabbit or a unicorn, namely speed and size. We call these features, and they describe something about what we want to classify; the class in this case is our animal. The first thing I need is some training data, some input. The input here is just pairs of speed and size. What I also need is information about the desired output, the desired output of course being the class, so either unicorn or rabbit, here denoted by yellow and red marks. So let's try to find a statistical model we can use to separate this feature space into two halves, one for the rabbits, one for the unicorns. Looking at this, we can actually find a really simple statistical model: our statistical model in this case is just a straight line, and the learning process is then to find where in this feature space the line should be. Ideally, for example, right in the middle between the two classes, rabbit and unicorn. Of course, this is an overly simplified example. Real-world applications have feature distributions that look much more like this: we have a gradient, we don't have a perfect separation between the two classes, and the two classes are definitely not separable by a line.
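To make the idea of "learning where the line should be" concrete, here is a minimal sketch in Python. The feature values, the helper names, and the midpoint rule are all made up for illustration; the talk only says the line should ideally end up in the middle between the two classes.

```python
def score(speed, size):
    """Project a (speed, size) feature pair onto one axis.

    Illustrative choice: a simple sum, so the separating "line"
    becomes a single threshold on speed + size.
    """
    return speed + size

def fit_threshold(rabbits, unicorns):
    """"Learn" the line's position from training pairs.

    Places the threshold midway between the highest-scoring rabbit
    and the lowest-scoring unicorn, i.e. right in the middle
    between the two classes.
    """
    highest_rabbit = max(score(s, z) for s, z in rabbits)
    lowest_unicorn = min(score(s, z) for s, z in unicorns)
    return (highest_rabbit + lowest_unicorn) / 2

def classify(speed, size, threshold):
    return "unicorn" if score(speed, size) > threshold else "rabbit"

# Tiny, hand-made training set: (speed, size) pairs.
rabbits = [(2, 1), (3, 2), (1, 2)]
unicorns = [(8, 6), (9, 7), (7, 8)]
t = fit_threshold(rabbits, unicorns)   # midway between 5 and 14 -> 9.5
```

With this threshold, new animals are classified by which side of the line their features fall on; everything the "learning" did was pick one number.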
If we look again at some training samples (training samples being the data points we use for the machine learning process, that is, to find the parameters of our statistical model) and look at the line again, then it will not be able to separate this training set well. We will have a line that makes some errors: some unicorns will be classified as rabbits, some rabbits as unicorns. This is what we call underfitting: our model is just not able to express what we want it to learn. There is the opposite case, the opposite case being that we just learn all the training samples by heart. This happens if we have a very complex model and only a few training samples to teach it what it should learn. In this case we get a perfect separation of unicorns and rabbits, at least for the few data points we have, but if we draw another example from the real world, some other data points, they will most likely be classified wrongly. This is what we call overfitting. The perfect scenario would be something like this: a classifier that is really close to the distribution we have in the real world. Machine learning is tasked with finding this perfect model and its parameters.

Let me show you a different kind of model, something you have probably all heard about: neural networks. Neural networks are inspired by the brain, or more precisely, by the neurons in our brain.
Neurons are tiny cells in our brain that take some input and generate some output. Sounds familiar, right? We have inputs, usually in the form of electrical signals, and if they are strong enough, the neuron will also send out an electrical signal. This is something we can model in a computer-engineering way. What we do is take a neuron and treat it as a simple mapping from input to output; here just three input nodes, which we denote i1, i2 and i3, and one output, denoted o. Now you will actually see some mathematical equations. There are not many of these in this foundations talk, don't worry, and it's really simple. There is one more thing we need first, though, if we want to map input to output the way a neuron does: the weights. The weights are just some arbitrary numbers for now; let's call them w1, w2 and w3. We take those weights and multiply them with the inputs: input one times weight one, input two times weight two, and so on, and this sum will be our output. Well, not quite. We make it a little more complicated: we also use something called an activation function. The activation function is just a mapping from one scalar value to another scalar value, in this case from the sum we got as output to something that more closely fits what we need. This could, for example, be something binary, where all the negative numbers are mapped to zero and all the positive numbers are mapped to one, and then this zero and one can encode something, for example rabbit or unicorn. So let me show you how we can make the previous example with the rabbits and unicorns work with such a simple neuron. We just use speed, size and the arbitrarily chosen number 10 as our inputs, and the weights 1, 1 and -1. If we look at the equations, we get a zero for all negative sums, so speed plus size less than 10, and a one for all positive sums, speed plus size greater than 10. This way we again have a separating line between unicorns and rabbits.
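The neuron just described can be sketched in a few lines of Python. The binary step activation and the speed/size/10 inputs with weights 1, 1, -1 come straight from the talk; the function names and example values are made up for illustration.

```python
def step(x):
    """Binary activation: negative sums map to 0, positive sums to 1."""
    return 1 if x > 0 else 0

def neuron(inputs, weights):
    """A single neuron: weighted sum of the inputs, then activation."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return step(weighted_sum)

# Inputs: speed, size, and the constant 10; weights: 1, 1, -1.
# The neuron therefore computes step(speed + size - 10).
print(neuron([8, 5, 10], [1, 1, -1]))   # speed + size = 13 > 10 -> 1 (unicorn)
print(neuron([2, 3, 10], [1, 1, -1]))   # speed + size =  5 < 10 -> 0 (rabbit)
```

The constant input 10 together with the weight -1 plays the role of the threshold, which is exactly the separating line from the earlier example.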
But again, this is a really simplistic model, and we want it to become more and more complex in order to express more complex tasks. So what do we do? We take more neurons. We take our three input values and put them into one neuron, into a second neuron and into a third neuron, and then we take the outputs of those three neurons as inputs for another neuron. We call this a multi-layer perceptron, perceptron being just a different name for the kind of neuron we have there, and the whole thing is also called a neural network. So now the question: how do we train this, how do we learn what this network should encode? Well, we want a mapping from input to output, and what we can change are the weights. First we take a training sample, some input, put it through the network and get an output, but this might not be the desired output, which we know. So in the binary case there are four possible combinations of computed output and expected output, each taking the values zero and one. In the best cases, we want a zero and get a zero, or want a one and get a one. But there are also the two opposite cases, and in those we can learn something about our model, namely in which direction to change the weights. It's a little simplified, but in principle you raise the weights if you need a higher number as output, and you lower the weights if you need a lower number as output. To tell us how much to change, we have two terms. The first term is the error, in this case just the difference between desired and computed output, also often called a loss function, especially in deep learning and more complex applications. The second term we call the learning rate, and the learning rate tells us how quickly we should adapt the weights. Okay, this is how we learn a model, and this is almost everything you need to know: there are mathematical equations that tell you how much to change the weights based on the error and the learning rate, and that is essentially the entire learning process.
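As a minimal sketch of this update rule (raise or lower the weights in proportion to the error and the learning rate), here is the classic perceptron learning step. The talk does not give the exact formula, so the rule below, w plus learning rate times error times input, is an assumption; it reuses the step activation from before.

```python
def step(x):
    return 1 if x > 0 else 0

def predict(inputs, weights):
    return step(sum(i * w for i, w in zip(inputs, weights)))

def train_step(weights, inputs, target, lr=0.1):
    """One learning step: nudge each weight in the direction that
    reduces the error, scaled by the learning rate."""
    error = target - predict(inputs, weights)   # -1, 0 or +1
    return [w + lr * error * i for w, i in zip(weights, inputs)]

# Start with zero weights and show one sample to the neuron.
w = [0.0, 0.0, 0.0]
w = train_step(w, [1, 2, -1], target=1)
print(w)   # the error was +1, so the weights move toward the input
```

Repeating this step over many training samples is, in miniature, the whole learning process described above; deep networks use fancier versions of the same idea.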
Let's get back to the terminology. We have the input layer, and we have the output layer, which somehow encodes our output, either in one value or in several values if we have multiple classes. We also have the hidden layers, which are actually what makes our model deep. What we can change, what we can learn, are the weights, the parameters of this model. But what we also need to keep in mind are the number of layers, the number of neurons per layer, the learning rate and the activation function. These are called hyperparameters, and they determine how complex our model is and how well it is suited to solve the task at hand. I quite often spoke about solving tasks, so the question is: what can we actually do with neural networks? Mostly classification tasks. For example: tell me, is this animal a rabbit or a unicorn? Is this text message spam or legitimate? Is this patient healthy or ill? Is this image a picture of a cat or a dog? We already saw for the animal that we need something called features, which somehow encode information about what we want to classify, something we can use as input for the neural network, some kind of number that is meaningful. For the animal it could be speed, size, or something like color, color of course being more complex again, because we have, for example, RGB, so three values. The text message is a more complex case again, because we somehow need to encode the sender and whether the sender is legitimate, same for the recipient, or the number of hyperlinks, where the hyperlinks refer to, or whether certain words are present in the text. It gets more and more complicated, even more so for a patient: how do we encode a medical history in a proper way for the network to learn? I mean, temperature is simple, it's a scalar value, we just have a number, but how do we encode whether certain symptoms are present? And the image, which is actually what I work with every day, is again quite complex: we have values, we have numbers, but they are only pixel values, which are difficult to use as input for a neural network.
Why? I'll actually show you with this picture. It's a very famous picture, and everybody uses it in computer vision. They will tell you it's because there is a multitude of different characteristics in this image: shapes, edges, whatever you desire. The truth is, it's a crop from a centerfold of the Playboy, and in earlier years computer vision engineers were a mostly male audience. Anyway, let's take five by five pixels, let's assume this is a five-by-five-pixel, really small image. If we take those 25 pixels and use them as input for a neural network, you already see that we have many connections, many weights, which means a very complex model, and a complex model is of course prone to overfitting. But there are more problems. The first one: we have disconnected each pixel from its neighbors, so we can't encode information about the neighborhood anymore. And that really sucks: if we just take the whole picture and move it to the left or to the right by one pixel, the network will see something completely different, even though to us it is exactly the same. But we can solve that with some very clever engineering, something we call a convolutional layer. It is again a hidden layer in a neural network, but it does something special. It is actually a very simple neuron again, just four input values and one output value, but the four input values look at two by two pixels and are encoded into one output value. Then the same neuron is shifted to the right and encodes another pixel, and another pixel, and the next row of pixels, and in this way it creates another 2D image. We have preserved information about the neighborhood, and we just have a very low number of weights, not the huge number of parameters we saw earlier. We can use this once, or twice, or several hundred times, and this is actually where we go deep. Deep means we have several layers, and having layers that don't need thousands or millions of connections but only a few is what allows us to go really deep.
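A minimal sketch of the convolutional layer just described: one set of two-by-two weights slid across the whole image, producing a new 2D image. The kernel values and the image are made up; in a real network the four weights would be learned.

```python
def conv2d(image, kernel):
    """Slide a 2x2 kernel over the image (stride 1, no padding).

    The same four weights are reused at every position, which is
    why a convolutional layer needs so few parameters.
    """
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] * kernel[dy][dx]
                for dy in (0, 1) for dx in (0, 1))
            for x in range(w - 1)
        ]
        for y in range(h - 1)
    ]

# A 4x4 "image" and a kernel that sums each 2x2 neighborhood.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
kernel = [[1, 1], [1, 1]]
print(conv2d(image, kernel))   # 3x3 output, computed from only 4 weights
```

Shifting the input image by one pixel shifts the output by one pixel too, instead of scrambling it, which is exactly the neighborhood property the talk points out.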
In this fashion we can encode an entire image in just a few meaningful values. What these values look like and what they encode is learned through the learning process. We can then, for example, use these few values as input for a classification network, the fully connected network we saw earlier. Or we can do something cleverer: we can do the inverse operation and create an image again, for example the same image, which is then called an autoencoder. Autoencoders are tremendously useful, even though they might not appear that way. For example, imagine you want to check whether something has a defect or not, say a picture of a fabric. You just train the network with normal pictures, and then, if you give it a picture with a defect, the network is not able to reproduce the defect, so the difference between the reproduced picture and the real picture will show you where the errors are. If it works properly, I have to admit that. But we can go even further. Let's say we want to encode something else entirely, or rather, let's encode the information in the image in another representation. For example, say we have three classes: the background class in gray, a class called hat or headwear in blue, and person in green. We can also use this for other applications than pictures of humans. For example, we have a picture of a street and want to encode where the car is and where the pedestrian is, tremendously useful. Or we have an MRI scan of a brain: where in the brain is the tumor? Can we somehow learn this? Yes, we can, with methods like these, if they are trained properly; more about that later. Well, we expect something like this to come out, but the truth looks rather like this, especially if the network is not properly trained: we get not the real shape we want, but something distorted. So here again is where we need to do learning.
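The defect-detection idea from the autoencoder example can be sketched as follows. The reconstruction here is a hand-made stand-in for what a trained autoencoder would output (training one takes more than a few lines); the point is only the last step the talk describes: the pixelwise difference between input and reconstruction highlights the defect.

```python
def anomaly_map(image, reconstruction):
    """Pixelwise absolute difference between the input image and what
    the autoencoder reproduced. Large values mark regions the network
    could not reconstruct, i.e. likely defects."""
    return [
        [abs(a - b) for a, b in zip(row_in, row_out)]
        for row_in, row_out in zip(image, reconstruction)
    ]

# Hypothetical example: the "autoencoder" reproduces the fabric
# pattern it was trained on, but cannot reproduce the defect pixel (9).
defective = [[1, 1, 1],
             [1, 9, 1],
             [1, 1, 1]]
reproduced = [[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]]
print(anomaly_map(defective, reproduced))   # only the defect lights up
```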
First, we take a picture, put it through the network and get our output representation, and we have the information about how we want it to look. We again compute some kind of loss value, this time for example based on the overlap between the shape we get out of the model and the shape we want to have, and we use this error, this loss function, to update the weights of our network. Even though it's more complicated here, even though we have more layers and the layers look slightly different, it is the same process all over again as in the binary case. And we need lots of training data. This is something you'll hear often in connection with deep learning: you need lots of training data to make this work. Images are complex things, and in order to meaningfully extract knowledge from them, the network needs to see a multitude of different images. Now, I already showed you some building blocks we use in network architectures, some subnetworks. The fully convolutional encoder takes an image and produces a few meaningful values from it. Its counterpart, the fully convolutional decoder, takes a few meaningful numbers and reproduces an image, either the same image or another representation of the information encoded in the image. Fully convolutional means, by the way, that we only have these convolutional layers with a few parameters, which somehow encode spatial information and keep it for the next layers. We also already saw the fully connected network, fully connected meaning every neuron is connected to every neuron in the next layer. This of course can be dangerous, because this is where we get most of our parameters: connecting every node to every node is just a high number of connections. We can also do other things, for example something called a pooling layer. A pooling layer is basically the same as one of those convolutional layers, except that it has no parameters to learn: the neuron simply picks whichever of its input values is the highest and takes that as its output, which is really great for reducing the size of your image and for getting rid of information that might not be that important.
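A minimal sketch of such a pooling layer, here two-by-two max pooling. The image values are made up; note that there are no weights anywhere, the layer only selects maxima.

```python
def max_pool(image, k=2):
    """Downsample an image by taking the maximum of each k-by-k block.

    No parameters to learn: the layer just keeps the strongest
    response in each neighborhood and shrinks the image by k.
    """
    h, w = len(image), len(image[0])
    return [
        [
            max(image[y + dy][x + dx] for dy in range(k) for dx in range(k))
            for x in range(0, w, k)
        ]
        for y in range(0, h, k)
    ]

image = [
    [1, 3, 2, 0],
    [4, 2, 0, 1],
    [0, 1, 5, 6],
    [1, 2, 3, 4],
]
print(max_pool(image))   # 4x4 shrinks to 2x2: [[4, 2], [2, 6]]
```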
We can also use some clever techniques like adding a dropout layer, a dropout layer being just a normal layer in a neural network where we remove some connections in one training step, and some other connections in the next training step. This way we teach the remaining connections to become more resilient against errors. I would like to start with something I call the model show now and show you some models and how we train them. I will start with the fully convolutional decoder we saw earlier, this thing that takes a number and creates a picture. I would like to take this model, put in some number, and get out a picture, a picture of a horse, for example. If I put in a different number, I also want to get a picture of a horse, but of a different horse. So what I want is a mapping from some numbers, some features that encode something about the horse picture, to a horse picture. You might already see why this is problematic: we don't have a mapping from features to horses or from horses to features, so we don't have a truth value we can use to learn this mapping. Well, computer vision engineers and deep learning professionals are smart and have clever ideas. Let's just assume we have such a network and call it a generator. We take some numbers, put them into the generator and get some horses. Well, it doesn't work yet, we still have to train it, so there are probably not only horses but also some very special unicorns among them, which might be nice for other applications, but I wanted pictures of horses. So I can't train with this data directly. But what I can do is create a second network. This network is called a discriminator.
I can give it the images produced by the generator as well as the real data I have, the real horse pictures, and then I can teach the discriminator to distinguish between the two: tell me whether this is a real horse or not a real horse. And there I know the truth, because I either take real horse pictures or fake horse pictures from the generator, so I have a truth value for the discriminator. But in doing this I also have a truth value for the generator, because I want the generator to work against the discriminator, so I can use the information about how well the discriminator does to train the generator to become better at fooling it. This is called a generative adversarial network, and it can be used to generate pictures from an arbitrary distribution. Let's do this with numbers. I will actually show you the training process, but before I start the video, I'll tell you what I did. I took some handwritten digits; there is a database of handwritten digits called MNIST, the digits 0 to 9, and I used those as training data. I trained a generator in the way I showed you on the previous slide, and then I took some random numbers, put those random numbers into the network and just stored the image that came out. Here in the video you'll see how the network improves with ongoing training. You will see that we start with basically just noisy images, and then, after some epochs, training iterations, the network is able to almost perfectly generate handwritten digits just from noise, which I find truly fascinating. Of course, this is an example where it works. Whether it is a success or not highly depends on your dataset and on how you train the model, but if it works, you can use it to generate fonts, characters, 3D objects, pictures of animals, whatever you want, as long as you have training data.
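The "truth value for both networks" trick can be sketched schematically. This is not a full GAN implementation, only the labeling scheme one training step produces: the generator and the "images" are toy stand-ins, and the actual weight updates would be the same error-driven updates as in the earlier examples.

```python
def gan_training_step(real_batch, noise_batch, generator):
    """One (schematic) GAN training step.

    Returns the two labeled datasets the networks would be trained
    on: the discriminator learns real (1) vs. fake (0), while the
    generator is trained so its fakes get the target "real" (1),
    i.e. it learns to fool the discriminator.
    """
    fakes = [generator(z) for z in noise_batch]
    # Discriminator data: real images labeled 1, generated ones 0.
    d_data = [(x, 1) for x in real_batch] + [(x, 0) for x in fakes]
    # Generator data: every fake with the target "real".
    g_data = [(x, 1) for x in fakes]
    return d_data, g_data

# Toy stand-ins: "images" are plain numbers, and the "generator"
# just doubles its input noise value.
d_data, g_data = gan_training_step([5, 6], [1, 2], lambda z: 2 * z)
print(d_data)   # [(5, 1), (6, 1), (2, 0), (4, 0)]
print(g_data)   # [(2, 1), (4, 1)]
```

Alternating these two training signals is the min-max game mentioned later in the Q&A.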
Let's go more crazy and take two of those. Say we have pictures of horses and pictures of zebras, and I want to convert the pictures of horses into pictures of zebras and the pictures of zebras into pictures of horses. So I want the same picture, just with the other animal, but I don't have training data of the same situation once with a horse and once with a zebra. Doesn't matter, we can train a network that does that for us. Again we have a network we call the generator, and we have two of those, one that converts horses to zebras and one that converts zebras to horses. Then we also have two discriminators, which tell us real horse or fake horse, real zebra or fake zebra. And then we again need to perform some training, so we need to somehow encode whether what we wanted to do worked. A very simple way to do this: we take a picture of a horse, put it through the generator that makes a zebra, take this fake picture of a zebra, put it through the generator that makes a horse, and if we get the same picture back that we put in, our model worked. If we didn't, we can use that information to update the weights. I took a random picture of a horse from a free image library on the internet and generated a zebra, and it worked remarkably well; I actually didn't even do the training myself. It also doesn't need to be picture to picture: you can convert text to images, describing something in words and generating images from it. You can age your face, or age a cell, or make a patient healthy or sick, or rather the image of a patient, not the patient themselves, unfortunately. You can do style transfer, like taking a painting by van Gogh and applying its style to your own picture, stuff like that.
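The "did we get the same picture back?" check is known as a cycle-consistency loss. The sketch below uses lists of numbers as stand-in images and a mean absolute difference; the exact distance measure is an assumption, since the talk only says we compare the round-trip result with the original.

```python
def cycle_consistency_loss(image, horse_to_zebra, zebra_to_horse):
    """Send an image through both generators and measure how far the
    round trip lands from where it started. Zero means the model
    "worked" in the sense described above."""
    round_trip = zebra_to_horse(horse_to_zebra(image))
    return sum(abs(a - b) for a, b in zip(image, round_trip)) / len(image)

# Toy stand-ins: an "image" is a list of pixel values, and the two
# "generators" shift the values up and down by a constant.
h2z = lambda img: [p + 3 for p in img]
z2h = lambda img: [p - 3 for p in img]
print(cycle_consistency_loss([10, 20, 30], h2z, z2h))       # perfect inverse -> 0.0

bad_z2h = lambda img: [p - 1 for p in img]
print(cycle_consistency_loss([10, 20, 30], h2z, bad_z2h))   # mismatch -> 2.0
```

In the real horse/zebra setup this loss is combined with the two discriminators' judgments, so neither paired training data nor a direct truth value is needed.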
Something else we can do with neural networks: let's assume we have a classification network. We have a picture of a toothbrush, and the network tells us, well, this is a toothbrush. Great. But how resilient is this network, does it really work in every scenario? There is a second kind of network we can apply, we call it an adversarial network, and that network is trained to do one thing: look at the classification network, look at the picture, and then find the one weak spot in the picture. Change just one pixel slightly, so that the network will tell me this toothbrush is an octopus. It works remarkably well. It also works with changing the whole picture slightly, so changing all the pixels, but with such minute changes that we don't perceive them, while the classification network is completely thrown off. Sounds bad, and it is bad if you don't consider it, but you can also use this for training your network and making it resilient, so there is always an upside and a downside. Now something else entirely: I'd like to show you something about text, a language model. I want to generate sentences for my podcast. I have a network that gives me a word, and if I want to get the next word in a sentence, I also need to consider the previous word. So another network architecture, quite interestingly, just takes the hidden states of the network and uses them as input for the same network, so that in the next iteration we still know what we did in the previous step. I tried to train a network that generates podcast episodes for my podcast. It didn't work. What I learned is that I don't have enough training data; I really need to produce more podcast episodes in order to train a model to, yeah, do my job for me. And this is a very crucial point: training data. We need shitloads of training data, and actually, the more complicated our model and our training process become, the more training data we need. I started with the supervised case, the really simple case where we have a picture and a label that corresponds to that picture, or a representation of that picture showing exactly what I want to learn. But we also saw a more complex task, where I had two sets of pictures, horses and zebras, from two different domains, but with no direct mapping between them. What can also happen, and actually happens quite a lot, is weakly annotated data, data that is not precisely annotated, where we can't rely on the information we get.
Or, even more complicated, something called reinforcement learning, where we perform a sequence of actions and only at the end are told "that was great", which is often not enough information to really perform proper training, but of course there are methods for that as well, as there are for the unsupervised case, where we have no annotations, no labeled data, no ground truth at all, just the picture itself. Well, I talked about pictures. I told you that we can learn features, create images from them, and use them for classification, and for this there exist many databases. There are public datasets we can use; often they refer, for example, to Flickr and are just collections of hyperlinks, which is also why I didn't show you many pictures here, because I'm honestly not sure about the copyright in those cases. There are also challenge datasets, where you can sign up, get, for example, medical datasets, and then compete against other researchers. And of course there are those companies that just have lots of data, and those companies also have the means, the capacity, to perform intense computations, and those are often the companies you hear from in terms of innovation in deep learning. Well, this was mostly to tell you that you can process images quite well with deep learning if you have enough training data, a proper training process, and also, a little, if you know what you're doing. But you can also process text, audio, and time series like prices or stock exchange data, stuff like that. You can process almost everything if you can make it encodable for your network. Sounds like a dream come true, but as I already told you, you need data, a lot of it. I told you about those companies that have lots of data, and about the publicly available datasets you can actually use to get started with your own experiments. But it is also a little dangerous, because deep learning is still a black box to us.
I told you what happens inside the black box, on a level that teaches you how we learn and how the network is structured, but not really what the network learned. For us computer vision engineers it is really nice that we can visualize the first layers of a neural network and see what is actually encoded in those first layers, what information the network looks at. But you can't really mathematically prove what happens inside a network, which is one major downside. So if you want to use it: the numbers may look really great, but be sure to evaluate them properly. In summary, I'd call these methods easy to learn. Every single one of you can start with deep learning right away; you don't need to do much work, you don't need to do much learning, the model learns for you. But they are hard to master in a way that makes them useful, for production use cases for example. So if you want to use deep learning for something, if you really want to use it seriously, make sure that it really does what you want it to and doesn't learn something else, which also happens. I'm pretty sure you've seen some talks about deep learning fails; that is not what this talk is about. They are quite funny to look at, just make sure they don't happen to you. If you do that, you'll achieve great things with deep learning, I'm sure. And that was the introduction to deep learning. Thank you!

So, now it's question and answer time. If you have a question, please line up at the mics. We have eight in total, so one shouldn't be far from you; they are here in the corridors and on the sides. For everybody: a question consists of one sentence with a question mark at the end, not three minutes of rambling. And if you go to the microphone, speak into the microphone, so really get close to it. Okay, where do we have... number seven? We start with mic number seven.

Hello, my question is: how did you compute the example with the fonts, the numbers? I didn't really understand it. You said it was made from white noise?
Yeah, I'll give you a really brief recap of what I did. I showed you that we have a model that maps an image to some meaningful values, that an image can be encoded in just a few values. What happens here is exactly the other way around: we have some values, just some arbitrary values we actually know nothing about, and we can generate pictures out of those. So I trained this model to take some random values and show the pictures generated from them. The training process was this min-max game, as it's called: we have two networks that compete against each other, one network trying to distinguish whether a picture it sees is real or one of those fake pictures, and the network that actually generates those pictures. In training the network that distinguishes between them, we also get information for training the network that generates the pictures. The videos you saw were just animations of what happens during this training process. At first, if we input noise, we get noise, but as the network gets better and better at recreating images from the dataset we used as input, in this case pictures of handwritten digits, the output becomes more and more similar to those handwritten digits. I hope that helped.

So now we go to the internet. Can we get sound for the signal angel, please? Now we're finally ready to go to the interwebs. Schorsch is asking: do you have any recommendations for a beginner regarding the framework or the software?

I am of course very biased and recommend what I use every day, but I also think it is a great start: basically, use Python and use PyTorch. Many people will disagree with me and tell you TensorFlow is better. It might be, but in my opinion not for getting started, and there are also some nice tutorials on the PyTorch website. What you can also do is look at websites like OpenAI, where they have a gym to get you started, with some training exercises where you already have datasets.
Yeah, basically my recommendation is: get used to Python, start with the PyTorch tutorials, and see where to go from there. Often there are also GitHub repositories linked, with many examples of already established network architectures like the CycleGAN, or the GAN itself, or basically everything else; there will be a repo you can use to get started.

Okay, we stay with the internet, there are some more questions, I heard. Yes, Rubin Eight is asking: have you ever come across an example of a neural network that deals with audio instead of images?

Me personally, no, at least not directly. I've heard about examples where you can change a voice to sound like another person, but there's not much I can reliably tell about that. My expertise really is in image processing, I'm sorry.

And I think we have time for one more question; we have one at microphone number eight.

Microphone number eight: is the current face recognition technology in, for example, the iPhone X also a deep learning algorithm, or is it something simpler? Do you have any idea about that?

As far as I know, yes; that's all I can reliably tell you about it. But it is not only based on images, it also uses other information, I think distance information encoded with some infrared signals. I don't know exactly how it works, but iPhones already have a neural network processing engine built in, a chip dedicated to just doing those computations. You saw that many of these things can be parallelized, and this is what those hardware architectures make use of. So I'm pretty confident in saying yes, they use deep learning there too. How exactly, no clue.

Okay, I myself have one last, completely unrelated question: did you create the design of the slides yourself?

I had some help. We have a really great congress design, and I used that as an inspiration to create the slides, yes.

Okay, because they're really amazing, I love them. Thank you! Okay, thank you very much, Toybi.