Right, we're coming to the next talk. I told a friend about the talk I'm introducing and said "machine learning and deep learning and TensorFlow", and she looked at me in horror and said: oh shit, I have to admit I have no idea about any of that — I'm glad I can manage the four basic arithmetic operations. But there are people who have mastered deep learning, machine learning and the TensorFlow library and know how to work with them, and one of them will tell us how to program cleanly with these tools, how to design and do things properly so that the result is a success. Michael studies at the KIT and is currently doing his Master's in computer science on this topic, and he's going to tell us about it now. There will be a bit of Q&A afterwards. This talk will be in English — does anyone not understand English? Okay. So, I'll repeat in English: this talk will be about machine learning. I told a friend a few minutes ago about this talk — machine learning and deep learning and TensorFlow — and she said: oh shit, I have no idea about that. But Michael has. He studies computer science at the KIT in Karlsruhe, he's doing his Master's, and he will now tell us how to handle these topics and write nice code. So please: applause for Michael!

Thanks. As was already said in the introduction, I'm going to talk about good patterns in deep learning with TensorFlow. A short disclaimer up front: this talk is not about deep learning and the theory behind it, but only about the implementation and how you should write good code. First I want to motivate a bit why this is an important topic, and for that, of course, you have to start with examples of where deep learning can be used. The first and probably most famous example is called MNIST, and it's about recognizing handwritten digits. This dataset was already introduced in the 80s, and it's what postal offices around the world use to recognize handwritten ZIP codes. That's one example of how you can use machine learning. Another example, which is more recent and has only come up in the last few years, is autonomous vehicles: they have to perceive their surroundings, and that can be done with computer vision. In this example, the KITTI Road dataset, the task is to detect, in the image you see, in red where there is a non-drivable region and in purple where the vehicle can drive. And there's another task in this image: not only the segmentation into where you can drive and where you cannot, but also the bounding boxes. So this is a very hard task with many different things to do, and you need a good code structure to solve such problems — and in the future we will have to solve them far more often than we already do. That's why I'm giving this talk: to make you ready for the next problems. And here is a dataset that I actually started myself. It's about navigation for blind people, because blind people have a very hard time in parks: the paths there are very difficult to distinguish from the grass around them. So that's one possible task you can solve with machine learning, deep learning and computer vision.
And there are many tasks like these out there for you to solve. Then, finally, one more example, a bit more relevant for us Germans: person identification. That is especially important for security problems. In this case, on the left-hand side we have our Angie, and we want to know whether the person in the other image is also Angie or a different person. This is a task that is really important for the security sector: reliably identifying persons. For example, you have to be able to say with high accuracy who is a terrorist and who merely isn't the same person as a terrorist. Now, if you look at a lot of research code — and I come from a research background — the problem is: when you look at research code, there is a spaghetti monster. And I don't mean a flying thing with spaghetti tentacles, but code where you have to understand so much just to change something. So what I want to tell you now is how you can avoid that spaghetti monster in your code. Because, quite honestly, I'm a bit selfish: I want to work with your code, but I don't want to read your code if it looks like the code in this example. So I hope you will learn something in this talk. And there's another reason why you should care: there's this quote by Steve McConnell that there are 15 to 50 errors per thousand lines of delivered code — you can do the math for this example, and we actually found some bugs in that code. So yes, you should reduce your amount of lines of code. Okay, so the question is: let's solve that problem — but how? I will tell you some ideas in this talk. Those ideas are not the gold standard yet; there is no gold standard, we still have to evolve one. So I'm just telling you ideas, and I hope you can evolve them further, take them into your own code and make your code better. So first: how does deep learning work in a nutshell? Because our code should reflect the actual thing we are solving in order to be understandable. With deep learning, at the beginning we have a problem — I used Angie in this case: we want to detect whether that is Angie or not. When we have a problem, we want to develop a model that represents it. In the case of deep learning, the most popular model nowadays is a CNN, a convolutional neural network. I'm not going to explain how it works; you can google that later if you don't know. What a CNN basically does is take in an image, and the output is a prediction — in this case, the prediction is that it's Angie: the network is 99% sure that this image shows Merkel. And while in this example it's correct, maybe if we fed in another image it would give a wrong result, and we wouldn't know. So we have to check the quality of our model somehow. These are not deep learning terms yet; this is just the intuitive way of seeing the problem. For checking the quality we usually want some error measure, and with that error measure we then want to somehow improve our model, because we want to reduce that error. And as it turns out, this intuitive view maps exactly onto the terms of machine learning. The first part, the problem, is defined by your data: your data basically tells you what your problem is.
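As a plain-Python sketch of that intuitive loop — model, error_measure and improve are placeholder names for the stages we name properly next, not TensorFlow calls:

```python
# Intuitive deep learning loop (placeholder pseudocode, not TensorFlow):
for image, label in dataset:                   # the problem: defined by data
    prediction = model(image)                  # e.g. "Angie, 99% confident"
    error = error_measure(prediction, label)   # how wrong was the model?
    model = improve(model, error)              # nudge the model to be better
```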
So that's the first stage — in the case of detecting Angie, that would be images of Angie and images of someone else, plus the label saying whether an image shows Angie or someone else. The model is still the model: a CNN. The error, or quality measure, is called the loss in deep learning, and the improvement step is called the optimizer. What you can see here is that the problem separates into four different categories: the data category, the model category, the loss and the optimizer. So why would you put all of that together in one file when you're coding? Why not put it into four different files? My first design pattern — and it's really simple to implement — is exactly that: don't write everything in one file, split it among four files. One file handles the data loading and management. The next file handles the model implementation itself — in the case of deep learning, that's just the neural network. The next file is the loss; what you commonly see on GitHub is that people mix up the model and the loss in particular, and that code tends to get unreadable. And the last file is the optimizer, which we can also separate. That alone should help you get rid of the spaghetti monster to some extent. Once you've started doing that, you get quicker in development, your code becomes more readable, and you can start evolving beyond it. And then you will run into the next problem: you're developing new experiments on your data really quickly — new model types, new loss types — you're running all those experiments with different configurations, and you start losing track of them; you no longer know which experiment ran with which configuration. So the next pattern is using a hyperparameter file. In this file you put together all the hyper-information about your network. In this case it's for MNIST: you can see the learning rate at the top, then some other parameters for the optimizer, like the decay and more, and the directories where my outputs get saved. Later, when I'm looking at the results of an experiment, I can put that parameter file next to the results. And when I've done two weeks of experiments and look at the experiment with the best results, I won't remember what the code for that run was — but I can look into the hyperparameter file, which tells me exactly what configuration I ran. You may think this is unnecessary, but I can tell you from experience that you really should do it. The other benefit: if someone else wants to try your code and tinker with it, they can look into the hyperparameter file and quickly change, say, the data input directory or where the checkpoints are stored, without digging through all your code to find out where you save your stuff under /magic/stuff. It's far easier for someone who sees your code on GitHub to use it.
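A minimal sketch of such a hyperparameter file — here written and read as JSON from Python; the structure follows the slide (learning rate, optimizer decay, directories), but the exact keys and values are illustrative assumptions:

```python
import json

# Illustrative hyperparameter file for MNIST; keys and values are assumptions.
hyper_params = {
    "train": {"learning_rate": 0.001, "decay": 0.9,
              "batch_size": 32, "iterations": 10000},
    "paths": {"data_dir": "data/mnist",
              "checkpoint_dir": "checkpoints/mnist_experiment_1"},
}

with open("hyper_params.json", "w") as f:
    json.dump(hyper_params, f, indent=2)   # copy this next to your results

with open("hyper_params.json") as f:       # ...and load it before training
    params = json.load(f)
print(params["train"]["learning_rate"])    # -> 0.001
```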
Then there's another pattern I want to propose: functional programming. Many of you might not like functional programming, but I think in the case of deep learning it actually fits the mental model. I don't know who of you has heard a lecture on deep learning, but the first thing they will tell you is that a neural network is a universal function approximator. The very definition of a neural network is a function. So why use object-oriented code when it's a function and you can compose functions? If you do, you get a new way of thinking about your neural network, and your code better matches your thoughts. You can think of it like I've shown at the bottom: a detector network is not one single network; it's composed of different functions. The first function you apply could be a feature encoder — in this case I chose VGG. What that feature encoder basically does is take in the image and turn it into a set of features; then you apply the next function, which transforms those features into the labels. This sounds a bit like a traditional computer vision pipeline, but all you actually do is structure your code so that you can understand which part of your network is responsible for what. And the nice thing is: here I chose the VGG encoder, but I can simply swap it out for the Google Inception V3 encoder or something else. With this functional approach, all I have to do is swap out one function, and I can chain all those functions together. Coming from the view that a neural network is a function approximator: if your code reflects that, you start to think that way. It might be a bit overwhelming to grasp at first, but if you start writing your code in a functional way, you will see how it changes your thought patterns compared to object-oriented programming — at least for the model. And the same is true for the loss. Your loss is basically also a function: it transforms the label you get in — the label is the ground truth — and the prediction your network made. In the case of Angie, that would be the 99% and the ground truth that it actually is Angie. What this function does is transform those two inputs into an output that tells me: I'm off by 0.01%. And as things get more complex, you can also stick more and more loss functions together. This might sound a bit scary, but we'll see some code soon.
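As a first sketch of that functional style — vgg_encoder, inception_v3_encoder and classification_head are hypothetical stand-ins for real implementations; the point is only the composable contract:

```python
def vgg_encoder(image):
    """image -> feature tensor (stand-in for a real VGG implementation)."""
    raise NotImplementedError

def inception_v3_encoder(image):
    """Same contract, different architecture: image -> features."""
    raise NotImplementedError

def classification_head(features):
    """features -> label probabilities."""
    raise NotImplementedError

def detector(image, encoder=vgg_encoder):
    # the whole detector is just a composition of functions,
    # so swapping the encoder is a one-argument change:
    return classification_head(encoder(image))

# predictions = detector(image, encoder=inception_v3_encoder)
```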
So let's get back to our first design pattern and look at how we can actually implement it — because up to now these were just ideas, and I thought, well, at a hackerspace I should actually show you how to do this. For the data stage: usually you have some dataset, and it consists of features — in this case, on the left-hand side, you see the raw images — and labels for those images, which are the ground truth you want your network to output. Your dataset only ever consists of features and labels; there is no other configuration of a dataset if you think in terms of deep learning. So you can write your code in a way that supports inputting arbitrary features and accepting arbitrary labels, and then you have general code that loads your dataset and doesn't have to change for every single dataset you want to feed into your model. That saves you a lot of time and makes your code more readable: if you understand one way of loading a dataset, you will understand every other one, because they're basically all the same. And since they are all the same, you can go ahead and create a data lake — some call it a data store: a central place where all the data you have in your company is put together behind a unified API. As I said, that API would be something that outputs a feature and a label, and you can gather all your data that way. Then, if you want to explore some deep learning, you can just take data out of the data lake without complex data transformations. And for the private person who wants to tinker around with existing datasets, there are also data lakes — for example in the Keras library — that let you load data from other people through a unified access. Once you have loaded your data, there's one important step that most people forget, and it's so important — it's not really a pattern, but I want to point it out: have a look at your data. Does your data make sense? Do you have enough data? Is it balanced or unbalanced? For example, if I only have images of Angie and then one image of, say, Obama, and I try to learn what is Angie and what is Obama, that's almost impossible, because I only have one example of one class. The other thing you can detect when you inspect your data is bugs in your code: sometimes your image loading is bugged, or you loaded an image flipped, or the color channels are in the wrong order. If you inspect your data, you will see those errors — so don't skip that step; it may take some time, but it's worth it down the line. Now, if you want to implement your own dataset or data lake — and I guess most of you want to tinker rather than use a pre-existing one — one way I found very convenient and easy is to have a function that loads your dataset and outputs a generator plus the parameters for that generator. I don't know how familiar you are with Python; generators exist in many programming languages, but there's one thing special about Python: it has a weird way of multiprocessing, and since TensorFlow is a Python library, we have to deal with it. As it turns out, one of the most efficient ways is using pools; with pools you get real parallelism, otherwise you're on a single thread. And to not run into the scary thing called the Global Interpreter Lock — you can google that if you don't know it — you have to make sure your functions don't rely on one shared object. The way I found works best in Python is having a generator plus a set of parameters that is just a dictionary, or something else that can be copied easily, and then letting each worker create its generator from those parameters separately, sharing no data between them. Your generator, because it runs in multiple workers, of course needs some sort of stride and offset, and it's probably useful to have a function you can call an infinite number of times.
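A sketch of that load-function contract; list_samples, load_image and load_label are hypothetical helpers, and the important parts are the copyable params dict and the stride/offset arguments for the workers:

```python
def load_dataset(base_dir, phase="train"):
    # returns the generator function plus a plain dict of parameters;
    # the dict is cheap to copy, so nothing is shared between workers
    params = {"base_dir": base_dir, "phase": phase}
    return data_generator, params

def data_generator(params, stride=1, offset=0, infinite=True):
    # list_samples/load_image/load_label are hypothetical helpers
    samples = list_samples(params["base_dir"], params["phase"])
    while True:
        # each worker reads its own slice: offset, offset+stride, ...
        for i in range(offset, len(samples), stride):
            yield load_image(samples[i]), load_label(samples[i])
        if not infinite:
            break

# worker k of n workers would call: data_generator(params, stride=n, offset=k)
```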
So what can you do with that generator — how do you plug it into your training? TensorFlow has an API which either accepts a generator, or which accepts a completely different type of data, and that completely different type of data is TFRecords. TFRecords is very specific to TensorFlow: it's basically a kind of database designed to be read very fast, so that the data can be transferred almost directly to the GPU with the least amount of delay. In modern systems the GPUs are so fast that, if you don't write your code carefully, the bottleneck will be getting the data from memory or from the hard disk to your GPU; TFRecords is a data format by TensorFlow optimized for speed, so that this bottleneck doesn't appear in your code. There are simpler ways to get your data to the GPU, but if you're really serious about deep learning, I recommend you invest the time into understanding how TFRecords work, and use them. And because I don't want to write the same boilerplate code over and over again — and since I have a unified API with the generator and the parameters — I wrote myself a function called write_data, which writes TFRecords to the hard disk, and a function called read_data, which opens those TFRecords and outputs tensors. I only had to write write_data and read_data once, because I realized that the label tensor and the feature tensor are the only things you should care about, so I can reuse them for every project: the implementation is agnostic to what my label tensor and my feature tensor actually look like. That's something I'd recommend too: either use the implementation I wrote or build your own that is agnostic to your dataset, so the same implementation works for any dataset. This will also spare you a lot of headache, because writing and reading TFRecords is not as straightforward as you might think, and you will get weird bugs in your writing code — some of which might only appear after you have already trained for five hours. You don't want a bug that shows up five hours after you started and forces you to start all over again; that's just annoying.
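A simplified sketch of that write_data/read_data idea with the TF 1.x API — the actual StartTF implementation differs, dtypes and shapes here are assumptions, and the reading side already uses the tf.data API that comes up in a moment. Note it only ever deals with one feature tensor and one label tensor, which is what keeps it dataset-agnostic:

```python
import numpy as np
import tensorflow as tf

def write_data(filename, generator):
    """Serialize (feature, label) numpy pairs from any generator to TFRecords."""
    with tf.python_io.TFRecordWriter(filename) as writer:
        for feature, label in generator:
            example = tf.train.Example(features=tf.train.Features(feature={
                "feature": tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[feature.astype(np.float32).tobytes()])),
                "label": tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[label.astype(np.float32).tobytes()])),
            }))
            writer.write(example.SerializeToString())

def read_data(filename, feature_shape, label_shape, batch_size=32):
    """Read the TFRecords back into batched (feature, label) tensors."""
    def parse(serialized):
        parsed = tf.parse_single_example(serialized, features={
            "feature": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.string),
        })
        feature = tf.reshape(tf.decode_raw(parsed["feature"], tf.float32),
                             feature_shape)
        label = tf.reshape(tf.decode_raw(parsed["label"], tf.float32),
                           label_shape)
        return feature, label

    dataset = tf.data.TFRecordDataset(filename).map(parse)
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()
```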
And just recently — maybe some of you know, Google I/O is currently happening in the USA, so it's sort of a difficult spot to give this talk, because they are announcing a lot of new stuff for TensorFlow right now — there turns out to be a new thing, the tf.data Dataset API, which also supports loading those TFRecords in a very efficient and fast way. I'm not yet sure how useful it is going to be, but you should probably keep it on your radar, because it's set to become the standard way of loading data into TensorFlow; it would basically replace the read_data part. So now we have loaded our data into some tensors — tensors are basically TensorFlow's variables on the GPU — and what we can do now is define our model. In the image here you see one of the most famous models of the modern age, the Inception network, and as you can see, the model is quite complicated just from looking at the graph; writing the code for it is, well, not really straightforward. Now imagine that down here is one loss, in this white box is another loss, and on the right-hand side is the third loss of this network. If you don't apply the splitting pattern I explained at the beginning, your code becomes super difficult to read: there's a loss somewhere in the middle, another somewhere in the upper part of the network, and one at the end. If you then want to change the loss to fit your own problem, maybe you find the loss at the end, but you don't find the other two — and then you wonder why nothing works for you, and that's very disappointing. So in the model file you should write just your model: everything that's colored in here, basically. For your model you should look at reusability, and that's also what the functional programming style addresses: give every part of your model one purpose and put it in its own function, so that you can compose a new model from parts you have already written for other models. For example, you have a Merkel classifier at the back and you know how to write that; now you don't want to classify Merkel but Trump — you can take that same code and use it in another place, in a different network, without rewriting everything. So keep an eye on reusability and split everything into separate functions, into steps as small as possible. The next thing — and it's a bit harder to explain why it matters — is making your model multi-instantiable. Usually you'd think: I have one model on my GPU, I use it, I train it. But if you only train your model, you will never know how good it is. If you show the model some data and later ask it what it thinks of that data, it's like studying for an exam by going over the same old exams: you conclude you're perfect at those exams, when actually you've just learned them by heart. To avoid that in deep learning, you need one model that's learning, and a copy of that model that is executed on data the learning model has never seen — like a test situation: in the exam you see data that wasn't in the learning material, you solve new tasks, and that's the generalization we want to test. So you need to be able to instantiate your network multiple times.
There's a simple mechanism for that in TensorFlow: reusing weights. If your weights are named the same — if you name your two networks the same — they will share their weights. So you can instantiate your model once, then a second time, and even a third, a fourth or more times. And one last thing — remember the 2800 lines of code from the beginning: make your code simple. Write simple code, don't write overly complex code. And don't think of your network as only your network: if you publish it somewhere on the internet, for example on GitHub, make it easy for others to wrap your network and use it; give them an API. For example, if you have a special encoder, put that encoder into an extra function so others can use that module separately. We are software engineers, basically, and we want to engineer solutions for problems — so write simple code that can be wrapped into other networks, to reduce bugs. And now, finally, I'm going to show you some code for all of that. I won't say this is the gold standard, but it's the way I found most useful, and it works for me. I basically have a function called create_model. It has some inputs at the top: the input tensor, which is what you feed into your model; then a mode, which tells the model whether it's in training or evaluation mode, because some layers may behave differently between the two; and then your hyperparameters from the hyperparameter file, which configure your network — so you don't have constants in your code, but use your hyperparameters to drive the if-statements in your model. Then there are some technical bits. I won't go into too much detail on why you use them, but do use them: when you inspect your network in TensorBoard and want to see its structure, you can search for this variable scope — in this example, the MNIST network — and easily find your network again, because with large or even medium-sized networks that view tends to get messy. The next bit, scope.reuse_variables, basically says: if I'm not in training mode but in evaluation mode, use the variables of the training network. At the bottom you then define your actual network; I left a blank here, because that would be a bit more code. And the easiest way I found to make your model reusable is to return a dictionary which contains every important layer of your network. One important layer is your logits — the layer just before the softmax; then of course the softmax, the probabilities your network outputs; and if you have a feature encoder in your network, that may also be an important layer to return in the dictionary. What this enables: if someone else uses your code, they can just call create_model, print that dictionary on the console, dig into it and check — what tensors do I have, what shapes do they have, do they work for my problem or not? It's a very easy interface; everybody can handle dictionaries, nothing too difficult or scary, and you don't need complex solutions. Also remember that we want to write kind-of functional code, so every side effect we have should be part of the output — and the model is exactly the side effect of the create_model function. The next slide then shows the same code with the to-do filled in. It's actually quite readable, but you don't have to read it now — the code is on GitHub if you're interested in the details. Basically, it's the complete implementation of a network, and the most important thing about it: if you can, use the tf.layers API. The tf.layers API is well tested, and it gives you lots and lots of predefined layers that you can just stick into your network — convolution layers, dense layers, max-pooling layers — so you don't have to reimplement them from additions and multiplications from scratch; it's a well-tested and widely used library.
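A minimal sketch of such a create_model in TF 1.x, with a small MNIST-style CNN filled in as the body; the architecture, scope name and mode strings are assumptions for illustration, not the exact code from the slide:

```python
import tensorflow as tf

def create_model(input_tensor, mode, hyper_params):
    """Builds the network and returns a dict of its interesting endpoints."""
    model = {}
    with tf.variable_scope("mnist_network") as scope:
        if mode == "eval":
            scope.reuse_variables()  # eval instance shares the training weights
        training = (mode == "train")
        # the actual network, built only from well-tested tf.layers blocks
        net = tf.layers.conv2d(input_tensor, filters=32, kernel_size=3,
                               activation=tf.nn.relu, name="conv1")
        net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
        net = tf.layers.conv2d(net, filters=64, kernel_size=3,
                               activation=tf.nn.relu, name="conv2")
        net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
        features = tf.layers.flatten(net)
        net = tf.layers.dense(features, units=128, activation=tf.nn.relu,
                              name="fc1")
        net = tf.layers.dropout(net, rate=0.5, training=training)
        logits = tf.layers.dense(net, units=10, name="logits")
        # every important layer goes into the dictionary
        model["features"] = features
        model["logits"] = logits                 # the layer before the softmax
        model["probs"] = tf.nn.softmax(logits)   # the output probabilities
    return model
```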
So now we have defined our model — we have our create_model function, and the model is that code we just saw. The next step is the loss: evaluating how good our model actually is. The loss curve you see on this slide is from the previous model, and if any of you has listened to a lecture on deep learning: this is roughly how an optimal curve should look. It should go down very fast at the beginning and then stay low — and as you can see here, our model starts to converge at about a thousand iterations already, and later on it stays low, no matter whether I feed it the training data or the validation data. The validation data is never seen except during validation; it's not part of the training process. So if you have a loss curve like this, your model is well prepared and has generalized in a good way. What you shouldn't see is your validation loss going up at the end — that's the typical case of overfitting: the network looked at too many examples and now just knows all the exams it has already written, but doesn't know how to write a new exam. It didn't really learn how to add 1 plus 1; it learned that 1 plus 1 is 2, and given 1 plus 2, it wouldn't know the solution. So how do you engineer that loss function? It is very important for your network, and you have to engineer it so that your network converges in a good way. The loss is basically the glue between your model and the dataset: in the model definition we had no knowledge about our dataset, except that there is an input tensor — the image; we knew nothing about the labels yet. What the loss does is glue the predictions of your model together with the labels of your dataset — those are the two inputs your loss function gets, and apart from the parameters, it gets nothing else. And it contributes a lot to the quality of your model: in the case of MNIST we will see that the loss function is quite simple, but if you train more complex tasks, like autonomous driving, your loss will not be simple — it will be 100 or 200 lines of code to implement a loss that works well and is robust across situations. So again, for the loss as for everything — I don't know why I have to keep saying it, but people on GitHub seem to forget — reusability is key: you want to write your loss once and then apply it to as many models and datasets as possible.
Some losses are supported by TensorFlow out of the box, like the cross-entropy loss. Others, like an L1 or L2 loss — the L1 and L2 norms — are not directly implemented; it's only two lines of code, but you'd have to write those two lines again every time, so you might want to wrap them in a function called l1_loss and just use that everywhere. Then there are more complex losses, like alpha balancing or the focal loss, which are state of the art in papers. Those are more involved to write, and you don't want to rethink how to write them every single time. If your network has trouble learning because your dataset is a bit unbalanced, you might want to consider a focal loss — and if you wrote it reusably, you can just stick that focal loss in, and five seconds later your model runs with a focal loss. You don't have to read the paper again, understand it again and implement it again; that costs one or two days until a focal loss works, and if you don't write it reusably, you waste those two days every single time. Another problem: there is an open-source implementation by the authors of the paper, but it's in that cryptic form I showed at the beginning, so you can't use it in your own code, because you can't understand how it works — it's not reusable. So if you discover a fancy loss yourself, write it as a nicely contained function with as few inputs and outputs as possible. So what does the loss function actually look like? In the case of MNIST, creating the loss is quite simple. The inputs are our model and the labels. Why do we need the whole model and not just the predictions? Because, as we saw with Inception, a model doesn't necessarily have one output — Inception had three outputs relevant for the loss. So we need the whole model with all its endpoints. The first line just puts the mode of the network into a nice string format, so we can use it for our output printing. The next step, in the case of MNIST, is making the labels fit the output format of the model; then we can calculate a softmax cross-entropy with logits — that's TensorFlow's implementation of the cross-entropy loss. For this talk it's not important to understand exactly what that function does; the point is that some function goes there — here it's just the cross-entropy, but it could be something more complex, like the L1 loss or the focal loss. And because you feed your network batches — not one image at a time, but 16, 32 or 100 images at a time — you get 100 loss outputs, while your loss has to be a single number. So you have to reduce them to one number, and that's the loss operation you see here. That loss is then put into a tf.summary.scalar, which exists so that TensorBoard can nicely visualize your values; and the metrics are another way of exposing values to visualize — we'll see later what the metrics are useful for. So your loss returns two things: first the loss operation you have to optimize — how bad your network is, the error — and then the metrics. You don't have to limit yourself to one loss; you can have all kinds of other metrics that later tell you how badly your network is performing, because the loss is only a single number, and as you may know from traditional debugging, it's good to have more than one number describing your code.
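A sketch of what that loss stage could look like for MNIST in TF 1.x — including a tiny reusable l1_loss wrapper of the kind just described. The signature and the 10-class one-hot step are assumptions modeled on the description:

```python
import tensorflow as tf

def l1_loss(predictions, labels):
    # the two lines you would otherwise retype in every project
    return tf.reduce_mean(tf.abs(predictions - labels))

def create_loss(model, labels, mode, hyper_params):
    mode_name = "train" if mode == "train" else "eval"  # nice string for output
    # make the labels fit the output format of the model (10 MNIST classes)
    one_hot = tf.one_hot(labels, depth=10)
    # any loss function could go here: cross-entropy, l1_loss, a focal loss...
    ce = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot,
                                                 logits=model["logits"])
    loss_op = tf.reduce_mean(ce)   # reduce a whole batch to a single number
    tf.summary.scalar(mode_name + "/loss", loss_op)  # visible in TensorBoard
    metrics = {"loss": loss_op}    # add accuracy etc. as further metrics
    return loss_op, metrics
```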
The optimizer happens to be the simplest part in TensorFlow that you can imagine: it's just one function call. There is the tf.train module, where all common optimizers are implemented behind a nice, clean interface — you give them the learning rate and some optional parameters, and you tell them what to minimize; in this case, the loss. So the optimizer is no magic at all. Now we have seen how to implement all four stages: the first stage was the data with TFRecords, the second the model — the CNN — the third the loss, and the optimizer was that one line of code. With that, we still have to run everything: the way TensorFlow works, it first creates a compute graph, and that graph is not executed — it's only defined, nothing happens yet. So you have to execute the graph, and one thing you can use for that is the tf.estimator API, which is pre-written for you — but it doesn't support training and evaluating at the same time very well. They are changing the estimator API right now, though, so in a month or so it might be a great way to do this; I wanted to point that out here. Here's some code, which you can also find on GitHub — for the sake of time I won't go into detail — that simply proves that all the patterns I described are compatible with the estimator framework. The way I did it: since I didn't like the estimators and they didn't give me the flexibility I needed, I started customizing them via callbacks, and I wrote them so simply that I could reimplement parts as needed. My implementation of the estimators is just 77 lines of code to train a network — and 77 lines are quite easy and quick to understand, as opposed to a thousand lines split across dozens and dozens of files like the official estimator. What my estimator package contains is just a train_and_evaluate function, plus an 'easy' version of it that's even simpler — we'll see an example of the easy version. So what do this train_and_evaluate and the estimator actually do? First, they merge the summaries we defined in the loss — remember that tf.summary.scalar where we said we want the loss visible in TensorBoard — all those values get merged and written into a log file. Then you have to save your hyperparameter file into the log directory, because if you don't, the hyperparameter file is useless: you have it when you start the training, but not later when you look at your results. That third step is the really important one — and it's one the tf.estimator API doesn't do for you, for example. The last step is the training loop, which is basically just a for-loop calling your training operation, your summary operations, and maybe some callbacks you passed in. And here is all the code you need to write for your network — it's that clean and simple. At the top you import the helper functions — the easy train-and-evaluate and the hyperparameter loader; I've pre-implemented them, and I'll put the GitHub link at the end. Then you import your own create_model function that you've written somewhere, for example for MNIST, and your loss function. After that you load your hyperparameters using the load_params function, and you call the easy train-and-evaluate with your hyperparameters, your create_model and your create_loss.
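A sketch of that complete training script; the starttf import paths and the easy_train_and_evaluate/load_params signatures are assumptions reconstructed from the description, as are the my_mnist module names:

```python
# hypothetical import paths, modeled on the talk's description of StartTF
from starttf.estimators import easy_train_and_evaluate
from starttf.utils import load_params
from my_mnist.model import create_model   # your own model function
from my_mnist.loss import create_loss     # your own loss function

hyper_params = load_params("hyper_params.json")

# the optimizer stage really is a single tf.train call, e.g.:
#   train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss_op)
# easy_train_and_evaluate wraps that call, the summaries and the training loop:
easy_train_and_evaluate(hyper_params, create_model, create_loss)
```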
I guess everybody in this room, even those who don't really program Python, can understand what those lines of code do — and it's simpler and less code than the 2800 lines we saw at the beginning. We have also seen how the complete loss was implemented — about ten lines of code — and how the model was implemented. Put all those pieces together and it's quite a small amount of code you actually have to write to do deep learning with these patterns; you don't need 2800 lines. So, to sum up the patterns I found most important. The first three: split everything into data, model, loss and optimizer, and treat them as four separate parts so you can reuse them; use functional programming — a neural network is a universal function approximator, keep that in mind while developing; and use a hyperparameter file. Then, for the dataset, we saw that you should use TFRecords — they are much faster — and that in the future you should probably consider the tf.data datasets once the API gets stable. For the model, use the reuse-variables mechanism so you can have a training and an evaluation model, to see whether your model is overfitting — so you know whether it generalizes well or not. And the sixth point, a bit between the lines: create summaries and metrics. They are plotted in TensorBoard — you can also plot them in matplotlib — and they let you inspect what your model is doing. Those summaries are basically your debugger for what's going on inside the neural network; you have no other way to look into it, so take them seriously. And now, after I've talked so much about deep learning, a quick disclaimer: by Occam's razor, the simplest solution is almost always the best. Not every problem is suited for deep learning. Some problems are better solved with a linear regression: in that first step, where I told you to analyze your data, if you realize it's some kind of linear problem, use a linear regression — a deep neural network is overkill. The same goes for decision trees, k-NN and so on; there are so many non-deep-learning methods. Consider them first, look at what fits your data best, and if none of the traditional approaches fits, then the solution is probably a deep neural network. One last thing: everything I showed you is available on GitHub, in my repository StartTF. The main readme has the examples — the exact same examples we saw on these slides — and there's also the implementation of write_data and read_data, which you can dig into in detail if you're interested, or just reuse if you want to. And if you want to know how to actually implement something, visit my workshop later. With that, I'm open for questions.

Wow, I'm deeply impressed — I understood nothing. Are there any questions? Come on — no one understood anything? There are no dumb questions. You can also ask me anything you want about deep learning in general, not just how to program it.

Okay, so in my experiments I usually don't have datasets as big as the ones on the internet — for example, I only have hundreds or thousands of samples, not as many as online. Is there a method to get more samples?
Yes, I understood your question. The first thing you can do — when you're looking at your data, like in that image, and you realize you don't have enough of it, or that it's unbalanced — is data augmentation. The most common steps are rotating your images a bit, moving them around, cropping them, scaling them, maybe changing the illumination. And if your data is unbalanced — say we only have 10 images of someone other than Merkel — we can duplicate those images, or reduce the number of Merkel images, and try to balance the data that way. Then there are things you can do with your model. Say you only have a thousand images for a segmentation problem. What you can do then — and that's why reusability is so important — is take, for example, the Inception model and cut it off right here, just before the final classification layers. You take those weights, which Google trained on millions of images for you, for free, use that encoder, stick it in front of your network, and solve your problem with it. That's actually a good question, because it demonstrates how important modularity is: you can just use a module written by someone else and import it into your code. You should also check out TensorFlow Hub — a new API that was released today, or yesterday, or somewhere around that time.

Any more questions? What about Keras — can these high-level APIs be used, and how do they fit into this scheme? Keras is an API designed to make deep learning simple, but the problem with Keras is exactly that: you can't build your own optimizer, or your loss, in as intricate a way as you can with bare TensorFlow. What you can do, specifically for your question: in Keras you can take all those layers, like the dense layer, and instead of putting them into, say, a Sequential model, you pass each one the output of the previous layer as an argument and get the next layer as output — then you can stack them together like normal TensorFlow code. You can put them right into this slide where the to-do is: your first Keras layer gets the input tensor fed in as an extra parameter, the next one gets that layer's output, and you build your model up this way; the last output is fed into the prediction step. But you can't use the Sequential model wrapper.

Any questions anymore? You showed us how to separate out the encoding. After training, we have some nets that detect several features — is there an easy way to reuse parts of a trained model, to reuse the features? Are you working on that as well, or do you have ideas on how that could play out in the future? What you can basically do is this: I've shown the create_model function as if you had one — but you can call another create_model function inside your create_model function. This way the feature-encoder part — since we're talking about feature encoding — is a model of its own that you call inside your create_model, and you keep a separate part for putting heads on at the end. If you want, I can show you some code on that later; I have code for that. And that's actually one of the reasons you should write these create_model functions: they let you do exactly that, easily and without any hassle.
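Sketched, that nesting could look as follows; create_vgg16_encoder is a hypothetical encoder model written elsewhere, and only the head is new:

```python
import tensorflow as tf

def create_model(input_tensor, mode, hyper_params):
    model = {}
    # reuse a whole pretrained model as the feature encoder...
    encoder = create_vgg16_encoder(input_tensor, mode, hyper_params)
    # ...and only put your own head for the new task on top of its features
    net = tf.layers.flatten(encoder["features"])
    logits = tf.layers.dense(net, units=2, name="head")  # e.g. Merkel/Trump
    model["encoder"] = encoder
    model["logits"] = logits
    model["probs"] = tf.nn.softmax(logits)
    return model
```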
Is it possible to also extract the classes, next to the weights and so on, or am I forced to provide the classes? — I'm sorry, I don't understand your question; can you rephrase it somehow? You can also ask in German if you want. — I want to try to solve a problem where I don't know the classes beforehand; I'd like to learn them together with the neural network. Is that possible, or is it hopeless? — That's a good question. The framework isn't built around necessarily having labels. In principle, if you look at this API, we only have to return some model, and in that model's dictionary there are some tensors — where I'm not quite sure, with TensorFlow, how you would create tensors whose length you don't know in advance. So I'd imagine it's more of a TensorFlow problem; as far as this pattern is concerned — this way of splitting things — it doesn't matter what the model is or how it's built, and the loss doesn't care how the model is built either. It only has to be well-defined, it only has to be expressible in TensorFlow somehow, and I honestly don't know whether you can formulate that problem in TensorFlow, because on the graphics card you need fixed array sizes, and that's difficult when you don't know the number of labels — if I've understood the question correctly. — I could perhaps restrict how many labels there are at most; what I'd like to know is which datum gets which label, and I'd like to have that trained along with the neural network. Is that possible? — That would essentially be an unsupervised learning process, and it should in principle be possible with this too. I have to admit I'm not an expert in unsupervised learning — everything I normally do is supervised learning, so I'm slightly out of my depth on the details — but I don't see why the pattern would be a problem; I just don't know how you'd actually implement it in TensorFlow. One variant, of course, would be autoencoders — but in an autoencoder I again have a label. For example, if I build an autoencoder where I want to learn implicitly what features a cat has: I feed in lots of images of cats, and the autoencoder gets cat images as input, and the output it is supposed to produce is again those cat images. That's a symmetric network, and the training process is then supervised again, because input and output are both known — input and output are both the cat. What I can do afterwards is cut my network in the middle: the layer where I cut it is then my feature encoder, and there I have my cat features. Then I can investigate — with DeepDream, for example — what those features mean, and I might see that one feature only activates for inputs that are noses, or whiskers. So that would be the kind of unsupervised learning I know a bit about — but technically, programming-wise, it's again a supervised problem: I've rewritten my unsupervised problem so that I know both the feature and the label.
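As a sketch of that autoencoder idea in the same pattern (TF 1.x; the layer sizes and the 28x28 image assumption are illustrative): the labels are simply the input images themselves, which is what makes the training technically supervised:

```python
import tensorflow as tf

def create_autoencoder(image, mode, hyper_params):
    model = {}
    with tf.variable_scope("autoencoder"):
        flat = tf.layers.flatten(image)
        code = tf.layers.dense(flat, units=64, activation=tf.nn.relu,
                               name="encoder")  # cut here: the feature encoder
        out = tf.layers.dense(code, units=28 * 28, name="decoder")
        model["features"] = code                # the implicitly learned features
        model["reconstruction"] = tf.reshape(out, tf.shape(image))
    return model

def create_loss(model, labels, mode, hyper_params):
    # labels are the input images themselves: reconstruction error as the loss
    loss_op = tf.reduce_mean(tf.square(model["reconstruction"] - labels))
    return loss_op, {"loss": loss_op}
```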
A practical question: your workshop — where is it taking place, in an hour, and is it going to be in English or in German? — I planned on doing it in German, but if there are people who only speak English, I can consider doing it in English too; the language is not the problem, the original plan was just to hold it in German. And as for where: we have the Fahrplan here — it's in the workshop room. I don't know where that room is yet; I'll have to find it myself. — Wow, thank you very much. Thank you for this talk. [Applause]