All right, good morning, everyone. Welcome to day two of EuroPython 2021 and to the Data Science Mini-Con. With me today we have Andre from Crayport, who's going to be talking about AutoML with Keras. Hey, Andre. Hi, hi, everyone. All right, so just a little bit about him. He is a Data Scientist with Crayport, where he's helping to build AI for energy trading. And before becoming a Data Scientist, he used to be an astronomer. So without further ado, over to you.

Thank you very much. And yeah, thanks everyone for coming to my talk on automated machine learning with the Keras library. Sorry for potentially not maintaining eye contact; my slides are over here. Before we actually start, I would encourage you, after this talk, to also go check out a talk from yesterday afternoon by Katarina and Mattias about automated machine learning with auto-sklearn. It was really excellent; some of the topics naturally overlap, but it gives you a broader overview. So like we said, my name is Andre. I do Data Science at Crayport. We are proud to be a sponsor this year, and we are looking forward to chatting with you, or even playing some games with you, in our booth. So come and visit us.

And with that out of the way, we can start. Now, like many people, from time to time I like to think of myself as a cynic. And as such, I perceive human progress as coming in two steps: laziness, and the disappointment that follows. Humans are lazy beings, and there are a lot of things that we don't like doing and that we would like to outsource to something or someone else. And every once in a blue moon there comes an inventor, a genius, or someone like that, who claims to have invented a machine, a tool, or whatever, that helps take this pain away, to deal with it, to outsource it. And everyone is very happy about how progressive we are.
That is, until the second step comes: people start using it and realize it's not quite what's been sold to them, and that a lot more progress needs to happen before it catches up with the expectations. Now, with quite a bit of tongue in cheek, we could take the example of computers. I believe that since pretty much Gottfried Leibniz, people have been thinking about building a thinking machine that would do the thinking for us. Thinking is hard, it's easy to be wrong, it's embarrassing, et cetera, et cetera. This was what people wanted to outsource. Then in the 20th century someone actually built a computer, a thinking machine, and people very quickly realized that even though, okay, these machines think in one way or another, it's kind of awkward, and it's super difficult to communicate to these machines what we want them to think about, what we want them to do. And even today, after decades and decades of development, even though we got much better at communicating with these computers, we still basically need experts to do this for us efficiently.

Now, we could similarly shoehorn this story onto machine learning. The laziness starts with the modeling. Again, modeling is hard; thinking about how nature or something works is difficult. And here comes machine learning, which promises to take this pain away. You take your data, ideally gathered by something or someone other than you, and you give it to your computer, which will figure out everything it needs and enable you to predict what will happen. Now, most of us probably know that even though machine learning has been around for a while (for example, the backpropagation algorithm for neural net training has been around since the 80s), it's only relatively recently, on the order of the last ten years, that machine learning has really been gaining traction. And we are getting better at it. But again, and this is the common theme, you need experts to do this efficiently.
And this is kind of the theme: with all of these things, and specifically with machine learning, we would like to give it, let's say, to the masses, so that everyone can more or less easily do it. This is what automated machine learning is striving for, and it's something that we are starting to take the first steps towards. In this talk specifically, we will be talking about a very important part of automated machine learning, and that's automating the hyperparameter search.

Now, just so that everyone is on the same level: we probably all know that machine learning models have parameters and hyperparameters. Parameters are the numbers that directly interact with the data and produce the predictions in one way or another, and their values are set automatically during model training. Hyperparameters, on the other hand, are set by some kind of outside authority before the training, and they don't change; they define the model architecture. Usually there are many parameters but only a few hyperparameters. We have a very simple example here, this schematic of a multi-layer perceptron, a very simple neural net, with data flowing from left to right. If you have seen this structure before, you probably know that each of the arrows here is one parameter. If you count them as drawn in this picture, you have 64 parameters, so a lot of them. But, depending on how you count, you could argue that you have only three hyperparameters, which are the dimensions, the sizes, of these middle layers. Neural nets, again, come in layers. So three hyperparameters, which here would be five, four and four. And we can change these hyperparameters, which would change the number of parameters.

Now, why do we want to pick good hyperparameters? Well, they influence the model performance. If we pick them well, our model might perform well; if we choose them badly, it will perform badly. The question is how to do this.
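The arrow-counting above can be sketched in a few lines of plain Python. The input and output dimensions below are hypothetical (the slide's exact diagram isn't reproduced here); the point is that each pair of neurons in adjacent fully-connected layers contributes one weight parameter, and changing a hidden-layer size (a hyperparameter) changes the parameter count.

```python
def count_weights(layer_sizes):
    """Number of arrows (weight parameters) in a fully-connected net,
    ignoring biases: one weight per neuron pair in adjacent layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 2 inputs, hidden layers of 5, 4 and 4
# (the three hyperparameters), and 1 output.
print(count_weights([2, 5, 4, 4, 1]))   # 2*5 + 5*4 + 4*4 + 4*1 = 50

# Growing the first hidden layer from 5 to 8 changes the parameter count:
print(count_weights([2, 8, 4, 4, 1]))   # 2*8 + 8*4 + 4*4 + 4*1 = 68
```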
And sadly, it tends to be more art than science. There are several algorithms that might help us do this, and here I list a few. There is, for example, grid search, in which you pick a set of values for each of your hyperparameters and you check every combination. So you build a model with a particular set of hyperparameters, you train it, you check the performance, you repeat, and in the end you pick the one that performed the best. It's shown schematically in the bottom-left colorful picture; the colors are, of course, the final model performance. Similarly, there is random search, where we pick the hyperparameters, obviously, at random. Now, this is good, but it also bears the danger that we might, for example, miss the area where the performance would be the best. For example, in the bottom-right figure, it might be this red peak that no model actually covers.

This is addressed a little bit by Bayesian optimization. Here, the hyperparameters are not picked entirely at random: on the one hand, the algorithm remembers where within the hyperparameter space it has already looked, and it tries to explore more; and on the other hand, it also tries to predict where it would be promising to check next, where it expects good performance to lie, and to exploit that. There is also the Hyperband algorithm, which is a parallel approach: it starts with a lot of choices for the hyperparameters, and during training it tries to decide very quickly which of the hyperparameter choices are not promising and probably not going to give good performance. It just stops those and reallocates the computing resources to the ones that it deems promising. So these are some of the algorithms, and now let's look at how we might actually use them. As the name of this talk indicates, we will be dealing with these algorithms within the Keras library.
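As a minimal, library-free sketch of the first two strategies, assume a toy `performance` function standing in for "train the model and measure validation accuracy" (the function, its peak, and the value ranges below are invented purely for illustration):

```python
import random

def performance(learning_rate, units):
    # Toy stand-in for "train a model, return validation accuracy";
    # it peaks near learning_rate=0.01, units=40.
    return 1.0 - (learning_rate - 0.01) ** 2 - ((units - 40) / 100) ** 2

# Grid search: evaluate every combination of fixed candidate values.
grid = [(lr, u) for lr in (0.001, 0.01, 0.1) for u in (16, 32, 64)]
grid_best = max(grid, key=lambda p: performance(*p))
print(grid_best)  # (0.01, 32) -- the grid point closest to the true optimum

# Random search: same budget (9 trials), points drawn at random instead.
random.seed(0)
rand = [(random.uniform(0.001, 0.1), random.randint(16, 128)) for _ in range(9)]
rand_best = max(rand, key=lambda p: performance(*p))
```

Bayesian optimization and Hyperband replace the fixed grid or blind sampling with an adaptive choice of the next trial, but the outer loop (propose hyperparameters, train, score, repeat, keep the best) stays the same.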
Keras, as we all probably know, is a deep learning library in Python that allows us to build deep learning models, so neural networks. To use these algorithms, we will be using two other libraries, namely KerasTuner and AutoKeras. I encourage you to go check them out and give kudos to the authors; basically the main message from this talk is: go check these things out, play with them, they are really nice. KerasTuner is a part of Keras itself, so there is a chance that you already have it. It contains all the algorithms that we just discussed on the previous slide. AutoKeras is a wrapper around KerasTuner, and it adds a few more features, or rather a lot more features, that enable a user to automate some of this work.

Now, in this talk, what we will be doing is trying to solve, or at least hint at a solution of, a toy machine learning problem. We'll look at three fictional people with different levels of machine learning knowledge, and we'll see how these people, with their knowledge, might use these two libraries to solve this problem. This actually implicitly brings us back to the start of the talk, where we were talking about needing experts to do machine learning. Well, this is basically taking the first steps towards allowing us to do machine learning even if we are not really experts. So let's see how this will work.

Our toy problem will be one that many of us have probably interacted with already in one way or another: MNIST digit classification. So we will have a ton of these little images, like the ones shown here, these individual stamps, each of them having one handwritten digit on it, and the three fictional people from the previous slide will be trying to build models that assign the correct digit to each image. All the images from the first line will be assigned the label zero, all from the second line will be one, et cetera, et cetera. Now, there is also an example notebook that comes with this.
This is the link; the link is also in the talk abstract. And again, I encourage you to go and check it out. It contains everything that I talk about in this talk, it's a bit more verbose, and it has some more examples. And yeah, you can have a play with it and see how it all works. And now, assuming that someone is still around after this, we can actually start.

So the first person that we will be talking about is a data scientist. This person has a decent machine learning knowledge (they claim it's good), and that means that they more or less know what they are doing. Now, this person, in solving this classification problem, would like to retain a lot of control over the process. This, for example, allows them to be efficient when dealing with possible issues: they know what they've done, and if something goes wrong, it's easier to identify. And also, by watching how the whole process of the hyperparameter search is going, they might, for example, gain new insights into what works, what doesn't work, what tends to happen, et cetera.

So this person might have somewhere in their code this kind of a function. The function builds a Keras model, as its name suggests. It has no inputs; it does a few things we will look at, and then it returns a Keras model ready to be trained. Now let's look in a bit more detail at what happens in the individual steps. To define the layers, the data scientist might use the Keras functional API, and this is how it works: you essentially define each individual layer as a variable, using some kind of a Keras object. Then you assign some properties to this layer, for example these filters and kernel size, which are exactly the hyperparameters we are talking about. And then you connect it to some kind of a previous layer, in this case the input layer. The same thing then works with other layer types, and in the end we have the whole neural net ready.
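A fixed-hyperparameter version of such a build function might look roughly like this. The exact layer stack and sizes here are illustrative guesses (the talk's actual slide code isn't reproduced); the structure follows the steps just described: define and connect layers with the functional API, then prepare and compile the model.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    # Functional API: each layer is a variable, given its properties
    # (e.g. filters, kernel_size) and connected to the previous layer.
    inputs = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(10, activation="softmax")(x)

    # "Prepare model": connect the stack to the outside world,
    # declaring where data comes in and predictions come out.
    model = keras.Model(inputs=inputs, outputs=outputs)

    # Compile: choose the optimizer, which has its own hyperparameter,
    # the learning rate.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```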
The next step, which I call "prepare model", is simply connecting this neural network to the outside world: defining the inputs where data will come in, and defining the outputs where the predictions from the model will come out. And the final step is the compilation of the model, where, among other things, we determine the optimizer, which is the algorithm that will drive how the model learns. And this can have a hyperparameter of its own, for example here the learning rate, with a value of 0.001.

So we have a function like the one sketched out here, and you will, of course, understand that this is a bit awkward when you want to try a different set of hyperparameters. You need to go into the function, you need to at least update these numbers, or maybe even add new layers and update a lot of things. So now let's have a look at how we could update this function so that we can use it in this automated hyperparameter search. For this, we will take three of the code snippets from the previous slide and see how we will edit them. First, the function itself. Before, it didn't have any inputs, but now we will add this input we will call hp, which stands for hyperparameters. And this allows us to go inside the function and make the definitions of the other things a bit more flexible. So let's see, for example, what happens to the convolution layer. We see that the layer definition expanded a little bit, and instead of the hard-coded numbers that we had before, we now have some kind of placeholders. For the filters, where we had 32, we now have this hp.Int, so some kind of an integer value. We call it "convolutional filters", and we say that we want it to be at least 16, at most 128, and we also want this integer to increase in steps of 16.
Similarly with the kernel size, which started at the fixed value of three: now instead we want the algorithm to choose one value from a list, one, three or five. And you can do the same with other layers, with floating-point numbers, et cetera. Now, this is not limited to the neural net layers; you can do it with pretty much any kind of parameter within our build-model function. So, for example, the learning rate: again, instead of a fixed value, we can make it a variable to be chosen.

So now we have sketched out how we would update our function, and we can proceed to the hyperparameter search. Our data scientist will be using the KerasTuner library, and for this example they will be using Bayesian optimization, one of the algorithms that we discussed a few slides back; you can, of course, use the others. The data scientist will take our new and updated function, and then they can define a tuner, which will be an instance of this BayesianOptimization, where the hypermodel will be our model-defining function, and we also need an objective: some kind of a criterion that, if we have two models, can tell us which one is doing better. Now, this hypermodel doesn't need to be a function; it can be a class, as we will see, but it works the same way. Anyway, with the tuner defined, we can now load the data. Fortunately, the MNIST data comes with Keras, so we can load it very easily, and we can start a search with this tuner.

Now, let's quickly switch to a notebook. This is the notebook that I mentioned, the one you can download and play with, and let's just quickly run through some code to see how it works. So we imported a lot of stuff, and now we loaded the data. This goes a bit further than what we had in the slides, but essentially this is what the data looks like. So, for example, the 11th element is this kind of an image, and we know it's a number three, like this.
In this cell, we then define the class that we call the MNIST hypermodel. This is essentially what I mentioned: the class instead of the function. It contains this build function that does what we've seen in the slides. For example, in here we define the convolutional layer, and you are probably already familiar with this line: the filters will be hp.Int with some particular values. So this is the same thing. Now we define the tuner as we've just done in the slides; the only difference here is that we define the hypermodel as this class. This is it. We can just quickly look at what hyperparameters we will be optimizing; the output isn't important now, but you can study it afterwards. And then you can start this tuner.search with the data that we loaded before.

And this starts giving some kind of an output that is potentially very useful for us. So it's now running the first trial, it's giving us the set of hyperparameters that it's currently using, and it's giving us the progress bar that you are probably familiar with if you've ever tried some training with Keras. And this will proceed in a series of loops, giving you a lot of information. Now, this is taking a while, so let's go back to the slides and see what happens later. Let's give it a second to load. Yes, here we are. So we started tuner.search. In the notebook it takes a few more parameters, but this is really the only thing you need to do. And the output that it ends up with later is this. We've seen that it's running the trials; it also records how well it did at best. So for this particular screenshot, the latest trial finished with 90% accuracy, but at some point previously it almost reached 95%. It reached those 95% with this best set of hyperparameters so far, and currently it's running with this set. So that's quite clear. And in the end, after this is done, the data scientist can extract the best model and use it for their purposes.
So we've seen that this person can put away a lot of the pain of the hyperparameter search, but still retain a lot of control over what's going on. Now let's switch to the second person who is going to try to solve this problem, whom I'm calling the technical manager. This person knows some coding and also knows machine learning, not at such a detailed level as our data scientist, but they know some high-level principles. And we will see that these tools actually allow the technical manager to translate these principles more or less directly into the code.

So how will this work? Well, the technical manager will be using AutoKeras, the wrapper around KerasTuner. They will be defining some kind of hypermodel in a way that's similar to how you define a model in Keras: you define these individual layers, or blocks, that will do the work. So the technical manager knows that they are dealing with images, so they will start with an image input. They also know that you need to normalize the data in one way or another to, ideally, get good performance from your model, so they will add this normalization block, connecting it to the input node. Then they know that it's a computer vision problem and convolutional neural nets are good at solving those, so they will add a convolutional block, just like this. And finally, they know that they are dealing with a classification problem, so they will end with a classification head. And this is pretty much it: high-level concepts translated into the code. Now, you will notice we put no options there, but if you actually go to AutoKeras, it allows you to go into quite a lot of detail, telling the algorithm what to do. If you don't do this, there are a lot of default values that will be chosen for you. And this is it. Now they can define what is called the AutoModel, similarly to Keras.
So they define how this will communicate with the outside world, what the inputs and the outputs are, and again they define the objective, which will be the validation accuracy. And with all of this in place, they can just go with the data: they load it just as we did in the notebook, and they can run the fitting immediately. No normalization is needed, because now it's a part of the model. You will again notice that this gives very similar output to what happens with Keras, but now it's, of course, training a lot of models under the hood and giving you just the best one. And once they start running this, again the familiar output pops up, with the trials, the best trial so far, the hyperparameters (now you see that there are a few more of them and they are a bit more complicated, but broadly it's the same), and the progress bar at the bottom. So the technical manager is able to solve this problem even without being exactly a machine learning expert.

And now let's get to the data scientist of the future. I am labeling this person like this very reluctantly, because the idea is that this person doesn't actually know anything about machine learning. So it's a bit of a scary future, but if you want, you can replace them with your parents or your grandparents; it should work the same. And this actually allows us to see the first steps towards machine learning as it was more or less promised to us, so this effortless thing. Now, admittedly, our problem lends itself well here, but it's a good example. So this person will use AutoKeras as well, but they only know the setting of the problem. They know they want to build a machine learning model, so they specify one. They know they have images on the input, so they specify that. And they know they want to classify these images on the output. And that's really it; that's all they do. They load the data like this, and they can again run this AutoModel.fit, just as we did before. And it just works.
Now, it turns out that this can be made even easier, because image classification is such a common problem that there is already an ImageClassifier within AutoKeras. And similarly, there are other such hypermodels for other kinds of common applications. But anyway, this is defined; once they start running it, it again gives them the by now familiar output, and they managed to do this without really much knowledge about machine learning.

And so this is really it. We've discussed a bit about automated machine learning, specifically automating the hyperparameter search: how it can make life easier for data scientists, but also how it can give non-experts easier access to the field, and maybe get them excited, because they can do powerful things very early on. And we played a bit with the KerasTuner and AutoKeras libraries, which are definitely worth exploring. So thank you very much for your attention.

Fantastic talk. I truly loved it, especially the fact that you went into so much detail explaining the tuner and the whole Bayesian optimization. I personally loved it. I think we have one or two questions, so I'm going to put those up and we can discuss them; we have about three minutes. So the first one is: does AutoKeras keep track of training for each combination of hyperparameters and use those for the next round? Or does it go through all the possible combinations? So, it depends. Obviously, outside of grid search, you can't really go through all the possible combinations, unless you give it the choice, in which case you can; you can always specify the number of trials you want to have, and this then allows you to try all of them. But I don't think it does that automatically. Now, I am not sure, to be honest, whether it keeps track of everything. I think it really keeps track of the best one, and then you can get the best model out of it in the end. So I think that's the answer to that. Perfect.
Since you mentioned this other talk from before about auto-sklearn, there was a question about how AutoKeras compares with auto-sklearn. And maybe I will extend that question to ask: when would you use which? So, to be completely honest, I'm an expert in neither, maybe closer to one on AutoKeras. But there is definitely overlap; for example, AutoKeras allows you to do some automated hyperparameter search with scikit-learn. Now, from yesterday's talk, I got the impression that auto-sklearn is a bit more advanced in terms of also the data wrangling and, for example, using model ensembles: it actually keeps multiple models and combines their predictions. I think in this case AutoKeras just gives you the best model in the end, and you do with it what you want. So there are definite differences. But yeah. That makes sense, and fair enough.

Probably one more question. So, there can be multiple hyperparameters to be tuned, and this might be a very compute-intensive and time-intensive job. Is there a way to run these in parallel or in a distributed way, to perhaps reduce the compute time and the overall time it takes? Yes, it is possible. AutoKeras, like I said, is a wrapper around KerasTuner, which is a part of Keras, which is then a part of TensorFlow. So essentially, whatever you can do with TensorFlow, which includes all of the things that you just mentioned, is possible to do with these tools as well. And the documentation is really nice, with tutorials; I think I saw bits that describe how to do this. Perfect. Well, again, thank you so much for answering all the questions, and lovely talk. What I would recommend, since there were a lot more questions, is that you all move over to the breakout room for the Parit track and maybe take those questions over there. And thank you again. Thank you.