I will explain some things about machine learning. Okay, let's go. Good afternoon, everyone. This talk is about what ensemble models are. Some of you might have heard of them: if you've ever participated in a Kaggle competition, almost invariably every winning solution is an ensemble model. So in today's talk we'll learn what ensemble models are and how to build them using Python. I'm a data scientist at Cisco and I've been doing machine learning for about 14 years.

Before I start the talk, I just want to give you a short puzzle. There's a bunch of birds in a tree, 150 birds to be exact. A hunter comes along and fires three shots at the birds. His probability of hitting the target is 0.2, so 20% of the time he can hit. He's shot three times, and one bird is hit. The question is: at the end, how many birds remain on the tree? Yes? Yes. They all fly away, right? So you can use complicated models, ensemble models, deep learning models, but never lose the big picture. It's extremely important. This is an example you can all relate to, but in practice, the kind of models you build may be so complicated that you may not even understand whether they relate to the domain or not.

What's the machine learning process? We have input data. It gets fed into a learning algorithm, which creates a machine learning model, and that's used to make predictions. That's easy to tell; that's what is taught in school, in any course. But in reality, the input data is in a different format. You'll have to transform it, modify it, clean it; you have to do a whole lot of stuff before you create features from your input data. Those features get fed into models, and you use the final model to predict.

In reality, it's even more complex, because there are so many ways you can create features, and you can fit so many different kinds of models. It's not just one kind of model: you have linear models, like linear regression and logistic regression; you have tree-based models, like decision trees; you have ensemble tree-based models, like random forests and gradient boosting machines; deep learning models; support vector machines. It's a lot. How would you do something like this? It's extremely difficult. So how am I really going to select something from all of this?

Recapping, there are two important steps in the process: you have to create features from your input dataset, and you also need to select models in some way. The challenge here is that the model is only as good as you, because you're the person creating the features. The model is only as good as the features, which are created by you. And this is where a lot of time is spent in traditional machine learning practice. You spend a lot of time identifying features, creating features, doing transformations, all kinds of stuff. This was made really famous by a New York Times article, which calls this the janitorial work of the data science process and says it takes about 80% of the time.

The challenge after you create the features is that, even if you use the same features, different models give different predictions. Why is that? It's because the solution space, defined by the entire space of your features, is so huge, and the models can go and search in different regions of it. Different algorithms search in different parts of the solution space.
And it's kind of hard to figure out which one I should take so that it has better generalization properties. Okay, given that you know how to create features, we'll now talk about how to improve model performance. This is where you'll have different models to select from. Each model has different parameters you can tune, called hyperparameters. And you can try different sets of features for each of the models. How can you create a final model out of all this? You cannot try all possible combinations; that really explodes, it's exponential in nature. Exhaustive search is definitely ruled out. This is where ensemble models help us come up with some kind of strategy for solving this.

We'll talk about what ensemble models are with a toy example. On a dummy dataset we'll create three kinds of models for a classification problem: a linear model, logistic regression; a tree-based random forest model; and a gradient boosting model. These are very easy to build in Python. Logistic regression is directly there in scikit-learn; you just use the linear model called LogisticRegression. The gradient boosting machine is implemented by an awesome library called XGBoost, which is better than the one in scikit-learn. And you have the predictions on the test dataset. The assumption here is that the true labels of all the test data points are one. This is how it comes out: the models all have different predictions on the same 10 data points, but they all have the same accuracy, 70%.

How do we build a final prediction when all three models are equal? If you use cross-validation as your metric, you're going to get 70% for all three of them. But if you look at what's here, there's an easy way to combine these three models into something better. You can take the value that occurs the most; here, it's a max count. You see that in my first row, one occurs the most, so I'm going to set my output to one. In the second row, zero occurs more, so I'm going to set zero. If you do something like this, you miraculously end up with 90% accuracy. This is your simplest ensemble, majority counting, which here is also the same as averaging. In some sense, you're going to use CPU time as a proxy for the manual effort of creating more features or more models. So we're going to talk about strategies that use some clever techniques to search your solution space.

Having said this, I should tell you that ensemble models are not new. They've been around for a very long time. Techniques like random forests and gradient boosting machines are also ensemble models, but they use the idea at a very simplistic level. Advances in computing power have led to far more powerful techniques that we're going to see very shortly. Ensemble models were used predominantly in academia for a long time, until the Netflix competition. This happened in 2007, and it was won by an ensemble model. That's when industry started noting, okay, we can use ensemble models, and now you see a lot of ensemble models in production.

In its essence, this is how it looks: you have input data, you create different kinds of models (it's very important that these are different kinds of models, not the same models), you combine them using some logic, and then you use that to make your final prediction. This is what the architecture of an ensemble model looks like.
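To make this concrete, here is a minimal sketch of the toy example and the architecture just described. The synthetic dataset and the specific model settings are my stand-ins, not the speaker's exact setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# A dummy dataset standing in for the talk's toy data.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different kinds of base models: linear, bagged trees, boosted trees.
models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=42),
    XGBClassifier(n_estimators=100, random_state=42),
]
preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])

# Majority vote: for each test point, take the class predicted most often.
# With binary 0/1 labels and an odd number of models, rounding the mean of
# the predictions is exactly the max-count rule from the talk.
vote = np.round(preds.mean(axis=0)).astype(int)
print("ensemble accuracy:", (vote == y_test).mean())
```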
What are the advantages of doing this? It definitely improves accuracy; not always, but most of the time. The model becomes very robust: the variance of your output reduces. And because you can do things in parallel, it lends itself nicely to parallelization, so you can do things much faster.

Two things, as I said, are very important for ensemble models: you have to select different models, so you need base model diversity, and there should be some way to aggregate the models.

There are four important techniques you can use to create diverse base models. You can use different training datasets, sampling on your number of observations. You can sample on your number of features. You can use different algorithms: linear regression, random forests, neural networks. And each of them can have different hyperparameters. All of these will cause you to build different models.

Once you have that, the combination logic can be voting; voting was what we used just now. It could also be averaging: if you want a probability as your output, you would average; if you just want a class as your output, you would vote. And there's blending and stacking. Stacking works something like this: you split your data into two parts, you use the first part to create the base learner outputs, and those outputs are used as the input for a secondary model. So the second-level model's input is the output of the previous models.

How would you do this in Python? Your base models can be run using Pipeline; scikit-learn has Pipeline. You can take different libraries: scikit-learn, Keras for deep learning, XGBoost for gradient boosting. You build a model, you set everything up in a Pipeline, and you can use RandomizedSearchCV for getting your cross-validation score. Once you have your base models, if you want to do a weighted average, you would use a library called Hyperopt, a hyperparameter optimization library. If you want to do stacking, you would feed the outputs to another model, probably XGBoost or logistic regression, to create the final prediction. (There are sketches of these steps below.)

A word on RandomizedSearchCV: it's much faster than what you would do with grid search. Hyperopt, as I said earlier, is for optimization over search spaces. It's an optimization routine, and it's widely used.

For parallelization: ensemble models lend themselves nicely to it. You can run each model in parallel. Generally, when you run one model, you would run your cross-validation in parallel; instead of that, here you can scale out and run each model in parallel. joblib does this task extremely nicely. You can log, you can trace it; you really know which model takes a lot of time and where you should optimize. It gives you enough flexibility in doing that.
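Here is a sketch of the Pipeline, RandomizedSearchCV, and stacking steps just mentioned. The talk describes splitting the data into two parts for stacking; this sketch uses out-of-fold predictions via scikit-learn's cross_val_predict, a common equivalent, and the parameter ranges are illustrative stand-ins, not the speaker's:

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import (RandomizedSearchCV, cross_val_predict,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One base model wrapped in a Pipeline and tuned with RandomizedSearchCV.
pipe = Pipeline([("scale", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=42))])
search = RandomizedSearchCV(
    pipe,
    param_distributions={"rf__n_estimators": randint(50, 300),
                         "rf__max_depth": randint(2, 10)},
    n_iter=10, cv=5, random_state=42)
search.fit(X_train, y_train)

# Stacking: out-of-fold probabilities from each base model become the
# input features of a second-level model.
base_models = [search.best_estimator_,
               LogisticRegression(max_iter=1000),
               XGBClassifier(random_state=42)]
train_meta = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=5,
                       method="predict_proba")[:, 1] for m in base_models])
test_meta = np.column_stack(
    [m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
     for m in base_models])

stacker = LogisticRegression()  # the secondary (meta) model
stacker.fit(train_meta, y_train)
print("stacked accuracy:", stacker.score(test_meta, y_test))
```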
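The talk doesn't show the Hyperopt step in detail, so this is one plausible rendering of a Hyperopt-tuned weighted average. The validation probabilities here are random stand-ins; in practice they would be the base models' predicted probabilities on held-out data:

```python
import numpy as np
from hyperopt import fmin, tpe, hp

# Stand-in data: predicted probabilities of three hypothetical base models
# on a 200-point validation set (replace with real model outputs).
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=200)
probs = np.clip(y_val + rng.normal(0.0, 0.5, size=(3, 200)), 0.0, 1.0)

# One weight per base model, each searched over [0, 1].
space = [hp.uniform(f"w{i}", 0.0, 1.0) for i in range(3)]

def loss(weights):
    w = np.asarray(weights)
    blended = (w[:, None] * probs).sum(axis=0) / (w.sum() + 1e-9)
    acc = ((blended > 0.5).astype(int) == y_val).mean()
    return -acc  # fmin minimises, so negate the accuracy

best = fmin(fn=loss, space=space, algo=tpe.suggest, max_evals=200)
print(best)  # the best weights found, e.g. {'w0': ..., 'w1': ..., 'w2': ...}
```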
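And a minimal sketch of the joblib scale-out, fitting each base model in its own worker process; the models and data are the same kind of stand-ins as above:

```python
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

def fit_model(model, X, y):
    # Each call runs in a separate worker; joblib handles serialization.
    return model.fit(X, y)

models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=100),
          XGBClassifier(n_estimators=100)]

# n_jobs=-1 uses all cores; verbose logs per-task timings, which is how
# you see which base model is the bottleneck.
fitted = Parallel(n_jobs=-1, verbose=10)(
    delayed(fit_model)(m, X, y) for m in models)
```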
OK, to really see ensembles at work, I suggest you look into Kaggle competitions. The winning models are invariably ensemble models these days. Here's an example of how this was used by the winner of the CrowdFlower Search Results Relevance competition. CrowdFlower ran a competition where the relevance of search results had to be classified. You can see that once you get past the feature extraction stage, different models were built, and the final model was a combination of these models.

Having said this, not everything is fine with ensemble models. If you want an interpretable model, this is really not what you would use; interpretability goes for a toss. And sometimes, if you look at Kaggle, getting the last 2% or 1% of extra accuracy takes so long that it may not make sense in real-life practice. That's a big disadvantage: it takes a really long time to improve accuracy. So you would probably use this full-blown methodology only if accuracy is your most important metric.

OK, what's the cool stuff around it? We actually created a package for building ensemble models. We pushed it two or three days back; it's on GitHub, and we've just submitted it to PyPI, hoping it will be up soon. That's my team, which worked on this. Yes, that's all I have. Thank you.

OK, who has a question for our friend? No? Yeah, just the GitHub slide, please. The what? The GitHub slide. Yeah, right. Take a picture. Oh, yeah. Another? Documentation is still not up, but it has notebooks which talk about how to use this. OK, no other questions. Thank you so much. OK. Thanks a lot. Thank you.