Hello, and welcome. Sorry for the technical issues; there were some problems with the MacBook. I'm Guillaume Salou, team leader of machine learning services at OVH Cloud, and today we will talk about machine learning at scale with OpenStack.

First, a few words about OVH Cloud. As you may guess, we are a cloud company. We have more than 300,000 physical servers around the world: some in the US, in Canada, in Asia, and most of them in Europe, spread across 30 data centers. Our values are very important to us: we want to be SMART, which stands for simple, multi-local, accessible, reversible, and transparent. When we think about open source, we have to be reversible and transparent, so it is natural for us to contribute to open source projects as well. You will see whether our machine learning platform is smart too; we try to live up to these values every day.

A few words about my team, machine learning services. We have merged two different job profiles. On one side there are data scientists; they are here to solve business problems, and we will see some use cases at OVH Cloud. On the other side we have DevOps; they handle the infrastructure and deploy components on OpenStack, for instance. In the middle of those two profiles sits the platform we are building, and a new job profile: the machine learning software engineer, whose goal is to build machine learning platforms and tools. We will see how we do this.

Let's start with machine learning itself, to be sure we are all talking about the same thing. I will show one example, then how we deploy it at OVH Cloud, then the machine learning development process, how a machine learning platform can simplify that process, and how we are building it on OpenStack at scale.

Here is a little picture of what machine learning is; I'll let you read it. In short, machine learning is only a function: a function f that takes x as input and returns y. There is some code in between, and what is different with machine learning is that this code is generated automatically; you don't have to write it yourself.

Let's take the example of Homer. Here is Homer of the Odyssey, and here is Homer Simpson. We will train a model to do image recognition and tell which Homer is in a picture. We have different images of Homer Simpson and different images of Homer of the Odyssey, and we have to label them. The spelling of "Homer" is the same for both in English, so I have added an E at the end for the French spelling, "Homère". Then we build the function, the algorithm, automatically: we take the labelled data, we train, and we get an algorithm as a result. After that, we can send a query and get a prediction. So that's cool: we have this function, we give it an image, and it returns a probability of being Homer Simpson or Homer of the Odyssey. It's quite simple, and the function is built automatically by the training task.

Next we will see some use cases at OVH. You may say "but it's not Homer, and it's not image recognition", but it is the same thing: a function built automatically from data.
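To make this concrete, here is a minimal Python sketch of the train-then-predict idea, using scikit-learn. The two features are invented for the example; real image recognition would work on pixels with a much richer model. This is only an illustration of the idea, not our actual pipeline.

```python
# A minimal sketch of "machine learning is only a function".
# Imagine each image reduced to a small numeric vector (invented features).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labelled training data: rows are images, columns are made-up features.
X_train = np.array([
    [0.9, 0.1],  # yellow skin, no beard   -> Homer Simpson
    [0.8, 0.2],
    [0.1, 0.9],  # marble white, big beard -> Homère (the Greek poet)
    [0.2, 0.8],
])
y_train = ["Homer Simpson", "Homer Simpson", "Homère", "Homère"]

# "Training" builds the function f automatically from the labelled data.
f = LogisticRegression().fit(X_train, y_train)

# A query: a new image, reduced to the same two features.
x = np.array([[0.85, 0.15]])
print(f.predict(x))        # predicted label
print(f.predict_proba(x))  # probability of each Homer
```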
A first use case is electricity consumption forecasting, to predict how much we will consume and negotiate better contracts with our electricity provider. Here you see a detailed prediction for each data center; it allows us to size the generators and the supply correctly and to improve our pricing. And if you remember our values, we want low prices, and better forecasts mean better prices.

Another use case is anomaly detection. In our data centers, servers are water cooled, which is pretty cool, but we have to watch the temperature carefully. So we developed an algorithm to detect anomalies in each room and each rack of our 30 data centers around the world. That is a lot of time series, and we can't use fixed thresholds because the normal values change all the time: they depend on the weather and on the load on the servers. The solution we developed works pretty well.

Another use case is VoIP line monitoring and forecasting. Why? Because VoIP is a traded market: you buy and then you sell, so if you know how much you will sell, you know how much you have to buy. And it is quite simple: it is a forecast, a function that takes a time series as input and returns another time series as output (see the sketch just after this part).

There are many, many other projects we are working on, in two groups: predictive maintenance on our data center equipment, and everything around fraud, like spam, phishing, fraud detection, and data leaks. There is a whole set of projects. Another example we worked on is predicting the size of our Ceph clusters, in order to scale them better.

Now let's look at the machine learning development process. There are different steps: data collection, data preparation, model search, and deployment of the model. We will go through each of them.

The first step is the data scientist's job: understand the business problem, model the data, and do some feature extraction. It is very interesting for data scientists; they have to find the best way to build the files or the columns that represent the business problem.

The next step is quite boring: preprocessing. You have to scale your data, you have to encode it. Who in this room knows machine learning? Do you like preprocessing your data, encoding it? You can raise your hands if you like it. No, nobody likes this. We will see later how we handle it on the platform.

The next step is the optimization part: finding the best algorithm and its associated hyperparameters. You have to try different configurations and do some grid search (there is a small sketch of that below, too). It is very boring and takes a lot of time, and we will try to find a better solution for this.

Last but not least, you have to deploy the model. There are different languages and different frameworks; you have to monitor the services running your model and keep it updated. It is quite complicated for the data scientist, and DevOps are there to manage this. We will see how we handle this problem as well.

Globally, we are trying to industrialize machine learning, to simplify the data scientist's job and enable quick wins and fast failure. Why quick win, fast fail? Because when we have a business problem, we can test it quickly and see whether it works or not. Sometimes it fails because the problem simply cannot be solved by a machine learning algorithm and needs a different workflow; knowing that early makes our life better.
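Here is the forecast sketch I promised: a minimal illustration of a time series in, a time series out, with invented numbers and a plain linear model. Our real electricity and VoIP models are of course more elaborate; this only shows the shape of the problem.

```python
# A minimal sketch of "forecasting is just a function":
# a time series goes in, a time series comes out.
import numpy as np
from sklearn.linear_model import LinearRegression

series = np.array([120, 130, 125, 140, 150, 145, 160, 170, 165, 180], float)

# Turn the series into supervised data: the last 3 points predict the next.
lags = 3
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

model = LinearRegression().fit(X, y)

# Forecast the next 3 points by feeding predictions back in.
window = list(series[-lags:])
for _ in range(3):
    nxt = model.predict([window[-lags:]])[0]
    window.append(nxt)
print(window[lags:])  # the forecast
```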
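And here is what the boring grid search step looks like, as a minimal scikit-learn sketch. The dataset and the parameter grid are only placeholders for the example; this is the kind of exhaustive trying that we want to automate.

```python
# A minimal sketch of grid search: exhaustively trying configurations
# (algorithm parameters) and keeping the best by cross-validated score.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50, 100], "max_depth": [3, 5, None]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```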
So let's look at our solution, which is called OVH Cloud AutoML; previously we called it Prescience. At the end of the presentation I will give you some links to try the solution, where it is still called Prescience: we are changing the name today.

What is OVH Cloud AutoML? It is a self-service platform that helps data scientists deploy models easily, and at scale. You have a UI and a Python CLI to automate the tasks. We have chosen to cover only these steps, not the data collection, because that is the data scientist's job: you prepare the data, put it into the platform, and let it work. The platform also handles retraining; for instance, if you have to retrain every day, it is fully automated.

Now the architecture, at the global level. Our first decision was to use OpenStack, and especially Nova. Nova for the compute, because we have GPUs, we have flavors, and some tasks need more RAM or more disk; doing that ourselves would be quite complicated, and Nova helps us do it. On top of Nova we use Kubernetes; if you want to know how Kubernetes is deployed on OpenStack, Kevin will give a talk tomorrow explaining how it is built. We deploy our pods on Kubernetes. For the storage, we have chosen Swift, because it works in every one of these cases: it's OK here, OK here, and OK here. We don't have problems with it.

Here is the result: you have the UI and you have the Python CLI. In this example I am trying to solve the problem of the passengers of an airport: creating a model that forecasts how many passengers there will be in the airport. It is exactly the same with the UI and with the Python CLI.

Let's go through each of the steps we saw previously in detail. The first step is preprocessing. Before that, there is a parser: we need to parse the data. For instance, if there is a CSV, we have to know what the columns are, and whether a value is an integer or a string. The parser detects the types and computes some statistics about them. After that, a preprocessor replaces missing values and encodes categories, using the parser's statistics. For instance, if a category has low cardinality, we can use a one-hot encoder; if the cardinality is high, we can use label encoding instead (I'll show a small sketch of this choice below). It is fully automated. It is deployed, as we have seen, on Kubernetes and Nova, and we use Spark as well. The result, serialized as PMML, is stored on Swift, as I've already explained.

We had some issues with Spark and Swift, though. The main one: Spark uses Swift like HDFS, but Swift is eventually consistent, and that is a problem for us. We wanted to solve it, so we began by using a better library than the Hadoop one; the Hadoop connector between Spark and Swift is a mess. We moved to Stocator, a library from IBM. It solved some of the problems, but not the asynchronous listing: when you store data in Swift and then list it, the listing is eventually consistent. We tried to solve this with a custom library, but it is still not right and we have to keep working on it. In the end, the best solution was to abstract the storage. So we used MinIO, and we can now use Ceph, an S3 connector, or Kubernetes storage to hold the data; we have the choice to do what we want and to configure it.
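To illustrate that abstraction: once everything speaks the S3 API, the backend becomes configuration. This is a minimal sketch with a hypothetical endpoint and placeholder credentials, not our actual setup.

```python
# A minimal sketch of storage abstraction through the S3 API: whether
# Swift, Ceph, or MinIO sits behind the gateway is just configuration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.com",  # hypothetical gateway URL
    aws_access_key_id="ACCESS_KEY",              # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

# Store a serialized model; the same call works whatever backend is behind.
s3.upload_file("model.pmml", "models", "airport/model.pmml")
```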
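And going back to the preprocessing step, here is the cardinality-driven encoder choice as a minimal pandas sketch. The threshold of 10 is an arbitrary value for the example, not our production rule.

```python
# A minimal sketch of letting the parser's statistics drive the encoder:
# one-hot for low-cardinality categories, label encoding otherwise.
import pandas as pd

def encode_column(col: pd.Series, max_onehot_cardinality: int = 10):
    if col.nunique() <= max_onehot_cardinality:
        # Low cardinality: one-hot encoding, one column per category.
        return pd.get_dummies(col, prefix=col.name)
    # High cardinality: label encoding, one integer per category.
    return col.astype("category").cat.codes.to_frame(col.name)

df = pd.DataFrame({"dc": ["GRA", "SBG", "BHS", "GRA"],
                   "rack": [f"r{i}" for i in range(4)]})
print(encode_column(df["dc"]))                             # one-hot
print(encode_column(df["rack"], max_onehot_cardinality=2)) # label-encoded
```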
Now we are at the algorithm selection step. We decided to use SMAC to drive the algorithm selection; if you want to talk about SMAC itself, come see me afterwards, because it is not the purpose of today.

Here is the architecture. You have an optimizer; the optimizer sends a query to a controller; the controller dispatches tasks to workers, which are learners with different algorithms: TensorFlow, scikit-learn, and so on. They try what we call different configurations; a configuration is an algorithm together with its hyperparameters. The results are stored in Redis; the optimizer reads them and sends another query, converging toward the best configuration for the problem it is trying to solve (I'll show a small sketch of this loop below). SMAC, by the way, is not itself a machine learning algorithm but an automated algorithm configurator: Sequential Model-based Algorithm Configuration.

And here, in detail, we store everything on Swift, as always: at the end of the optimization, the serialized models are stored there.

Now we just have to deploy the model. The serialized models are on Swift, and we load them into a serving API running on our Kubernetes cluster. We deploy the model, and we can scale with Kubernetes, adding workers if we need to handle more load or a big batch currently running on the platform (there is a small serving sketch below as well). All the metrics are stored in our Metrics Data Platform, which currently holds about 500 million time series, and not only for us.

This platform lets the data scientist focus on the high-value task, the data collection and the business problem, while we use the cloud to solve the optimization and preprocessing problems, with a feedback loop to get better algorithms and better solutions.

A few words about another platform we are working on, especially for the deployment of models. It is a serving engine for deploying models outside the full machine learning platform, because sometimes we have to develop specific models that can't live in OVH AutoML, so we need another way to deploy them. That is why we built this one. And this is very important: we are open sourcing this platform. The final goal is to open source first the serving engine and then the whole platform; we started with the serving engine because it is simpler for us to begin there.

If you want to use our platform, there are some links and QR codes. It is free; it is in our labs, and we collect feedback from our customers. The machine learning platform is the first link and the serving engine is here. We will open source the serving engine in about one month. For the rest, I don't know exactly right now, but we are working on it; maybe in about six months.
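Here is the sketch of the optimization loop I promised. Real SMAC is model-based, and our workers run distributed on Kubernetes with results going through Redis; this random search over configurations, with a plain dict standing in for Redis, only shows the shape of the idea.

```python
# A minimal sketch of the optimizer/controller/worker loop: a
# configuration = an algorithm + its hyperparameters; workers evaluate
# configurations and the results drive the next choice.
import random
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

search_space = [
    (RandomForestClassifier, {"n_estimators": [10, 100], "max_depth": [3, None]}),
    (LogisticRegression, {"C": [0.1, 1.0, 10.0], "max_iter": [1000]}),
]

results = {}  # stands in for the Redis result store
for _ in range(8):
    algo, params = random.choice(search_space)          # pick a configuration
    config = {k: random.choice(v) for k, v in params.items()}
    score = cross_val_score(algo(**config), X, y, cv=3).mean()  # worker's job
    results[(algo.__name__, tuple(sorted(config.items())))] = score

best = max(results, key=results.get)
print(best, results[best])
```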
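And the serving sketch: load a serialized model and put a prediction endpoint in front of it. Our serving engine loads PMML from Swift and scales as pods on Kubernetes; here a local pickled scikit-learn model and Flask keep the sketch self-contained and runnable.

```python
# A minimal sketch of a serving API: one replica of this process would be
# one Kubernetes pod behind a service, scaled up to handle load.
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:  # hypothetical local path to a model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]  # e.g. [[5.1, 3.5, 1.4, 0.2]]
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8080)
```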
And a few last words: we are hiring. If you are interested in OpenStack and want to work with cool people all around the world, there is a link as well; you can apply, and a job description is available. So thank you. Oh, sorry, I forgot: if you have questions, don't hesitate. Thank you.

Perhaps I missed it, but you never talked about the hardware capabilities underneath supporting your machine learning, like using virtual GPUs and things like that. Do you simply not do this, or do you have it and just didn't mention it, or was I falling asleep? It was very interesting, actually; thank you.

Sorry, I didn't mention it because we are focused on our own stack. There are several teams at OVH Cloud: some people work on OpenStack, some, like Kevin, work on Kubernetes, and we focus only on our layer. The technical problems of our GPUs, the vGPUs, the V100s, are handled by other people so that we can use them simply: we just deploy a configuration on Kubernetes and use the GPUs. Right now we are working with vGPUs because we are trying to solve problems around inference in the serving engine, and we need vGPUs for that. I only had to ask Kevin for this, and it is a feature of his platform. Any other question?

I have a question. What are the KPIs you are using for anomaly detection on the different kinds of servers, like logs and metrics? You are doing anomaly detection on temperature, but how did you choose temperature as the KPI?

We chose it because we are using water cooling, so we have to watch the load and the temperature of the cooling system and be sure that everything is okay. In the future we will work on predictive maintenance; right now I think we will start with the disks, but it is not certain yet. We have to define KPIs to measure whether one project is more important than another, and start it if that is the case. Is that okay? Thank you.

So I know you said that you store the results in Swift. Where do you keep the training data?

The training data? Everything is stored on Swift.

I love it. Everything. Everything in Swift. More stuff in Swift.

Yes; with Swift, scale is not a problem for us, and we have teams that work on Swift. That is why we chose it: we don't have to worry about it. The only issues were with Spark, and we are trying to solve them.

Cool, thank you.

Is that okay? No more questions? Thank you.