OK, good afternoon, everyone. Today I will share some experience from building our AI cloud; specifically, how we provision and manage a TensorFlow cluster with OpenStack. I will share some experience and some of the considerations we went through when doing this. This is not a step-by-step guide, but I will offer a lot of options, and maybe you can choose the one that fits you.

I am Leon Pan from the Office of the CTO at EMC. Since I joined EMC in 2011, I have focused on cloud computing, not only OpenStack but also some foundational research, for example containers, virtualization, and hardware. And this is my co-presenter, Accela Zhao.

Welcome, my dear guests. Welcome to Barcelona and to our session. My name is Accela; it's the first half of "acceleration". I work at the EMC Office of the CTO. We work on a series of innovative cloud platforms, such as OpenStack, scheduling, and containers, and we have done a lot of good work. Now let my colleague Leon proceed. Thank you.

OK. As you can see, both Accela and I work in the EMC Office of the CTO ARD team. There are four directions in the ARD team: data, hardware, software infrastructure, and data science. Of course, as one team, all four directions work closely together. So you can see there is a group of data scientists; these are people who are very good at math and work on a lot of algorithms. And of course, modern machine learning is built on infrastructure, and we provide a lot of the computing capability for them. So, based on that, we manage our infrastructure for them with OpenStack.

We selected OpenStack as our cloud management platform, and the most important consideration is that it is open. Openness is very important for us, because we are doing infrastructure research, so we need to add a lot of hardware support to the platform. Because OpenStack is open, we can add new features to it and customize it. This is very important for me, and I think it is the key reason we use OpenStack rather than other platforms.

About our relationship with OpenStack: we have used OpenStack since very early, the Essex version or even before. I don't quite remember whether we started with Diablo or Essex, but very clearly the first time we touched OpenStack was around the Diablo version. There was a meetup in Shanghai; I think it was the first or second meetup held in Shanghai, and it was about Diablo. It is very easy to remember, because Diablo is a very famous RPG game, as you may know, and it also sounds like "big pineapple" in Chinese. I remember that after that meetup we built our infrastructure with OpenStack. Maybe it was Diablo, maybe it was Essex, but I cannot be completely sure.

OK, so we have covered why we are here and why we talk about OpenStack. The last thing is to dive back into our topic, AI. AI research started very early, even before the invention of the computer. That is because humans have long wanted to build machines that are just like human beings, like ourselves. So AI was quite a natural idea once we started inventing things that work automatically. But when we talk about AI here, it is about computing. After the computer was invented, AI research was born; by 1952 there was already research going on. Let's jump to the next slide.
And you can see that in those days, some foundational research happened very early, in the 50s: things like neural networks and logic-based reasoning were all born in the 50s. And you can see the first boom, from 1952 to 1956. That is because when we discover something new, we get very excited and think it can do anything. But when we got a deeper understanding of what AI is, we found the limitations. So the first winter came, because there were a lot of problems we could not solve, for example because of the limitation of computing power, and we found that AI could not solve many new problems in those days.

But in the 80s, AI returned, because Japan had become very rich and raised the concept of fifth-generation computers. So AI was back in the 80s. In that period something very interesting, the expert system, became very important in the AI area. Because of that, AI found its way into many expert systems, so it could solve problems in very specific areas. But after 1987, for some historical reasons, the money was gone, and the second AI winter came. So you heard fewer stories about AI for quite some time after that.

But now, after another twenty years of development, the AI topic has suddenly become hot again, roughly since 2010. That is because of other topics, not just machine learning: cloud computing, large-scale clusters, and new hardware. It seems the limitation caused by compute capability has been at least partially overcome, so AI is hot again. On the other side, new algorithms like deep learning are emerging. Compared to traditional machine learning, deep learning benefits from having more data: the more data we have, the more accuracy we can get. So deep learning took off, and because we also have cloud computing, it works very well and can solve real problems. And then AlphaGo beat the top human player at Go, which brought a lot of attention to this area again.

Behind AlphaGo there is a framework called TensorFlow; at least Google claims that AlphaGo is backed by TensorFlow. We don't know for certain that it's true, but Google claims it is based on TensorFlow, and TensorFlow is also the backend of Google Brain. It is said that over 60 teams in Google are using TensorFlow in multiple products.

TensorFlow itself has a lot of good features. For example, it supports different acceleration hardware: GPU, FPGA, CPU, TPU, and it is very easy to add support for new hardware to the TensorFlow framework. It also supports different languages for writing machine learning algorithms, for example C++ and Python, so it offers data scientists the option to use their own language to write algorithms. It supports both distributed training and distributed serving. It also embeds a lot of typical tools for data scientists. And of course, it naturally supports Docker and can be deployed onto Kubernetes, so you can easily deploy TensorFlow in the cloud, no matter whether it's AWS, GCE, or another cloud. These cool features made it the hottest deep learning framework when it was open-sourced.
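To give you a feel for the Python side, here is a minimal sketch of what a TensorFlow program looks like, written against the 1.x-era graph-and-session API that was current at the time of this talk. The explicit device strings just illustrate the CPU/GPU placement mentioned above; the whole thing is a toy example, not anything from our cluster.

```python
import tensorflow as tf

# Build a tiny graph; the same script runs on a laptop or in the cloud.
with tf.device('/cpu:0'):          # pin these ops to the CPU
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])

with tf.device('/gpu:0'):          # ask for a GPU kernel if one is available
    c = tf.matmul(a, b)

# allow_soft_placement falls back to CPU when no GPU is present.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))
```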
As I said before, TensorFlow has several key advantages that make it the most attractive deep learning framework, such as the flexibility for scientists to design and adjust their algorithms, and more. But from a system engineer's perspective, the last three features are the most interesting, so I will go through them one by one.

The first one is portability, and portability here has two aspects. Because TensorFlow supports Docker well, it is really easy to put it onto any kind of cloud, any kind of infrastructure. For example, you can run TensorFlow just on your laptop, and you can also put it onto a very large cloud; for example, you can run it on thousands of machines in AWS. It's very easy to do.

The other aspect is that it can support different hardware. There are two concepts here. The first concept is called ops. We can add new features by adding new ops, which are essentially operators; for example, matrix multiplication is one operator, and we can add more kinds of operators to bring new features to TensorFlow. Hardware support is then done by adding new kernels to those ops. Currently it supports CPU and GPU kernels, and we can add more later. And you can see that adding different hardware to TensorFlow really benefits the calculation capability. The diagram shows a test in our lab. For the blue line we used four Xeon E5 v2 CPUs on four nodes and measured the machine learning iterations; for the red line we used just one GPU, a Tesla K40c. And this is not even the best the GPU can do, because of a memory issue: the workload takes a lot of memory, and GPU memory is very scarce, so it only shows about 40% of the GPU's potential. So we can expect that adding more kinds of hardware to TensorFlow for different kinds of calculation will give us even better performance.

The second point is that TensorFlow bridges research and production. This type of machine learning is the offline machine learning model, so usually there are two clusters. The first cluster does the training: we have a lot of data, we divide it into training, validation, and test data sets, and after we train, we get a model. The model is then sent to the serving cluster. The serving cluster is what provides the service to the clients. For example, if the model is an image recognition model, it will be put on the serving cluster; if I have an image, I can send it to TensorFlow Serving and it will tell me what exactly is in the image. This is called the serving model. And you can see that a machine learning model is not a one-time job; it can be updated with new features and new data to make it more accurate, so we can keep updating it.

And what I am most interested in in TensorFlow is its support for distributed training and serving. Doing distributed training is not easy in deep learning work. If you are familiar with gradient descent, you know there is a single global set of model parameters to optimize, so it is very hard to split the work up. This image is from a paper by Jeff Dean's team at Google, from around 2012, which describes how to do gradient descent on a large-scale cluster. It is easy to understand: imagine you have a lot of data and a very big model. One option is to split your data into small subsets and send each subset to a training replica; that is the left-hand side. Another option is to split your parameters into subsets. Either way, the updates are sent to a cluster called the parameter server, which does the coordination.
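To make the worker/parameter-server picture more concrete, here is a minimal sketch of data-parallel training with TensorFlow's 1.x-era distributed API (tf.train.ClusterSpec, tf.train.Server, tf.train.replica_device_setter). The host names, the two-parameter-server/three-worker layout, and the toy linear model are just illustrative assumptions, not our real setup; the cluster spec at the top is exactly the kind of thing described on the next slide.

```python
import sys
import tensorflow as tf

# The same script runs on every machine; each process is told its role.
# The layout (two parameter servers + three workers) is only an example.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222",
               "worker2.example.com:2222"],
})

job_name = sys.argv[1]            # "ps" or "worker"
task_index = int(sys.argv[2])     # which ps/worker this process is

server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()                 # parameter servers just host the variables
else:
    # Variables go to the ps tasks; the compute stays on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 1])
        y = tf.placeholder(tf.float32, [None, 1])
        w = tf.Variable(tf.zeros([1, 1]))
        b = tf.Variable(tf.zeros([1]))
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))
        global_step = tf.Variable(0, trainable=False)
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=global_step)

    with tf.Session(server.target) as sess:
        # In a real job, only the chief worker would run the initializer.
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            # Feed this worker's own shard of the data (data parallelism).
            sess.run(train_op, feed_dict={x: [[1.0]], y: [[2.0]]})
```

Each of the five processes runs the same script with a different role and index; the parameter servers just host the shared variables, while each worker trains on its own shard of the data.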
So this is the image from the paper. If we map it onto a production environment, the right-hand side shows the components in TensorFlow. You can see the worker is very similar to the model training replica here, and what it sends to the parameter server, which can also be a cluster, is what gets coordinated. If you look deeper into TensorFlow it gets more complex: the workers also communicate with each other, because TensorFlow organizes the computation as a graph. But this is an OpenStack summit, so I think keeping to the component view is OK, because this is exactly how your cluster looks.

So first, you have to give TensorFlow a cluster spec: how many workers are there, so they can work with each other, and how to find the parameter servers. You can imagine this is exactly what you deploy to your cloud, whether each of these is a virtual machine, a physical machine, or a container. There would be five machines here, and together they form a TensorFlow cluster. This is a very, very small TensorFlow cluster, and in a production environment such a five-machine cluster cannot do the job. In production there will be thousands of servers that need to be coordinated. Luckily, TensorFlow itself provides very good scalability: this is an image from the Google paper, and you can see the performance increases nearly linearly as we add hardware.

So this is where we need OpenStack. Because we are talking about thousands of servers, we need to manage them; we need a cloud platform to manage all these clusters. And as I said before, we already use OpenStack to manage our infrastructure, so it is a very natural option for us to use OpenStack to manage the TensorFlow cluster too.

But there are some considerations we have to make. Will it offer the capability to meet the machine learning requirements? Can it support a heterogeneous environment? Is it flexible enough to extend with new features? For example, if we need to add new hardware to the cluster, will it work? And by hardware I don't just mean a new node; maybe it's an FPGA card we insert into a server, and it should be discovered and added to the TensorFlow cluster. And can we hide the plumbing between the system engineers and the data scientists, so that the data scientists don't have to care much about how the underlying layers work? I'll hand these questions to Accela, and he will continue with how we solve this with OpenStack.

Well, generally, OpenStack is the de facto management platform for the cloud. When TensorFlow runs on OpenStack, we can borrow a lot of benefits from OpenStack and make it powerful. Generally, we have two options to integrate TensorFlow to run on top of OpenStack. The first is Magnum. Magnum is container-as-a-service on the OpenStack platform, and TensorFlow is relatively well packaged for running in containers, so Magnum becomes a choice. The other option is Sahara. Sahara manages big data for OpenStack; well, TensorFlow handles deep learning, and deep learning is also big data, so can we use Sahara to integrate TensorFlow into OpenStack?

First, we will dive into Magnum. What is Magnum? Well, Magnum is a bridge that connects OpenStack to the world of containers. Containers are a very fast-developing technology these days. They provide good packaging for applications, help accelerate application delivery, and help manage clusters.
There are various very powerful cluster management tools for running containers, for example Kubernetes, Mesos, and Swarm. Generally, Magnum provides a way to quickly and automatically provision those cluster management platforms, Kubernetes, Mesos, and so on, and it uses the Bay and BayModel abstractions to hide the heterogeneity of those different platforms. On the other hand, Magnum can bridge the advanced features of OpenStack into the container world. For example, we can use Magnum to help us manage bare metal and run containers on bare metal, we can use Magnum to bridge Cinder data volumes into the containers, and also auto-scaling. This becomes handy because many applications can be deployed and spun up relatively easily when packaged in containers, and Magnum becomes very useful when we try to do things this way.

Let's take a look at the architecture of Magnum. Like many other services running on OpenStack, it ultimately relies on the core components of OpenStack to run. On the top layer, the Magnum API accepts the user requests, parses them, and passes them into the deeper layers for further processing. The conductor is the heart of Magnum: it orchestrates, deploys, and automates those container management platforms on OpenStack, that is to say, it helps deploy Kubernetes, Mesos, and Swarm. If you dive into the code, you can see a lot of Heat templates that do this work. So, underneath, Magnum relies on Heat to do the fundamental orchestration to install and deploy those Kubernetes, Mesos, and Swarm platforms.

Remember that we mentioned Magnum is a very good tool for provisioning Kubernetes. TensorFlow works relatively well when packaged into containers, and it has also officially declared integration with Kubernetes. So our approach is straightforward. First, we have OpenStack. Then we run Magnum, and we use Magnum to deploy Kubernetes on top of OpenStack. Then, in the end, we run TensorFlow on top of Kubernetes, in containers. In this way, we can quickly piece together ready-to-use components and spin up TensorFlow, and we can use Magnum to expose the advanced features of OpenStack for the container side to use.

The steps are pretty straightforward. In the first step, we prepare the images for TensorFlow: the TensorFlow worker nodes, the TensorFlow serving nodes, and the TensorFlow parameter servers. In the next step, we write the cluster spec; the cluster spec is how you specify how the training network should be organized in TensorFlow. And for TensorFlow to be able to run on Kubernetes, we write the manifests, which define how the nodes should be organized on Kubernetes. Then everything is pieced together, we run them layer by layer, and we spin up TensorFlow to train and serve our models. There are a lot of online tutorials on how to train some typical models in TensorFlow, for example the Inception model.

Well, any option always has good sides and bad sides, and there are several good and bad things about running TensorFlow on OpenStack the Magnum way. The good side is that containers, Kubernetes, and all the other components are relatively mature and ready to use; we don't need to write any shim layer to connect things and make them work.
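As one concrete illustration of the "write the manifest" step, here is a minimal sketch that builds a Kubernetes Job for a single TensorFlow worker as a Python dict and dumps it to YAML. The image name, the command-line flags, and the GPU node label are assumptions for illustration only, not the exact manifests we used; the nodeSelector field anticipates the GPU-aware scheduling we will talk about in a moment.

```python
import yaml

# Hypothetical manifest for one TensorFlow worker task (task 0 of 3).
# Image name, flags, and labels are illustrative assumptions.
worker_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "tf-worker-0"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "tensorflow", "role": "worker"}},
            "spec": {
                # Prefer nodes labelled as having a GPU (label name is assumed).
                "nodeSelector": {"hardware": "gpu"},
                "containers": [{
                    "name": "tf-worker",
                    "image": "registry.example.com/tf-trainer:latest",
                    "args": [
                        "--job_name=worker",
                        "--task_index=0",
                        "--ps_hosts=tf-ps-0:2222,tf-ps-1:2222",
                        "--worker_hosts=tf-worker-0:2222,tf-worker-1:2222,tf-worker-2:2222",
                    ],
                }],
                "restartPolicy": "Never",
            },
        }
    },
}

# Write the manifest so it can be applied with kubectl.
with open("tf-worker-0.yaml", "w") as f:
    yaml.safe_dump(worker_job, f, default_flow_style=False)
```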
And also, because we are using Magnum, it can expose features from OpenStack into the containers, and because we are using Kubernetes, it also brings us some benefits. Well, the bad sides are actually obvious. We are running TensorFlow inside Kubernetes, rather than running it inside OpenStack itself, so this is not OpenStack-native. OpenStack knows nothing about TensorFlow and has no direct control over the TensorFlow cluster. Those layers are separated and invisible to each other, and the deployment is fragmented; it is not a unified, integrated deployment solution.

There are also a few additional pros and cons of using TensorFlow with Magnum. The good parts come mostly from Kubernetes. Kubernetes provides some handy features for scheduling, such as the node selector and node affinity; basically, they let us define how we should favor some types of nodes over others. A typical use case is that we try to use GPU nodes over CPU nodes, and for TensorFlow this is important. Later it becomes even more important when we try to build advanced scheduling technologies on top of that.

Another thing is rolling upgrade. Well, containers are famous for the atomicity of their packaged images: a container can be atomically spun up or shut down, and Kubernetes builds on this to provide commands and tools for rolling upgrades. You can run one command and upgrade the container version to a new one. This becomes handy for upgrading the deep learning models: when we have trained a new model, we can tag it with a new version and use a rolling upgrade to atomically make the new model active and online, without interrupting the production service.

The next thing is auto-scaling. Both OpenStack and Kubernetes provide their own technology stack for monitoring. In OpenStack we can use Monasca or Ceilometer, which basically target the infrastructure level; in Kubernetes we can use monitoring tools like Heapster and InfluxDB, which handle data collection and storage and basically target the container layer. When the metrics are collected and certain rules are triggered, we can scale the containers or the virtual machines up or down. Magnum is equipped with commands to scale, for example, the Kubernetes nodes or the virtual machines it has deployed, and Kubernetes provides commands to scale the containers up and down. Because of all this help, TensorFlow can be equipped with auto-scaling almost automatically. Well, those are the benefits.

In the next section, we talk about the Sahara approach. Sahara is the platform in OpenStack to manage big data natively. Sahara uses a plugin system to abstract away the differences between different data platforms. It provides quick provisioning and management for, for example, Hadoop, Hortonworks, Spark, Cloudera, and many others. It is also equipped with EDP, the Elastic Data Processing framework, where the user can define how the data should be processed and stored step by step, composing the whole workflow. In the end, because it is OpenStack-native, it also provides a native Horizon UI for the convenience of users. Well, let's take a look at the architecture of Sahara. Basically, you will find that many services in OpenStack share similar architectures.
In the API layer, Sahara takes in the user requests and passes them to the provisioning engine or the EDP engine. The provisioning engine is used to deploy and track the provisioning of those big data clusters, like what Magnum does in its conductor, and the EDP engine is used to track and monitor the data processing workflows. There are also other components, like the Sahara pages, which work with the Horizon UI to provide the user interface.

There are several pros and cons of integrating TensorFlow the Sahara way. The pros are that, because this is OpenStack-native, we get a native approach to bring deep learning under the control of OpenStack. With Sahara, this is a fully integrated and unified interface, and the EDP engine can help us define the workflows for deep learning with TensorFlow. And there are other benefits that come from Sahara, for example its own enhanced scheduling, scaling, and storage management, and the integration with the UI; we can borrow all of them for TensorFlow. But the downsides are pretty straightforward and obvious. We would need to implement a plugin to be able to use TensorFlow in Sahara; for now there is little community support for this, and implementing the plugin is no easy work. And compared to the Kubernetes path, because we are not using Kubernetes here, we cannot borrow the benefits from Kubernetes, such as the scheduling, rolling upgrade, and so on.

So there are always a lot of options, and eventually we have chosen the Magnum approach. The main reason is that every component in this approach is mature and ready to use, and every layer provides a mechanism for being further extended, for example by writing plugins or writing new schedulers. And basically, when we try to run a multi-tenant environment for TensorFlow, Magnum and OpenStack together are basically enough. This is a good approach. But going further, we have a question: is it enough to solve every problem? Now let me invite my colleague Leon to proceed.

Maybe we have a little more time. So the first problem is, as I said before, we need to add a lot of hardware to this TensorFlow cluster. As you know, OpenStack is basically based on virtual machines, and not all of this hardware is well supported inside virtual machines. So we have a solution for that. The temporary solution is that we build a hybrid environment, with bare metal and virtualization in the same environment, and we leverage Ironic for the bare metal. So in our cluster, some components run on bare metal, where we use the hardware directly; for example, the GPUs or the acceleration cards live in this bare metal environment. And there are also some virtual machines, for the control nodes, which require HA and those kinds of things.

Ironic is a very interesting project. It provides a shim for the bare metal, so the bare metal itself can work exactly like a virtual machine in OpenStack and provide the same flexibility. It has some key components like any other OpenStack project, and the most interesting one is the Nova driver. With that, you can treat Ironic bare metal just like KVM, and use Nova to manage both the bare metal and the virtual machines. And we found there was a session in Tokyo about making Magnum and Sahara able to work in a hybrid environment containing bare metal and virtual machines; they raised the same question as us, so we studied that session.
So it is very good for us to leverage that. That is the first part: how to solve the hardware problem under virtualization. The second thing is what we call two-level scheduling and scaling. If you look a little deeper inside the solution, we built it across two environments. The first part is the bare metal, where we use Nova to call Ironic to provision the Magnum bare metal nodes; then Magnum provisions TensorFlow inside Docker containers. So there are two parts: the first part is bare metal, the second part is containers. Like any other multiple-control-plane problem, we have to bridge the barrier between the two environments. For example, the bare metal side does not know what the application looks like, so it cannot know the application's status or which hardware functions it needs; and the container side cannot control scaling of the machines. So we have built something for them. First, we pass the hardware information through to the containers, so the containers can know; in Kubernetes, we can then know that this machine has a GPU or an acceleration card. And we also need the container side to notify the bare metal side: I need more capacity, please help me scale out, please build more bare metal machines so we can put Docker containers onto them. So that needed some extra attention and work on our part.

Also, we saw some very recent news, not long ago: the alliance of Mirantis with Intel and Google to enable OpenStack on Kubernetes. They said they will start a project to build OpenStack on top of Kubernetes. There has been a debate about OpenStack-centric versus Kubernetes-centric, and we are using both OpenStack and Kubernetes, so we are very interested in this project. You can imagine that with this project, we would have Kubernetes, we could deploy our TensorFlow directly on Kubernetes, and we could run OpenStack as just another application; we could put some of our components in that OpenStack and use something like Kuryr to connect the parts together. It would be very interesting, at least to me.

OK. So, OK, any questions?

Typically, how many container instances are running on a single host?

This is a good question. Actually, if you're asking about our environment, it's not a very big environment; we are just a research team doing our own work there. So not many, maybe not more than 100. It depends on your hardware: how big is your machine, how many vCPUs does it have? So it's very hard to say how many is best.

By the way, EMC has a raffle, right? OK. So it seems there is some pressure to get to that. Number 55. 55. OK, and another one: 19. Yeah, 19. OK, congratulations. Oh, 55, sorry, 55, my fault. Sorry. OK, congratulations.