Let's get started. Good morning everyone. My name is Wei, and I'm a faculty member at Tsinghua University in Beijing. Today I'm going to tell you about our experience deploying a 200-node OpenStack cluster for research purposes. I'll co-present with Dr. Yang Shuo from Huawei Technologies, sitting here, who will show how their open source project really helped our deployment.

A little background: we started building our 200-node server cluster late last year, so it has been in operation for about nine months. There were two goals in building this cluster. The first is production: we are trying to support hundreds of researchers, including faculty and students, running so-called big data workloads. That means Hadoop, Spark, these kinds of things. Second, I am a systems researcher. We do distributed systems research, and in particular we want to build the next generation of cloud infrastructure; this platform is also the test environment for my own research.

The outline of the talk: first I'll cover why we chose OpenStack and how we use it, our customers, and our research vision. Then we'll focus on the immediate problems we face operating such infrastructure, and how Compass, the Huawei open source technology, saves the day.

Let me give you a quick overview of our customers' profile. When you think of a university and scientific research, the first thing that comes to mind is the high-performance computing world, HPC. I say: I wish I had requirements like that. That world is heaven for sysadmins. If you are not familiar with it, this is the Rocks cluster distribution, very popular in HPC. Why? Because it's so simple. Everything is distributed on CDs they call Rolls. You put a Roll into one machine, install the OS there, it gets copied to the other machines, and you have a cluster up and running. If a customer wants a specific package, you look for it in some other Roll you haven't installed yet; the only way you manage software on the infrastructure is by installing these prepackaged CDs. And if an application isn't available that way, you say sorry to the customer, because there is no way to hack into these prebuilt packages without breaking the whole thing. That's how HPC has worked for years.

What's different for us is that we have a real variety of applications. Some people run scientific image processing, lots of images of a certain fruit fly; I don't know what it is. We have a connectivity study. We have social big data analysis: for example, we run large-scale online education, so-called MOOCs. One course has 26,000 enrollments, and we analyze student behavior to predict how well they learn. We do social network mining, and we also do things like natural language processing. All of these applications have a lot of dependencies, both hardware and software. This one, say, has very little hardware dependency, requiring only moderate CPU, memory, and disk, but it has many software dependencies. It depends on a lot of different things, because all this research code is written by different research groups using whatever technology their students happen to know.
Then they try to put everything together. And these workloads are sometimes resource hungry, too. For example, this one is a genome computation workload. Each human genome is a little less than one gigabyte of data, and comparing the genomes of hundreds of people is like doing a huge number of string comparisons on random strings. That requires a lot of CPU power, a lot of disk to store everything, and a lot of IO bandwidth and networking. And like most research code, it mixes many different technologies; it's very researchy code, very immature and very difficult to maintain. Large variety.

Even worse, a single customer's needs can change. For example, this is from one of my colleagues who does computational biology, specifically protein design. He tries to predict what the 3D structure of a protein should look like after he modifies one amino acid inside it. Initially his requirement was simple: he just needed a bunch of CPU power to run an A* search kind of algorithm. Very soon he learned there is an A* search algorithm that runs on GPUs, so he wanted GPUs. Still simple, right? A GPU sits in a single machine; we add a couple of GPUs and install drivers. Then he figured out the problem is really memory bound. In a single box I can provide at most half a terabyte of memory in any economical way, and he wanted more. So he heard about Hadoop and moved to Hadoop, but that meant a big change in the distributed infrastructure, different hardware resources, and different software. Now he is moving to Spark, because Hadoop is too slow for him: his computation is iterative, which is essentially what Spark targets, so he says it's a perfect fit. And we need to redo the entire thing again. It is difficult to keep up with customer requirements that change like this.

That leads to our longer-term research vision. We do systems research, but before I tell you the vision, let me summarize the motivation. First, customers really want flexibility. There is a quote from Nature saying it should not be the scientists who are required to be flexible; it should be the cluster. Why do people say that? Because it comes from the HPC world, where you must be flexible about which language you write in and which distributed infrastructure you use: it's an MPI cluster, so now you run MPI. That's motivation one. Second, scientific computation is resource hungry. Unlike most IT workloads, these users truly care about performance, because the data is huge and the problems are hard. This is Professor David Haussler from UC Santa Cruz, famous as a pioneer of the Human Genome Project. He gave a keynote at one of our systems research conferences, and at the end he said: why haven't we cured cancer? Computational biology doesn't work because of the goddamn IO. They care about performance; performance kills their applications. Third, in a research environment we have a complex infrastructure. We buy whatever is economical to get to 200 nodes, but we really do not have many IT staff to support it.
During the daytime I have one single system administrator, managing everything from cleaning the room all the way up to the software infrastructure. At night I have to do it personally. Why? Because the university pays IT staff overtime; it doesn't pay faculty overtime, so they want me to do it. That's our entire IT team for the whole thing, software included.

So what makes IT infrastructure hard to manage? I came from Google, so why can't two people manage the entire thing? First, heterogeneity: different hardware, different software, different applications, and in this research environment everything changes. Second, the layers of abstraction make it impossible to understand the root cause once something goes wrong. Now, I love abstractions. There is a common saying that all problems in computer science can be solved by adding another level of abstraction, but people rarely quote the second part: except the problem of too many levels of abstraction. Debuggability is one example. You abstract away the interface to make it easy to program, but you also hide all the details that would let you efficiently debug the system.

That brings me to our longer-term research vision: a data-driven big data infrastructure. With all these abstractions, nobody can fully understand the system. So we say: with data, who cares whether you understand the system or not; it's all a bunch of data correlations in the system. We make data collection a first-class citizen, we do cross-layer data analysis, and we tune the infrastructure based purely on the statistics. How do we tune it? Well, we have SDN and software-defined storage built into our infrastructure; we just don't know how to tune the parameters yet. We think a data-driven approach lets us tune applications better and makes the system easier to debug and manage. A tiny sketch of the cross-layer idea appears below.

Here is the infrastructure we are currently building. It looks like the way people normally draw a big data stack: layers of networking and storage, a framework, then applications on top. What's different is the cross-layer data collection. We analyze the data independently of all the interface abstractions, analyze it together, try to automatically understand what's going on in the system, and feed it back, using software-defined technology to fine-tune everything. We have a fairly large team of about 20 people working on this. But to really run this research, we need technology to deploy and maintain the platform at production quality. Remember, we still have real customers.
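To give a flavor of that cross-layer analysis, here is a minimal sketch. The metric names and numbers are made up for illustration; our real pipeline is much larger. The point is that once samples from different layers share nothing but timestamps, you can still join them and look for correlations across abstraction boundaries.

```python
# Sketch of cross-layer correlation (hypothetical metrics, not our real pipeline).
# Each layer reports (time_bucket, value) samples; we join on time and check
# whether an application-level symptom tracks a lower-layer metric.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Samples from two different layers, keyed by a shared 10-second time bucket.
app_latency_ms = {0: 120, 10: 135, 20: 400, 30: 410, 40: 130}  # framework layer
disk_io_wait   = {0: 3,   10: 4,   20: 35,  30: 38,  40: 5}    # OS/disk layer

# Join across layers on time; no per-layer interface needed, just the data.
common = sorted(set(app_latency_ms) & set(disk_io_wait))
r = pearson([app_latency_ms[t] for t in common],
            [disk_io_wait[t] for t in common])
print(f"latency vs io-wait correlation: {r:.2f}")  # high r: suspect the disk layer
```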
But the production side is our short-term problem: operational issues. Let me try to summarize them. The day-one problem is: starting from bare metal, after we buy all the equipment and spend a lot of time putting it together, how do we get to the stage where we can really run distributed software? There are multiple steps; I'll go through them quickly.

Step one, you configure the networking: make sure all the VLANs and IP addresses are set up, and configure the servers so everything is correct. For that step you need to review a lot of vendor-specific documentation and write a lot of hardware-specific scripts. Step two, you install the OS and the basic infrastructure. Each of these is itself multiple steps that depend on each other, basically a chain of operations: you install device drivers, security systems, account management, and basic storage. Then you finally reach a basic environment where you can deploy the OpenStack packages. The OpenStack packages look messy, but it's really just a pile of packages; as long as you know the list, you are fine. That's the easy part of OpenStack configuration.

The hard part of configuration is setting up all the configuration files so they match each other across the network nodes and compute nodes. If one IP address doesn't match, the components don't talk to each other. If one setting, say which OpenStack driver class to use, doesn't match, they don't talk to each other. The hard part is wiring the different OpenStack components together; I'll show a small sketch of that problem right after this overview.

Then, once you have OpenStack running, customers want different things, so you need to build all the different images containing different things, especially the distributed infrastructures, which are difficult to configure, and proprietary software like MATLAB, where you need to configure license servers. Next, with the infrastructure in place, you want to test it: we run Tempest, we run Rally, to make sure it works, and we keep running them so it keeps working as expected. When it doesn't, we have log monitoring infrastructure to catch it.

So that's six steps. What do you have at this point? A pile of configuration files and scripts. We run Chef servers, we know how to do that, so we have a bunch of cookbooks with all the settings. And you have a sysadmin who is close to burnout. What you really don't have is a production cluster: all this work was done in a test environment. Now you need to transfer it to production, because a bunch of customers have already lost patience waiting.

So the next step is transferring to the production environment. Here is the usual logic; as a programmer, let me show it as code. Basically: it doesn't work. It doesn't work, and you have two choices. Either you debug in your production environment, or you go back to testing. Most people go back to testing, which means going back to step one and restarting. What's wrong with that? Why is it so difficult? First, you need multiple people: we have people from Huawei, our own people, and people from different hardware vendors, and they all need to work together. There is a lot of coordination, and it's hard, because there are so many documents and scripts flying around in different formats and different languages. It's a very steep learning curve for the architect or project manager. And there is a lot of human intervention.
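To make the "wiring" problem concrete, here is a minimal sketch of the kind of cross-node consistency check we kept wishing we had. The file paths and option names are illustrative only; a real deployment has many more files and options that must agree.

```python
# Sketch: cross-check OpenStack config options across nodes (paths and option
# names are illustrative; real deployments have far more that must agree).
import configparser

# Imagine each node's nova.conf has been pulled into a local directory.
node_configs = {
    "controller": "/tmp/configs/controller/nova.conf",
    "compute-01": "/tmp/configs/compute-01/nova.conf",
    "compute-02": "/tmp/configs/compute-02/nova.conf",
}

# Options that must agree everywhere, or the components stop talking.
must_match = [("DEFAULT", "rabbit_host"), ("DEFAULT", "compute_driver")]

def read_option(path, section, option):
    cfg = configparser.ConfigParser()
    cfg.read(path)  # silently yields an empty config if the file is missing
    return cfg.get(section, option, fallback=None)

for section, option in must_match:
    values = {node: read_option(path, section, option)
              for node, path in node_configs.items()}
    if len(set(values.values())) > 1:
        print(f"MISMATCH [{section}] {option}: {values}")
```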
Even if you have scripts, you need to know which scripts to run, and anything involving human intervention is a very error-prone process. Of course we use Chef, and we use Puppet. These are great deployment tools, but they solve only part of the problem: they put all the deployment scripts together. What they don't put together is the environment. A cookbook needs input, and the input is environment parameters. What is an environment parameter? Normally an IP address, or the OS information you feed straight in; so, networking and OS. But soon, if you install multiple layers of software, OpenStack, then Hadoop on OpenStack, and you want to monitor everything, then for the higher layer the environment is really the entire environment of the lower layer. You end up with a complex web of dependencies among all the environment parameters.

And the environment parameters are captured in different places. Networking parameters live in various router configuration scripts and files. The OS keeps them in Cobbler kickstart files. For OpenStack we happen to use Chef, so the cookbooks capture that environment information. We also use Puppet, and we also use Ansible. Why give ourselves so much trouble? Because we aren't choosing a particular infrastructure; we're choosing the scripts that exist in each community. We believe this Puppet script is the best for installing Hadoop, and this Ansible script is the best for installing the monitoring and logging stack, because an equivalent doesn't really exist in the Chef community. We choose the scripts rather than the infrastructure. So we have all these different tools, and the environment variables are captured in different places and manually copied between them. You change one place and forget to change another: the classic copy-and-paste code problem. The little script after this section shows the kind of band-aid this pushes you toward.

The second problem is operations. I know you are OpenStack experts, so I don't think you would call OpenStack the most reliable piece of software in the world. People discover all kinds of problems, it fails a lot, and you need to figure out why. And even when OpenStack becomes solid, the hardware can still break. We replace a server and something breaks: it worked yesterday, but after the replacement it doesn't. Which dependency, which environment did you break? It can be anywhere along the stack; maybe you forgot to update one copied IP address somewhere, because no single place captures the entire thing. It's also hard to diagnose because OpenStack is a fully decentralized infrastructure. When something goes wrong, you look at all the different information on every single machine, and you don't know which one caused the problem. Sometimes it just doesn't say. Especially if you run distributed storage like Ceph: sometimes it goes wrong, and you don't know which OSD caused the problem. It doesn't say.

The last problem is that our customers change their minds. One of my assistants says researchers change their ideas very often. And we are a computer science department, so we have a lot of computer scientists; they change their ideas very often too. They come from a software background, so they say: I should be able to change this. How do we currently manage all these changes? With this logic: I have no idea whether it still works if I tune this parameter or connect the network differently. Does it work? So what I do is back up the data and start testing again. It's a complete new cycle every time.
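As promised above, here is a toy illustration of the copy-and-paste environment problem. All paths and addresses are made up. When one controller IP lives simultaneously in a kickstart file, a Chef environment, a Puppet manifest, and an Ansible inventory, the best you can do without a single source of truth is scan for stale copies after every change.

```python
# Toy stale-copy finder (hypothetical paths and addresses): when an environment
# parameter such as a controller IP is duplicated across tool-specific files,
# scan them all after a change and report files still holding the old value.
from pathlib import Path

OLD_IP, NEW_IP = "10.0.0.11", "10.0.0.21"  # controller moved to a new address

# One parameter, four homes: Cobbler kickstart, Chef environment,
# Puppet site manifest, Ansible inventory.
files = [
    "/var/lib/cobbler/kickstarts/node.ks",
    "/etc/chef/environments/prod.json",
    "/etc/puppet/manifests/site.pp",
    "/etc/ansible/hosts",
]

for f in files:
    p = Path(f)
    if not p.exists():
        continue
    text = p.read_text()
    if OLD_IP in text:
        print(f"stale copy of {OLD_IP} still in {f}")
    elif NEW_IP not in text:
        print(f"warning: {f} mentions neither address")
```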
So the fundamental problem is that the operational knowledge we accumulate does not transfer from one person to another. The original engineer did the entire thing; he explains it to a second engineer, and the second engineer understands nothing. Everyone has to build it from scratch, multiple times, to really get it. Of course, if you are in the consulting business, that's good news: you'd be out of business if this were easy to teach. For everyone else, it's a nightmare. Okay, now I'll hand the stage to Dr. Yang Shuo, who will tell you how the open source project Compass really saves the day for operations. Thanks.

Thanks, Wei, for the great introduction to the context of our project. We are building an open source project called Compass, and the project really started from a book. Dr. Xu is a researcher who went from industry back to academia; I took my PhD into industry. I started reading this book when it came out, and it made me reflect on what has happened in our industry. Think back to the 90s: we had the NIC, the single-box networking resource; the CPU, the single-box compute resource; and the disk, the storage resource. Linux came along, managed all those resources, and exposed a single image of a compute device: a computer. However, Linux didn't spread widely until there was an amazingly convenient tool, the LiveCD. You insert a LiveCD into your computer and it stands the machine up from bare metal. And after that, you want some tool that tells you: my computer is working, and here is what is running. That's the system monitor.

Now look at the same picture in the "datacenter as a computer" context. Today we have the switch, the datacenter networking resource; the server, the compute resource; and the storage server, the storage resource. And we are all at this conference to join this great movement: OpenStack, we believe as community members, is forming this giant datacenter computer. However, if I ran a quick survey here, people would probably say that standing up an OpenStack cluster is just as difficult as standing up a computer used to be. So we asked: why don't we build something that over time can evolve into a LiveCD for the datacenter? And, as Dr. Xu mentioned, something that, after you stand up your giant datacenter-scale machine, can tell your customer it is running well? That is what we took away from the book.

Here are two datacenters, one from Google and the other, I think, from Rackspace; hopefully you can tell which is which. Essentially, once you have this giant datacenter computer, you start with all bare-metal resources, and the community is trying to solve a pretty hard problem: standing up the cluster. We call it zero-touch deployment; hopefully zero touch. It's a hard problem, and I'm not saying the industry has solved it in the OpenStack context. But even when it is solved, there are other problems to recognize around it.
Hopefully some smart robotics companies can help reduce the human-level repetition of physically putting machines together. And beyond that: how do you design your cooling system, your power system, the best way to wire everything up? You'd want a smart design, a smart blueprint; hopefully the AI industry gives us answers there. Anyhow, of those layers, this talk is really about the part we just started on, the comparatively easy part: the zero-touch deployment problem.

As I said, this started from a book. We read the book, thought about the problem, and started by thinking big, but we are trained engineers: we wanted to start small and tangible, to build something rather than write a research paper. We basically want a general system that can deploy any distributed system, because it is the distributed system that turns the bare-metal resources into a giant computer. We started with OpenStack, but we are not limited to it; extensibility is our primary design goal. Automating OpenStack cluster deployment took quite a while; after that, in a much shorter time, we were able to automate other stacks. So if anyone in this audience needs to deploy Spark, or Mesos (Mesos is another great tool for organizing a datacenter-scale computer), talk to us. We can exchange ideas and see whether Compass can serve as a foundation for you.

Why can we even attempt this? Because there is a fairly unified layering to datacenter deployment. First you provision the host OS. Then you deploy the distributed system, basically daemons across the machines, and you wire them up correctly so they can talk to each other, master/slave or peer-to-peer, and form a logical cluster. I'll show a toy sketch of this layering below. At each layer there is no lack of tools. But as Professor Xu mentioned, even with the tools available, it still takes a lot of time to reason about. Why? Because even with a tool, you first have to write a lot of boilerplate code. Only after writing all that boilerplate, all those scripts, do you get to think about what you actually wanted: I want seven machines as my testing OpenStack cluster, and I want this HA configuration for my control nodes. That's the real question you want to ask yourself, but your time is mostly eaten by boilerplate.

So Compass views the system as a bunch of computational ingredients. We view the modern datacenter as software-defined infrastructure built on three basic foundations: compute, storage, and networking. Look at a bare-metal device, a server: you install some daemons for compute, some daemons for storage, some daemons for the networking infrastructure. And with the SDN movement, you even want all your networking gear to be built up by software.
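Here is that layering as a toy sketch. This is an illustrative model only, not Compass code; the roles, daemon names, and image name are placeholders.

```python
# Illustrative three-layer deployment model (not actual Compass code):
# 1) provision an OS on each machine, 2) install the system's daemons,
# 3) wire the daemons together so they form one logical cluster.
from dataclasses import dataclass

@dataclass
class Machine:
    mac: str
    role: str  # e.g. "controller" or "compute"

def provision_os(machine, image):
    print(f"[os]      {machine.mac}: installing {image}")

def install_daemons(machine, daemons):
    for d in daemons:
        print(f"[package] {machine.mac}: installing {d}")

def wire_cluster(machines, controller):
    # Every daemon must point at the controller, or no logical cluster forms.
    for m in machines:
        print(f"[wiring]  {m.mac}: pointing at controller {controller.mac}")

ROLE_DAEMONS = {"controller": ["nova-api", "nova-scheduler"],
                "compute": ["nova-compute"]}

cluster = [Machine("aa:01", "controller"), Machine("aa:02", "compute")]
for m in cluster:
    provision_os(m, "CentOS-6.5")
    install_daemons(m, ROLE_DAEMONS[m.role])
wire_cluster(cluster, controller=cluster[0])
```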
I liked this morning's keynote; I think it was Jonathan who said you really want to build things the way you build Legos. My son likes Lego, and I realized we are working in exactly that mindset. If we can do that, we are attacking the problem in a tractable way. And as I said, there is almost zero difference between server software and networking software: if you really model the deployment process, you can deploy your networking fabric, your networking ingredients, onto the bare-metal resources, point them at the right controller, and they form the SDN infrastructure. After that you do traditional server software management. If we can build things that way, the whole datacenter can be fully automated, without lock-in anywhere. That's the goal we want to achieve.

As Professor Xu described, the challenge is to let your sysadmin, your datacenter builder, focus on what the giant computer should look like. So we model all the necessary steps as RESTful resources. Machines and switches are the bare-metal resources, the resources you deploy your system onto. Then we have a model called the adapter: OpenStack, for example, is one adapter, and it lets you deploy OpenStack onto the bare-metal resources. Once the target system is deployed onto those resources, they form a logical cluster, and each node within that logical cluster is called a host. You can see this on our website, syscompass.org; this slide is one example of our RESTful resources.

As I said, you really want your customer to focus on the real questions he wants to ask and answer. Forget about automation: even without automation, you still have to answer these fundamental questions. For example, in an OpenStack deployment: which bare-metal resources do I want to deploy my OpenStack packages onto? That's steps one and two: through the RESTful API you select the right resources. Then you say: since I'm installing OpenStack, I select the adapter named OpenStack. Then you are asked what kind of OpenStack you want to deploy: HA mode, or, for testing purposes, a single controller node? After that, the system administrator gets feedback by polling the installation progress. A hypothetical walkthrough of that flow follows below.

So this is a very high-level diagram of the Compass design. We provide a RESTful API exposing the fundamental resources we believe the datacenter builder wants to program against, and behind it we have a hardware management module, an OS provisioning module, and a package deployment module.
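Here is what that question-and-answer flow could look like against such an API. The endpoints and payloads below are illustrative guesses, not the documented Compass API; see syscompass.org for the real resource definitions.

```python
# Hypothetical walkthrough of the deploy flow (endpoints and payloads are
# illustrative, NOT the documented Compass API).
import time
import requests

API = "http://compass.example.org/api"  # assumed endpoint

# Steps 1-2: discover bare-metal resources and pick the ones to deploy onto.
machines = requests.get(f"{API}/machines").json()
chosen = [m["id"] for m in machines[:7]]  # a seven-node test cluster

# Step 3: create a cluster using the "openstack" adapter.
cluster = requests.post(f"{API}/clusters",
                        json={"name": "test", "adapter": "openstack"}).json()

# Step 4: answer the fundamental questions: which hosts, and what HA mode.
requests.post(f"{API}/clusters/{cluster['id']}/hosts", json={"machines": chosen})
requests.put(f"{API}/clusters/{cluster['id']}/config",
             json={"ha_mode": False, "controller_count": 1})

# Step 5: kick off the deployment and poll the progress for feedback.
requests.post(f"{API}/clusters/{cluster['id']}/deploy")
while True:
    state = requests.get(f"{API}/clusters/{cluster['id']}/progress").json()
    print(f"{state.get('percentage', 0)}% - {state.get('status')}")
    if state.get("status") in ("finished", "error"):
        break
    time.sleep(30)
```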
As Professor Xu mentioned, people choose a particular configuration management tool because of the scripts available for it. We don't want to lock you in, and we don't want to pre-select a tool for you. Whatever the best wheel out there is, we want to leverage it rather than reinvent it. From the end-user perspective, we want to help you build an extensible system: extensible in the sense that today we support OpenStack installation, and we already support Ceph. A year ago that slide was still blank, and the list can keep growing. If you are using ESXi, theoretically there is no difference between ESXi provisioning and CentOS provisioning. On the hardware side we have tested on HP hardware, and we love open source; there is a movement in hardware too, OCP, so if there are OCP developers in the audience, come talk to us. Again: we are extensible in the configuration management tool and extensible in the OS provisioning tool. As Professor Xu said, he does not want to be locked into a certain tool chain, and we do not want to lock anyone in.

Professor Xu also described what comes after the stand-up problem. Say we have solved it. Then you want to give your daily operator some knobs, basically answering: what is my cluster, my datacenter computer, doing? Is it running well? To solve that at the datacenter level, think about the basic layer you need: agents installed on your networking gear and on your compute gear, plus a storage engine behind them. Most importantly, you expose all of it through a RESTful API. With that, big data researchers can consume the API and build complex models over the daily operational data. For time reasons I'm skipping the details, but basically we built a log management engine and a time-series performance engine.

Let's take a look at the demo. Let me skip the installation process for the time being and go straight to the monitoring part. For monitoring, we built a panel that shows the user the topology: not only the physical topology but also the logical topology of your services. You can zoom in and out and look at the alerts over time, so you get a sense of ownership. And beyond that, you can consume the underlying API to do model-based analysis. I'll conclude my talk here and leave some time for Q&A. Any questions? Yes, please.

It is an open source project, and it follows all the rules governed by the OpenStack StackForge process. The web page of the project is syscompass.org; please check it out. Next, please. Sure, great question. Lots of people ask: hey, there is TripleO, right?
Why did you even bother building this? There are two answers. Back when we started the project, we evaluated the tools on the market, and pretty much none satisfied our needs; that's why we got started. The second answer: what if those tools mature, and you are just building another wheel? The answer is that we are not. We thought this through: if TripleO gets mature enough, for example, TripleO could be used as one driver for us. Because of the time constraint I didn't explain our plugin architecture in detail, but we can use TripleO or other tools to actually actuate these actions. Hopefully that answers your question. Please.

Great question. We host a local meetup, and attendance is pretty good. The Tsinghua cluster is the largest we have deployed so far, and we have several much smaller deployments. That's the consumer side. On the community side, as I said, we have just started building the community. Yes, we are looking for more people. I really believe collaboration grows a project, and given that extensibility is what we believe in first, we want people with different backgrounds. For example, if you are trying to deploy Mesos, or Spark, we want to talk to you: can you reuse, say, 80% of our existing code and write maybe 200 lines of plugin to do it? That's an open question.

Great question. By the way, an announcement: tomorrow afternoon we have a design session. If you are interested in these questions, come talk to us; that discussion is exactly for this kind of question. We are really open-minded: we want to hear what's needed and whether it fits our bill. We don't want to stretch our tool to do something we are not good at, but where there is similarity, there is synergy; we want to hear from you and discuss it. Okay, let's take further questions offline. Thank you. Thank you for attending.