Okay. Good morning, everyone. My name is Xiaoyi Lu. I come from The Ohio State University, where I am a research scientist doing research in big data and cloud computing. I'm very happy to present this talk here, and thanks a lot to so many of you for coming to listen. First of all, I apologize for my co-presenter, Dr. D. K. Panda: because of some personal constraints at the last minute, he couldn't come, so I will present the whole talk. This talk is about accelerating big data processing and framework provisioning with OpenStack Heat, Hadoop, and Spark.

So these days all of us are talking about big data, which is a big problem with a lot of challenges. The main reason is that data is growing so fast. For example, every day when we click on a website or use our smartphones, we generate data, and this data needs to be stored, analyzed, and processed. That's why big data becomes a big problem. And this problem does not only occur in industry or business domains; it's also happening in the scientific and research domains, where there are a lot of big data problems these days as well. To some extent, the speed at which big data is growing these days is almost faster than Moore's Law. So in order to process this big data in a fast and scalable manner, a lot of people are trying to converge big data and high-performance computing (HPC) technologies to meet these kinds of challenges. On the other hand, running high-performance data analytics (HPDA) workloads on top of the cloud is also gaining popularity: according to the latest OpenStack user survey, around 30% of cloud deployments are running HPDA workloads.

So when we talk about big data from the system perspective, what kinds of solutions and technologies are people using these days to run big data applications? Broadly, we divide the system into two tiers: a front-end tier and a back-end tier. The front-end tier is just like the example I mentioned earlier: every day, when you click on a website or use your smartphone, your clickstream goes to the front-end tier servers, say web servers, database servers, or NoSQL DB servers, those kinds of things. The data gets stored and processed there, and it also goes back to the back-end storage. Here we give HDFS as an example, but you could also use Ceph, OrangeFS, or some other file systems. On top of that, people typically run machine learning, data mining, or deep learning kinds of workloads. Here I give two examples, MapReduce and Spark, so that you can run your algorithms in a scale-out fashion on top of a distributed file system.

So, how many of you are familiar with MapReduce? Can you please raise your hands? Okay, great, there are so many people familiar with it. Just to get everybody on the same page, I will quickly give a high-level view of the ideas behind MapReduce, so that later on people can understand my talk easily. Broadly, for MapReduce, as the diagram here shows, you have a big problem, written as some user-level program code, and there is a master-worker kind of architecture. The program launches, and the master assigns tasks to different data splits.
Then you run map tasks on each of the splits, and the output of the map tasks gets shuffled to the reduce phase, the reduce tasks. There, the map output gets gathered, aggregated, processed, and then written out to the real file system. So this is a divide-and-conquer kind of architecture, and with it a lot of problems can get solved in a fault-tolerant and scalable fashion. Here is an analogy: you have a lot of tasks to be done. One way is to do everything by yourself, in a sequential manner, which is neither scalable nor high performance. But you also have a lot of good friends, so you can assign your tasks to them, right? If you have, say, 10 friends, and each of them helps you a little, then ideally you can get a 10x improvement. That's the idea behind MapReduce; hopefully this makes it easier to understand.

Okay, that's MapReduce in general. Let's take a look at a concrete example: word count. If you have a small file, you can write any kind of program to solve it. But think about a file that is one terabyte or one petabyte. If you want to do word count on that file, the problem becomes hard, right? Especially if you consider fault tolerance: if some task fails, how do you handle it properly? Like I said earlier, the flow of MapReduce is like this: you divide the file into multiple splits or partitions, and then you run your map tasks. Each map task takes one split and processes it, and its output gets shuffled to the reduce phase. The reduce tasks do the aggregation and write the output to HDFS or other file systems. In this way, a huge problem is decomposed into simpler problems that get solved. Okay? That's one reason MapReduce is so popular these days: problems get solved easily, fast, and in a scalable manner.

Another perspective is productivity, or programmability. Think about writing this word count example by yourself. If you use MPI, PGAS, or other kinds of programming models, you need to write a lot of code, especially if you want to handle one terabyte, one petabyte, or one zettabyte of data. But with Hadoop MapReduce, these days you only need to write about 63 lines of code, and the same code can run on any size of data set without any changes. Okay? That's why Hadoop is so popular these days, especially for developers and engineers who want to quickly implement an idea, see how the results come out, and then change their solution. And, like I said, it runs in a scalable, fault-tolerant manner, with high productivity. Okay, that's Hadoop.

So, some of you may also be familiar with Spark, or have heard of Spark, right? Now, this same example becomes even simpler in Spark. How many lines now? Three. Okay? That's one of the reasons Spark is gaining momentum these days and a lot of people are trying to use it. You can easily define your file in HDFS (you can also change it to read from other file systems) as a Spark dataset, called an RDD, if some of you have heard of this concept. Then, based on this file, you take each line, do a split, count the occurrences of each word, and save the output to HDFS. That's the idea in Spark.
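As a rough sketch, the classic three-line Spark word count looks something like the following. This is a minimal Scala sketch, assuming an existing SparkContext named `sc`; the HDFS paths are hypothetical placeholders, not from the slides:

```scala
// Minimal Spark word count sketch (Scala). Assumes an existing
// SparkContext `sc`; the HDFS paths here are just placeholders.
val file = sc.textFile("hdfs://namenode:9000/input.txt")
val counts = file.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:9000/wordcount-output")
```

The same three lines work unchanged whether the input is a few kilobytes or many terabytes, which is exactly the productivity point above.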
With three lines of code, you get even more productivity, and it's still scalable and fault-tolerant, as well as high performance. Okay? The reason for the high performance is this: Hadoop was developed and designed for batch processing, but in many scenarios your workloads run iteratively, right? And also interactively. So Spark keeps the data, the RDDs, in memory, so that you can repeatedly run your programs on top of it.

Let me give another, more concrete and more complex example, because maybe some of you will say word count is too simple and doesn't mean anything, right? Let's take a look at log mining. As developers, what do we always want to do? We write our code, we see some errors, and we try to debug, to see what kinds of errors and what kinds of problems are happening inside the system. One way is to do log mining, right? Typically when we do log mining, we can use the Unix shell: we do grep, we do cat or sort, those kinds of things, to find what the problem actually is. So how can we do that in Spark? And think about this scenario: your data is huge, not just one file; you have tens of thousands of files stored in your cluster. How do you do that efficiently? Okay?

So this is the architecture of running Spark applications on top of your cluster: you have a driver, or master node, and then you have the worker nodes, in a distributed fashion. First of all, you say, okay, where is my file? You load the file from HDFS, and we call that the base RDD; it can be loaded into memory. Then, just like we do with grep, we say, okay, for this file, select only the lines that start with "ERROR", because we want to see what's happening in the error part, right? We call the result a transformed RDD. After that, we note that "ERROR" is just a tag; what we actually want is the error message. So we do a split and get the error message from, let's say, the second field. And because we want to find the root cause, this won't run just once, right? We want to analyze this data repeatedly. So we cache this data in memory, and the next time we run some other commands or instructions, they can run at memory speed. So we do a cache, and now we want to see how many times, how many lines, are related to this type of event, or anything else you can define. So we do a count.

Now let's see how this gets executed. Spark takes these lines and submits the jobs to the cluster, okay? Just like MapReduce, the tasks get assigned to different nodes, and each node takes a piece of the data and analyzes it for you, so it's scalable. Then the results come back to the driver program, and you get the output, okay? This is exactly like MapReduce, but the interesting part is the next step. Because we said cache, the results of the earlier phase are kept in memory. The next time, we may want to search or analyze for another message, say "bar" or anything else, and that second query will be executed on the cached dataset, so you get results much faster than with Hadoop, okay?
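Putting that walkthrough together, a minimal Scala sketch of this log mining example might look like the following, assuming an existing SparkContext `sc`, a hypothetical HDFS log path, and tab-separated log lines (all of which are assumptions for illustration):

```scala
// Log mining sketch in Spark (Scala): filter error lines, cache, query twice.
// `sc` is an existing SparkContext; the path and field layout are assumptions.
val lines = sc.textFile("hdfs://namenode:9000/logs/")   // base RDD
val errors = lines.filter(_.startsWith("ERROR"))        // transformed RDD
val messages = errors.map(_.split("\t")(1))             // keep only the message field
messages.cache()                                        // keep in memory for reuse

// First action: materializes `messages` and caches it across the workers.
val fooCount = messages.filter(_.contains("foo")).count()
// Second action: runs against the cached dataset, at memory speed.
val barCount = messages.filter(_.contains("bar")).count()
```

The first count pays the disk-read cost once; every later query over `messages` runs from memory, which is exactly why the second search is so much faster.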
So, that's the design principle, the ideas behind Spark. Let me give some high-level results. According to numbers from Databricks, a full-text search on Wikipedia data, around 60 GB on 20 EC2 machines, takes only about 0.5 seconds in memory, compared to about 20 seconds if you do the search on disk. Think about whether, with other programs you might write, you could do a 0.5-second full-text search on Wikipedia data. And the same program can scale to about 1 terabyte of data in about 5 to 7 seconds, while on disk it takes about 170 seconds. That's the idea of in-memory computing that people are trying to bring into the data middleware these days.

Okay, so that was some background, so that people can understand what I'm going to talk about in this outline. I want to tell you what the challenges are in making this kind of parallel processing even faster with HPC technologies, and what the bottlenecks are, okay? Then I want to introduce the work we have done in our group, in what we call the High-Performance Big Data project, HiBD for short. I will introduce some of our basic designs, going deeper into what kinds of designs we can propose with high-performance networks. And because this is a cloud community, we also want to see what opportunities there are to further optimize big data libraries on top of the cloud, so we are going to introduce some cloud-aware data processing designs, okay? I also have another two talks: in tomorrow's talk, I will introduce how to bring RDMA technology into OpenStack Swift to make it even faster. And then, in a cloud environment, deploying your applications as fast as possible is another big challenge. Thanks to the OpenStack community, we think the Heat component is very useful for helping us deploy this whole stack, because in big data there are so many layers, from the underlying network to the file system, to MapReduce or Hadoop, to your applications, with so many dependencies. How to deploy it efficiently, right? That's a big challenge, okay?

Now, let's take a look at the opportunities first, especially in the high-performance computing domain, in high-performance cluster and cloud architectures. Whenever we go to an HPC cluster or supercomputer these days, we see that every node has multi-core or many-core technology: you have high-performance Xeon cores, Xeon Phi, or GPGPUs. These days you also have a lot of memory; large-memory nodes with one terabyte of memory are very popular, right? And SSDs, NVMe, parallel file systems, and object storage, these different kinds of storage technologies, are also available in the cloud as well as in HPC clusters. There are another two important technologies I want to introduce here. One is RDMA-enabled networking, with two examples: InfiniBand and RoCE. How many of you use these networks? Okay, a few of you, okay. The other important technology is what we call Single Root I/O Virtualization, SR-IOV. How many of you have heard of this? Okay, a lot. Good, that's because we are a cloud audience. Okay, so let me introduce these two in just two or three slides. This slide gives an overview, a summary, of what kinds of communication protocols, communication mechanisms, and networks you can use these days in your HPC cloud or HPC cluster.
Okay, on the left side is the stack we are all very familiar with: TCP/IP running over the Ethernet driver, the Ethernet adapter, and an Ethernet switch, right? We started learning this when we were students. It has been used in many, many domains; all the big data middlewares these days are developed on top of sockets and run over the TCP/IP stack. But if you take a look at the right side, there is something very exciting, because it gives you much, much faster performance than the left side, which is called RDMA technology. I will introduce what RDMA is in a moment. Basically, it doesn't use the sockets API anymore; it uses the verbs API, it uses RDMA technology, it goes through a user-level communication protocol, and it runs with an InfiniBand adapter and an InfiniBand switch. Okay, so this gives you a lot of benefits. What kind of benefits? Let's take a look at these slides.

I want to explain a little bit why RDMA is important these days and why it can give you good performance. First, before I introduce RDMA, think about send/receive. What does send/receive mean? The receiver side posts a receive; the sender side posts a send; the network does the communication for you; and on the receiver side there is a matching step before the data goes into your memory, right? This is two-sided communication: each side needs to be involved. That's why you cannot achieve the best performance, okay? Because on each side, CPU resources and a lot of other things need to participate in the communication process. Then some smart people said, okay, can we do one-sided communication? If I know I want to communicate with you, why can't I directly write the data into your memory? Within a single node, I think all of us know DMA, right? DMA means you write into memory without CPU involvement. So somebody said, okay, can we do this in a remote fashion? I can directly write into the memory of somebody I trust, or who trusts me, so that whatever you are doing, you just continue, right? You don't have to be involved in the communication process. That's called one-sided communication, or RDMA technology.

Okay, now let me show an animation. Let's say this is the sender side and this is the receiver side, okay? The sender is trying to send data to the receiver. Here is the data in memory, and the InfiniBand card will smartly take your data and write it directly into the remote memory location, okay? That is called remote direct memory access. And the hardware acknowledgment and everything else is handled by the NIC as well. Look at this phase: the initiating process is involved only in posting the send descriptor and in pulling the completion event out of the completion queue, right? But on the receiver side, the CPU is not involved at all, okay? That's the major idea behind RDMA.

Okay, so if we want to bring these technologies into the cloud, another technology is needed, which is called Single Root I/O Virtualization, SR-IOV, okay? The idea is that, if we are familiar with I/O virtualization or network virtualization, what do we typically do? We have a front-end driver and a back-end driver, and we do software-based virtualization, right? That works, but you lose performance.
The reason is that each packet needs to go back and forth across different layers, and there is a lot of overhead. So the community tried to solve that problem by proposing Single Root I/O Virtualization, which opens a lot of new opportunities for designing high-performance communication and I/O middleware for big data and other workloads. The basic idea is this: an InfiniBand card, or any other SR-IOV-capable PCIe device, can present itself as multiple virtual functions at the hardware layer, okay? Then, when the hypervisor launches virtual machines, it can select each of the virtual functions and map it directly into a guest OS. From that guest OS's perspective, it looks like it has dedicated, direct use of the hardware, okay? That's why SR-IOV gives you a lot of opportunities: the performance can improve a lot, okay? InfiniBand these days also supports SR-IOV, as do other kinds of high-performance Ethernet, like 40 Gigabit Ethernet.

With this, let's look at how to build efficient clouds with SR-IOV and InfiniBand. InfiniBand, and other high-performance networks like RoCE and iWARP, give you a lot of opportunities for good performance. There is low latency, just a few microseconds these days, and high bandwidth; with an InfiniBand HDR card you can achieve 200 Gbps these days, okay? And there is fairly low CPU overhead: like I said, on the remote or receiver side, the CPU is not involved in the receive phase, right? So you only need something like 5 to 10 percent CPU involvement. That gives you a lot of chances to overlap your computation, communication, and I/O. And the OpenFabrics software stack, OFED, is very popularly used on many HPC systems and is open source, so people can use it directly. Now the question is how to build efficient clouds with SR-IOV and InfiniBand that deliver near-optimal, near-bare-metal performance in your cloud. That's the overall problem we want to solve in this talk, and particularly for big data middlewares.

So we want to solve some challenges like these. As I mentioned earlier, big data middlewares were developed and designed with sockets and TCP/IP kinds of communication protocols. But RDMA is showing such good features and such good performance, so why can't we use it? The challenge is what? Our designs are written with sockets, using the standard default Java libraries or something like that. How can you use verbs? Right? That's one kind of challenge. Also, these days we don't only have DRAM-based communication; NVRAM is also available. What kinds of NVRAM-aware communication and I/O schemes can help? How do we use different kinds of SSDs (SATA, PCIe, NVMe), support parallel file systems, optimize overlapping, and handle different threading models, synchronization, and locality-aware designs? Because in the cloud there are a lot of different layers, like virtual machines and containers; your tasks are no longer running directly on native nodes. You are running inside containers, inside various environments, and you have to be aware of the underlying architecture and the underlying network topology. And then there is fault tolerance, resiliency, efficient data access and placement, efficient task scheduling, fast deployment, all those kinds of issues. We want to see what kinds of designs we can propose. Okay. So this slide shows the challenges we have addressed so far.
So at this layer, inside the big data middleware, we have done a lot of designs for HDFS, MapReduce, Spark, HBase, Memcached, and so on. Then there is the layer we call resource management and scheduling systems for the cloud, like OpenStack Swift or Heat kinds of systems; those components are definitely there. And then, for the communication and I/O library, the most important enablers are RDMA and SR-IOV. How do we utilize them? How do we get better designs inside the big data middleware: locality-aware communication, virtualization awareness, data placement, those kinds of things? Okay.

So now let's take a look at the concrete designs, the concrete things we have done in our group. This is just an overview of what we have done. First of all, we take Hadoop and Spark as examples and optimize them with RDMA technologies. We have developed RDMA for Apache Spark and the RDMA for Apache Hadoop 2.x series. We actually started with 0.20, I think, and then we kept upgrading our designs to the 1.x series and the 2.x series. Some of you may also know that in the community there are vendor distributions, like the Hortonworks and Cloudera distributions. We have also designed plugin-based components so that you can integrate our work with your HDP or CDH distributions. And some people are using NoSQL databases, so we take HBase as an example application framework and bring RDMA into it. And for key-value stores, things like Memcached, Redis, and RocksDB are also very popular these days; we bring RDMA into Memcached to see what kinds of benefits people can get from the RDMA protocol. We have a large user base right now: 225 organizations from 30 countries, and more than 21,000 downloads from our project site. These solutions can run with InfiniBand as well as RoCE; and if you only have plain Ethernet, they can run there as well. They are like a superset of the default versions.

Okay, for the Hadoop 2.x distribution, the latest version we support is Hadoop 2.7.3, which is almost the latest; 2.8 is coming these days, so we are working on merging with it and will make a new release in the next several weeks, maybe. We have designs like RDMA-based HDFS, RDMA-based MapReduce, and RDMA-based RPC, those kinds of things. Okay, it can also run MapReduce directly on top of Lustre or other parallel file systems, without HDFS involvement. We can also run these workloads on top of a burst-buffer-based design; for example, in our group we developed a burst buffer on top of Memcached, so that you can run your workloads over a burst buffer layer. Okay? So these are the different modes we support in our Hadoop library. On the HDFS side, we can support a purely in-memory mode, where we try to give you in-memory speed for I/O, okay? It may lose some fault tolerance, but in some cases, if you want performance, this may be one mode you can try. Then there is the heterogeneous mode, HHH, which we call Triple-H; the Lustre-integrated mode, HHH-L; and the burst buffer mode, HHH-L-BB. For MapReduce we have two modes as well: you run with HDFS, or you run with Lustre. Another thing is that, for HPC clusters, we have developed tools to integrate Hadoop with your HPC cluster schedulers, like SLURM, PBS, or Torque. Okay, for Spark. Similarly, for Spark, we have developed some designs inside it; I will give a deeper overview of what we've done in the next slides. The latest version, Spark 2.1.0, we support with RDMA.
These packages are available on our website, and they are also available on supercomputers and clouds, for example SDSC Comet: if you have an XSEDE account, you should be able to log in to Comet and run our packages. We have also developed an appliance that is available on Chameleon Cloud, an OpenStack-based cloud supported by NSF. Okay.

Now let's take a look at the basic designs, at the challenges we've solved through RDMA. First of all, HDFS. HDFS is the major component widely used by a lot of big data middlewares, like Spark, Flink, HBase, everything. So we look at what kinds of communication and I/O requirements HDFS actually has, okay? We did a lot of analysis, and we see that the most communication- and I/O-intensive part is what? It's not read, because reads have locality, right? It's actually write. When you do a write, because of the fault-tolerance support, you have to do replication, and when you do replication, your data has to be moved to different nodes, okay? During that phase you consume the network, and you use a lot of high-speed I/O devices as well, right? That's why we bring RDMA first into the HDFS write path and the replication path, okay? We call this RDMA-based HDFS write and replication. And not only that: if you just improve the communication part, the bottleneck may still exist on the I/O path, right? Because you have to load data from disk and then send it to the network. If the bottleneck is in the disk, even a high-performance network doesn't help; you have to improve your I/O path as well. That's why we bring in other designs to efficiently utilize heterogeneous storage devices, like SSDs, RAM disks, and NVRAM kinds of things, and we also do hybrid replication kinds of techniques. We have a paper in CCGrid 2015; if you are interested, please feel free to take a look. That's for HDFS.

For MapReduce, earlier I gave an example of how MapReduce works, right? Just recall that. Where is the communication? Where do you need communication? Shuffle, right? You have output from the map phase, and you want to transfer that map output to the reduce tasks. So how do you transfer, how do you move that data? You have to use the network. That's the communication-intensive phase, okay? That's why we first designed an RDMA-based shuffle. And not only that: similarly, if you need to fetch data from disk every time you do the shuffle, it's still slow. So we designed prefetching and caching of the map output, and we do in-memory merge rather than on-disk merge, okay? With all these kinds of designs, we are trying to make your applications run in a much faster manner. Another goal is: don't change your application. Everything we have done is in the middle, in the library, so that your applications can run transparently. That's the major goal.

Now let's take a look at some example performance numbers. This is Apache Hadoop RandomWriter and TeraGen on top of InfiniBand EDR, 100 Gbps. We see around 3x or 4x performance improvement. The next one is Sort and TeraSort: maximally, we can achieve around 60% improvement. Okay. Now let's take a look at the next one, Spark. In Spark, similarly, even though Spark tries to keep your data processing in memory, when a wide dependency happens, you still need to bring your data over from different nodes. That phase is still called shuffle.
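To make the wide-dependency point concrete, here is a small Scala sketch, again assuming an existing SparkContext `sc` (the tiny dataset is made up for illustration), contrasting a narrow dependency that needs no shuffle with a wide one that does:

```scala
// Narrow dependency: mapValues works on each partition independently,
// so no data crosses the network.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val doubled = pairs.mapValues(_ * 2)

// Wide dependency: reduceByKey must gather all values of a key from all
// partitions, so data crosses the network. This is the shuffle phase
// that the RDMA-based designs described here aim to accelerate.
val sums = doubled.reduceByKey(_ + _)
sums.collect().foreach(println)
```

Operations like reduceByKey, groupByKey, and sortByKey all trigger this shuffle, which is why the GroupBy and SortBy benchmarks in the next part are good stress tests for the shuffle path.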
It's still time consuming, still communication intensive and I/O intensive. So what we bring is this: in default Spark, you have the NIO- or Netty-based shuffle components, and we bring our RDMA-based design in underneath, in the Spark core. With this, we are able to see around 80% improvement for RDD-based operations, GroupBy and SortBy, in a very simple micro-benchmark of basic RDD operations. The next one is a graph benchmark, PageRank. We ran up to 1,536 cores, fully subscribed on 64 worker nodes, on SDSC Comet, which is a supercomputer, and we can achieve around 40% performance improvement. Okay. So those are benchmark numbers.

These days, people are also trying to run deep learning on top of Spark. One example is BigDL, developed at Intel. Basically, they run a parameter-server kind of architecture, okay? They run their deep learning models on top of Spark, and the data gets shuffled through the Spark backend. There are a lot of interesting features in BigDL; if you are interested, please feel free to go to this link and take a look at their features and user guide. We want to show some early numbers we observed; this is just for the computation phase of one epoch. We see that with RDMA, training a VGG model on the CIFAR-10 dataset, we are able to see around 4x performance improvement. That's very exciting for deep learning workloads: just bring RDMA into your stack. Okay.

So that was for the native, bare-metal environment. Now let's take a look at how to run RDMA-Hadoop on top of the cloud. There are a lot of challenges here as well, right? What are the performance characteristics of your RDMA paths on top of SR-IOV channels? What kinds of locality-aware designs can you bring in? How do you detect the cloud topology, how do you design virtualization-aware policies in Hadoop, and how do you bring these things into the Hadoop stack? These are the dimensions we can explore to design for HPC cloud networking technologies. You have different network generations, for example QDR, FDR, and EDR InfiniBand, 40 GigE, RoCE. On the protocol dimension, you can run TCP/IP, you can run IPoIB, you can run RC, UD, or hybrid, whatever. And then you can run with SR-IOV, or you can run in a bare-metal environment. So how do you design your stack to consider all these different generations of architectures and protocols?

Let me give you one example, because RPC is used in many big data middlewares, for example HBase, right? On the default path, you can run the default HBase RPC on top of the TCP/IP protocol. You can also run it over IPoIB with RC or UD. And then, like I said earlier, we designed an RDMA-based protocol for big data middlewares, and we wanted to try this: can this design take advantage of multiple channels or multiple protocols, for example RC, UD, or hybrid, okay? So we designed a protocol that can hybridize the RC and UD protocols, because RC can support RDMA but UD cannot, while UD can help you reduce memory utilization, because with UD you only need one queue pair to communicate with all the peers. Now let's take a look at the performance. If you look at this plot, the red line is IB with RC; we see that that's the best performance we can achieve. And then this one is for HBase.
At the maximum we see around 2.6x performance improvement, okay, with the best protocol selection. And similarly for Hadoop, across its different components: for HDFS, we designed block management to improve fault tolerance; for YARN, we designed a container allocation policy to reduce network traffic across the MapReduce and Hadoop components; and for Hadoop Common, we designed a topology detection module to automatically detect the topology of your cluster. We integrated these designs into RDMA-Hadoop. If you want to take a look at some numbers: for the CloudBurst and Self-Join applications, maximally we can reduce the execution time by around 55%. Okay, those are some of the designs we provide.

Now let's see how to do fast provisioning. Like I said, OpenStack Heat can help in this phase. This is the Heat architecture. We want to get some physical nodes and then launch VMs, and when launching the VMs we need to handle a lot of things: for example, SR-IOV channel setup, network settings, image management, launching the VMs, and mounting global storage. Those requirements are all there, okay? But with Heat, we can actually express all of these things in a template manner, right? We write them in a Heat configuration file, and Heat takes over all of these steps: loading the VM config, allocating the ports, allocating the floating IPs, generating keys, and so on. And then it deploys the whole Hadoop stack and Spark stack for you.

Here is a quick demo on top of OpenStack. We first create the stack, and one important thing here is that we need InfiniBand support, because we run with RDMA, right? Then, for this stack, with Heat we select the image, the SSH key, how many physical nodes you need in total, how many VMs in total, how many virtual CPUs for each VM, those kinds of things. Then you click launch, and the whole cluster is deployed for you automatically. This is the stack getting launched, and this is the overview of the stack details. In the outputs, the important thing is that we automatically assign a floating IP to your master node, so that you can access this cluster through that public IP, from whatever terminal, through SSH.

We have a lot of ongoing plans for this project. We are trying to upgrade to the latest versions of Hadoop and Spark, and we are also trying to support streaming, automatic tuning, Impala, Swift, and deep learning with it.

With this, let me conclude the talk. First of all, I tried to give an overview of the challenges in accelerating big data processing with HPC and cloud technologies. We presented a lot of designs and opportunities that take advantage of RDMA and SR-IOV. It may be hard to absorb all of this information in just a 40-minute talk, but feel free to come to me and we can discuss it offline; we have a lot of materials on the website as well, so please feel free to take a look. All the results look promising. And one important thing is that this kind of stack is very complicated: if you want to deploy it by yourself, it's very hard, but with OpenStack Heat, you are able to deploy it easily, especially since we share the template on the website. You can easily use it and then deploy.
With all of this, we are trying to enable the big data and cloud computing communities to take advantage of modern HPC technologies and carry out their analytics in a high-performance manner. I have two more talks: tomorrow at 4:30, I will introduce what we have done inside OpenStack Swift to bring RDMA into it; and the day after tomorrow, we will introduce how to build efficient HPC clouds with MPI and OpenStack over SR-IOV and RDMA, and also how to support migration. Let me just acknowledge the sponsors; we are seeking more support and collaboration opportunities from you. And this is the acknowledgment of the personnel in our group. Thank you very much.