Okay. Hello, everyone. My name is Xiaoyi Lu, and I come from The Ohio State University. My co-presenter, Dr. DK Panda, could not come because of some last-minute personal constraints, so I will cover our talk. Today I'm going to present our recent work on accelerating OpenStack Swift with RDMA technology for building efficient HPC clouds.

Here is the outline of today's talk. I will first briefly introduce the background and why we did this work, then what problems we are trying to solve, what design we are proposing, and how much performance gain we can actually achieve for different benchmarks and applications, and then I will conclude the talk.

Many of you may know that these days a lot of exciting, high-performance hardware technologies are being used in cloud instances: many-core processors and accelerators such as the Intel Xeon Phi, large-memory nodes, and new memory technologies such as NVM. Another exciting development is high-performance networks such as InfiniBand and RoCE. These networks have been heavily used in HPC clusters, and now HPC clouds are also trying to take advantage of them. There is also Single Root I/O Virtualization (SR-IOV), which was proposed several years ago and lets different virtual machines share PCIe devices in a high-performance, scalable way. On the storage side, SSDs and NVMe SSDs, object storage services such as Swift, and distributed file systems such as Ceph are also being widely used in the cloud.

To summarize the problem we are trying to address in this talk, we see two different things. On the provider side, there is a lot of interesting networking available in the cloud. InfiniBand and RoCE give you very good performance: the latest HDR InfiniBand cards provide 200 Gbps bandwidth and around one microsecond point-to-point latency between two nodes. InfiniBand and RoCE also give you another important feature called Remote Direct Memory Access. I believe most of you know about DMA: DMA lets a device access your memory while bypassing the CPU. RDMA extends that concept to the remote side: you can directly access a remote node's memory, writing data to it or reading data from it, without involving the CPU on the remote side. That is a very exciting feature, because it lets you achieve truly one-sided communication compared to send/receive, and that gives you a lot of performance benefit.
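To make the one-sided semantics concrete, here is a minimal, illustrative sketch of how an RDMA write can be posted with the InfiniBand verbs API. It assumes a queue pair has already been connected and that the peer's buffer address and remote key have been exchanged out of band; it is only the general pattern, not our actual implementation.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a one-sided RDMA WRITE: the local buffer 'buf' (registered via
 * ibv_reg_mr() as 'mr') is placed directly into the peer's memory at
 * 'remote_addr' without involving the remote CPU. Queue-pair setup and
 * the out-of-band exchange of remote_addr/rkey are omitted. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,   /* local source buffer */
        .length = len,
        .lkey   = mr->lkey,         /* local memory key */
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided write */
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a local completion */
    wr.wr.rdma.remote_addr = remote_addr;        /* target address on the peer */
    wr.wr.rdma.rkey        = rkey;               /* remote memory key */

    return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
}
```

Note that the remote side never posts a receive for this transfer; it only has to register the target buffer and share its address and rkey beforehand, which is exactly why the remote CPU stays out of the data path.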
On the other hand, if you take a look at cloud storage systems such as Swift, they give you a lot of capacity, but performance and scalability are still a big problem. Let's take a look at what the architecture of Swift looks like. Swift is a distributed, cloud-based object storage service. It can be deployed as part of an OpenStack installation, or as a standalone component to store your data. User requests can come from anywhere in the world, over HTTP, and they first reach a proxy server; multiple proxy servers are deployed to handle these requests. The data itself is stored on and served from multiple object servers, and the ring component is used to handle the metadata. Swift can be used for a lot of different use cases: you can run big data applications, do data backup, or store virtual machine and container images.

This is good, but the problem is that the default Swift architecture and implementation still rely on traditional TCP/IP, socket-based communication, which is not high-performance; as I mentioned earlier, RDMA technology can do much better than that. So we are trying to address the following challenges. We see two problems in the Swift architecture. First, the proxy server may become a bottleneck, because a lot of requests have to go through the proxy server first. Second, all object upload and download operations are based on TCP/IP, socket-based communication and are network-intensive. The questions are: can an RDMA-based approach benefit this design, and how can high-performance, scalable RDMA-based communication schemes be designed to improve overall Swift performance? We also do not want to change the Swift API.

In the proposed work we did two different designs. The first one we call the client-oblivious design: no change is required on the client side. The second is a metadata-server-based design, which directs the communication between the client and the object servers so that the proxy server is bypassed. As I said earlier, we replaced all the socket-based communication with an RDMA-based protocol. RDMA is typically accessed through the verbs API (although it can also be emulated over sockets), so we used the native verbs API to reimplement the communication substrate of OpenStack Swift. We also did some other things, such as providing maximum overlap between communication and I/O.

Let's first take a look at the first design, the client-oblivious design. No change is required on the client side. On the slide, the blue lines are the default path, which still goes through the TCP/IP communication protocol. The red lines are the communication path between the proxy server and the object storage servers, which we changed to the RDMA-based communication protocol. In this design, even though we improve the backend data access through RDMA, the proxy-server part, the blue lines, is still the bottleneck.

So later we thought that in some cases, for example if you build a private cloud or a cloud under your own control and performance is an issue for you, we can actually bypass the proxy server, so that the client communicates directly with the object storage servers. In this way we can improve performance a lot, and in this design the proxy server is no longer a bottleneck.
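As a rough illustration of the metadata-server-based design (design two), the sketch below shows the client-side flow just described: a small metadata exchange to locate the object, followed by a direct RDMA transfer with the object server, bypassing the proxy. All names, types, and signatures here are hypothetical placeholders, not the actual Swift-X or verbs API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholders for the two steps of design two; they stand in
 * for the real metadata exchange and RDMA transfer engine, which are not
 * shown here. */
struct object_location {
    int      server_id;     /* which object server holds the object */
    uint64_t remote_addr;   /* registered buffer on that server */
    uint32_t rkey;          /* remote memory key for one-sided access */
};

static struct object_location metadata_lookup(const char *container,
                                               const char *object)
{
    /* Ask the metadata server (which consults the ring) where the object
     * lives; only this small request goes through a server-side service. */
    struct object_location loc = {0};
    (void)container; (void)object;
    return loc;
}

static int rdma_get(struct object_location loc, void *dst, size_t len)
{
    /* Bulk transfer straight between the object server and the client's
     * buffer over RDMA -- the proxy server is never on this path. */
    (void)loc; (void)dst; (void)len;
    return 0;
}

/* Client-side GET under the metadata-server-based design. */
int swiftx_get(const char *container, const char *object,
               void *dst, size_t len)
{
    struct object_location loc = metadata_lookup(container, object);
    return rdma_get(loc, dst, len);   /* data moves over RDMA, not TCP/IP */
}
```

In the client-oblivious design, by contrast, the client still talks to the proxy over HTTP, and only the proxy-to-object-server leg uses RDMA, which is why the proxy remains on the critical path there.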
We will present this paper next week in Spain at the CCGrid conference, with more detailed designs; since this is only a 10-minute talk, I won't go into more detail here, but I'm happy to share some of the performance observations we have.

First of all, we did a breakdown of the time spent in the default Swift path and in the two designs we proposed. For design two, because you bypass the proxy server, the communication time is reduced by almost 4x compared with the default scheme. There are two contributions to this: one is the RDMA-based protocol, and the second is bypassing the proxy server. For the put operation the reduction is almost 4x, and for the get operation it is almost 3x. That is the breakdown for put and get.

If you take a look at the overall evaluation, with put and get operations at different object sizes, we see that for very large objects, such as 5 GB, we can reduce the put latency by around 50% and the get latency by almost 70%.

Those are the basic operations, but sometimes we want to run big data workloads on top of Swift. If we can do that, it will be fantastic, because these days there are a lot of requirements from the big data analytics side. Here we compare three things. First, we run Hadoop WordCount on top of HDFS. The second bar is SwiftFS, which can run HDFS workloads on top of Swift directly, but it is the default implementation and still goes through the default Swift architecture. The third one is our proposed Swift-X, which is RDMA-enabled and bypasses the proxy server. Compared with SwiftFS, we can achieve up to almost 83% improvement for Hadoop WordCount, and compared with HDFS we can achieve around 60% to 70% improvement. Note that for HDFS the data needs to be copied out of Swift first: if you deploy a separate HDFS cluster while the data actually lives in Swift, you have to copy it in and out.

With this, let me quickly conclude the lightning talk. First, we analyzed the architecture and identified the major bottlenecks in the default Swift design. Then we proposed two designs to accelerate Swift performance and improve its scalability. The two designs are applicable to different scenarios: if you don't want to change your application, you can use the first, client-oblivious design; if you care more about performance, you can use the second design. We designed an RDMA-based communication framework for Swift, and the evaluation results look very promising. We are planning to evaluate more application scenarios and support additional APIs in the future, and this design will probably be available on our website soon.

Tomorrow I will have another talk about HPC clouds with MPI, OpenStack, and SR-IOV, and how to support migration on InfiniBand clusters. Let me thank all the sponsors, and these are our group members who worked hard on all of this. Thank you; I'm happy to take short questions.

Okay, go ahead. So we are not... actually, we say we bypass the proxy server, but we do not bypass it entirely. We bypass the actual data path, the performance-critical path, but we still need some metadata exchange with the ring and the proxy servers. I'm happy to talk with you later, because they are saying we need to stop. Yeah, thank you.