Hello, everybody. Welcome to our session. This is a session co-presented by me and my co-presenter, and we are talking about the multi-region cluster: one Swift cluster that consists of several regions, two or more. This is joint work with INSPUR; my co-presenter is from INSPUR, and I am a software engineer from OStorage, a startup company that provides object storage for enterprise users. This work was also done in cooperation with 99 Cloud, an OpenStack company in China. Due to some visa problems, our colleague from 99 Cloud cannot be here, but we will present the part done by 99 Cloud as well. Yupeng will now make a short introduction of his team and their work at INSPUR.

Good afternoon, everyone. I am very pleased to have this opportunity to communicate with you. My name is Hu Yupeng, and I come from China. I work for INSPUR. Maybe you don't know this company, so let me give you a brief introduction. It is a big company in China and a leading provider of cloud computing solutions. We are able to transform traditional data centers into the most advanced cloud computing data centers, and it has risen into the international ranks: according to Gartner, our server sales rank number five in the world and number one in China. INSPUR focuses on research and development. My team is mainly responsible for developing a cloud computing platform, and we have built one called InCloud OS, based on OpenStack. INSPUR is a very large IT company in China and its servers are everywhere; maybe you know that Baidu, Alibaba, and Tencent use many INSPUR servers and storage systems.

So, back to Swift. First of all, let me make a very short introduction of Swift, which was also introduced in the session before. Swift was born at the very beginning of OpenStack. Now OpenStack has dozens of projects, but at the very beginning there were only two: one is Nova and the other is Swift. Swift provides object storage services, and it can run at very large scale; at Rackspace, there are hundreds of petabytes of data in Swift clusters. Swift also provides very high durability, very high scalability, and very high availability. So it is a very good storage system that can run at a very large scale.

Swift provides object storage services. So what is object storage? The term is a little confusing. For example, we say that Ceph is based on RADOS, and RADOS is an object storage system, but we also say that Ceph provides object storage service. We hear "object storage" here and "object storage" there, and the meaning is a little different in each context. When we talk about Swift, or when we say that Ceph provides object storage service, we mean Amazon S3-like storage. There are two key features of this kind of storage. First, data is stored in buckets or containers: in Amazon S3 they are called buckets, and in Swift they are called containers. We do not use directories as we would in a file system. The other main feature is that it talks the language of the web, the internet language, which is HTTP. So we read and write data via HTTP calls, not via the read and write functions we would use in a file system.
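As a rough illustration of that difference (the mount point, URL, account name, and token below are made up for this example, not from the talk):

    # File system: data lives in directories, accessed with read()/write()
    cat /mnt/data/reports/summary.txt

    # Object storage: data lives in containers, accessed over HTTP
    curl -H "X-Auth-Token: $TOKEN" \
        https://swift.example.com/v1/AUTH_test/reports/summary.txt

The object is addressed by a URL of the form /v1/<account>/<container>/<object>, which is the pattern used in all the examples that follow.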
Because the interface is HTTP, millions of cell phones and millions of desktop PCs can access our storage system directly over the internet. This is awesome; it is a very good feature that the storage system can provide very high concurrency to its clients.

So let's see how we access data in Swift. In Swift we use containers, but we also have accounts. An account is related to a tenant. When you are a user of a tenant, for example tenant A, the administrator can configure the Swift system so that the users in tenant A use account A. So when you log in, you can see container 1, container 2, and container 3; you do not see the account itself, because it is directly tied to the tenant. There can be a container named container 1 under account A and another container also named container 1 under account B; they are two different containers, and their storage space is logically isolated. So when a user accesses Swift, they only see containers, not the account. In fact, the account does appear in the API: if a user of tenant A accesses the storage system, they can list all the containers their tenant can access via a single HTTP call. I use a curl command here, and the body of the response contains container 1, container 2, and container 3. With another call, they can list the two objects in container 1. However, if they use a URL for something that does not exist, for example account A, container 4, they get 404 Not Found.

This is the basic way to access data in Swift: we use GET to read data, PUT to upload objects or create containers, POST to modify the metadata of a container or an object, and DELETE to delete objects or containers.
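A typical session, just as a sketch (the endpoint, account name, and token are hypothetical; the verbs and URL layout are standard Swift API):

    # List the containers in the account
    curl -i -H "X-Auth-Token: $TOKEN" https://swift.example.com/v1/AUTH_tenantA
    # -> 200 OK, body lists: container1, container2, container3

    # List the objects in container1
    curl -i -H "X-Auth-Token: $TOKEN" https://swift.example.com/v1/AUTH_tenantA/container1

    # A container that does not exist
    curl -i -H "X-Auth-Token: $TOKEN" https://swift.example.com/v1/AUTH_tenantA/container4
    # -> 404 Not Found

    # Create a container, upload an object, update its metadata, delete it
    curl -i -X PUT    -H "X-Auth-Token: $TOKEN" https://swift.example.com/v1/AUTH_tenantA/container4
    curl -i -X PUT    -H "X-Auth-Token: $TOKEN" -T photo.jpg https://swift.example.com/v1/AUTH_tenantA/container4/photo.jpg
    curl -i -X POST   -H "X-Auth-Token: $TOKEN" -H "X-Object-Meta-Color: blue" https://swift.example.com/v1/AUTH_tenantA/container4/photo.jpg
    curl -i -X DELETE -H "X-Auth-Token: $TOKEN" https://swift.example.com/v1/AUTH_tenantA/container4/photo.jpg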
So how does Swift implement object storage under the hood? In Swift we have a proxy server and three storage servers: the account server, the container server, and the object server. The nodes that run the proxy server we call proxy nodes, and the nodes that run the account, container, and object servers we call storage nodes. The data is stored on the storage nodes. The proxy node provides the HTTP API for the users and performs authentication, verifying whether a user is permitted to access the data; this is also done on the proxy nodes, and it can be integrated with Keystone, which is another project in OpenStack.

Now, what is MRC, a multi-region cluster of Swift? A multi-region cluster of Swift is a cluster consisting of more than one region. Usually the regions are located in different data centers, or even in different cities. We will show the test results of a multi-region deployment of Swift from our experiment in the Langchao Cloud Lab: one region is in Jinan and the other is in Zhengzhou, two cities in China more than 400 kilometers apart. So there are two regions, one and two, and the regions are connected via what we call the replication network.

User HTTP requests to read and write data come into the different regions from different users, and the data is synchronized between the regions via the replication network. One advantage of this MRC deployment is that the data is stored with multiple replicas across regions. So when one region is down, for example when a data center loses power or there is a disaster such as an earthquake, we can still access our data; we can still read and write it. This provides very high reliability and durability for our data.

So for an administrator or a deployer, how can he or she deploy a multi-region cluster with Swift? It is very easy. The key point is in how we build the rings. The ring is Swift's implementation of consistent hashing; Swift uses the ring to distribute data across the devices in a cluster. What we need to do is tell Swift that we have devices in multiple regions when we build the ring. In the ring-building commands, some devices belong to region one, indicated by r1z1, r1z2, and r1z3, and some devices belong to region two, indicated by r2z1 and r2z2, for example. Then Swift will know how to distribute the data, placing replicas as uniquely as possible: when we store data with four replicas, it will put two replicas in region one and two replicas in region two.

But in the real world, sometimes users do not need to store all their data across different regions, because multiple regions mean cost, and a single region means less cost. So the deployer or administrator can provide different storage policies within one MRC, one cluster. A storage policy in Swift is very similar to what some other storage systems call a storage pool. For example, in this cluster we provide three different storage policies: one stores data in two regions with four replicas; another stores data with only three replicas in one region; yet another stores data on SSDs to provide better performance. The user can choose among these policies. But we need to be careful that these policies apply only to object data. Besides object data, Swift also has account data, which stores account information, and container data, which stores container information, and storage policies apply only to object data. So in a multi-region cluster of Swift, container and account information is always stored across the different regions. A sketch of the ring-building commands and of such a set of policies follows below.
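As a concrete sketch, the ring-building commands for a two-region, four-replica object ring might look like this; the IP addresses, device names, port, and weights are invented for the example, and the r1z1/r2z1 prefixes are the region/zone notation mentioned above:

    # part_power=10, replicas=4, min_part_hours=1
    swift-ring-builder object.builder create 10 4 1

    # Two devices in region 1 (zones 1 and 2), two in region 2
    swift-ring-builder object.builder add r1z1-192.168.1.101:6200/sdb1 100
    swift-ring-builder object.builder add r1z2-192.168.1.102:6200/sdb1 100
    swift-ring-builder object.builder add r2z1-192.168.2.101:6200/sdb1 100
    swift-ring-builder object.builder add r2z2-192.168.2.102:6200/sdb1 100

    swift-ring-builder object.builder rebalance

And the three storage policies from the example could be declared in swift.conf roughly like this (the policy names are made up; each policy gets its own ring file, e.g. object-1.builder for policy 1, built with the devices and replica count that policy needs):

    [storage-policy:0]
    name = two-regions-4-replicas
    default = yes

    [storage-policy:1]
    name = one-region-3-replicas

    [storage-policy:2]
    name = ssd-only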
This is about the multi-region cluster itself. In a multi-region cluster, Swift uses the quorum protocol when writing data. That means when you store data with three replicas, Swift confirms that a write is successful once two of the three replicas are successfully written to the storage nodes; and when we use four replicas, Swift confirms the write once three of them are successfully written. So here is the problem in an MRC: with four replicas, two in each region, Swift needs three of the four replicas written successfully before it can confirm the write. So when a request comes to a proxy node in region one, it confirms the write only when the two replicas in region one are written and at least one replica is also successfully written to the storage nodes in region two. Writes go across regions.

And usually the connection between data centers, between regions, or even between cities is much slower than the connection between nodes within one region, apparently. Especially when the data centers are in different cities, the latency is longer and the bandwidth is lower. So the replication network becomes the bottleneck of this storage system.

How do we solve this problem? Swift provides affinity: read affinity and write affinity. Affinity is configured on the proxy nodes, in the proxy-server.conf; a sketch of the configuration follows at the end of this section. With read affinity, when a proxy node receives a request to read data, it picks a replica locally, from region one, and does not fetch the data from region two. Without read affinity, Swift picks a replica randomly, so it might come from region one or from region two. With read affinity, the proxy server always picks the replica from its local region. If that fails, because the data is corrupted, the device is broken, or for some other reason, then it reads the data across the network between regions. So it still provides very high availability for our data.

For writes, write affinity means that if a proxy node in region one receives a write request, for example a PUT to upload data into this Swift cluster, it writes the data locally and synchronizes it to the other region across the replication network afterwards. This is very interesting, because we said that due to the quorum protocol, a write with four replicas is only confirmed once three of the replicas are successfully written, but region one holds only two of the four primary replicas. So how does write affinity work? It writes all four replicas into region one and later moves two of them to region two asynchronously. So the cluster still ends up storing four replicas of the data, and you do not need to worry about data durability when you enable write affinity. This affinity improves the performance very much.
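The affinity settings live in the proxy-server configuration. Here is a minimal sketch for the proxies in region 1, using Swift's real option names but with values chosen just for illustration:

    # proxy-server.conf on the region-1 proxy nodes
    [app:proxy-server]
    use = egg:swift#proxy

    # Prefer replicas in region 1 when reading (lower number = higher priority)
    sorting_method = affinity
    read_affinity = r1=100

    # Send all initial copies to nodes in region 1 ...
    write_affinity = r1
    # ... considering up to 2 * replicas local nodes (primaries plus hand-offs)
    write_affinity_node_count = 2 * replicas

The proxies in region 2 would use r2 instead, so that each region prefers itself.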
Yupeng will now introduce our experiment in the Langchao Cloud Computing Laboratory.

Okay. Let's look at our test environment. Here is the deployment topology. We have one data center in Jinan and one in Zhengzhou; the distance between the two cities is more than 400 kilometers. There are two servers in Jinan, and each server, with its eight disks, makes up a zone. In Zhengzhou there are four servers, and every two servers, with eight disks, make up a zone. So there is the same number of zones in Jinan and in Zhengzhou, to hold the four replicas. Every server has two NICs: one provides the test network for the clients, and the other carries the communication between the proxy servers and the storage servers, and also the replication network, which is connected by VPN across the regions to sync data in the background.

Now let's look at the experimental results. We set the object size to 20 kilobytes, using 10, 20, 40, 80, and 250 concurrent reads and writes, and 150 and 250 concurrent reads, to test with affinity enabled and disabled. We measured the maximum and average latency, as shown in the charts: first the results of the read operations, and next the results of the write operations. From the charts, we can see that Swift with affinity enabled can effectively improve read and write performance in the MRC, especially read performance. That's all from me; thank you for your time.

I have a little more to explain. You can see that enabling affinity improves the performance enormously, and it looks very good. But in fact we need to take care of some things. Why can Swift run at a very large scale, and why can it be deployed across multiple regions so easily? The important thing to know is eventual consistency: Swift is an eventually consistent storage system, not an immediately consistent one. We should be careful with this, especially in a multi-region deployment. With only one region, consistency is achieved very quickly. But if we have two or more regions connected by a WAN, or even by VPN over the internet, the connection between regions is slow, and Swift will take more time to achieve consistency.

If we enable write affinity, as we discussed before, Swift writes all four replicas into one region and moves two of them to the other region asynchronously, in the background. Moving the data takes time, and this creates a window of inconsistency when we modify an object. Because object storage systems do not have random writes, modifying an object means uploading a new object that overwrites the existing one. Say we overwrite an object through a proxy node in region one: with write affinity, Swift writes all four replicas in region one and moves two of them in the background. If a read now comes into region two, it gets the stale data, the older version. We should be careful with this. It is especially important when we delete objects: we delete an object in one region, a read comes into the other region, and the client gets back data that has been deleted. Why? Because of eventual consistency. This can even happen when the writes and the reads go to the same region. It sounds weird, but it is true; from the client side it looks like the sequence sketched below, and the mechanism behind it is what we explain next.
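A sketch of what this looks like from the client, with write affinity enabled (the endpoint, container, object, and token are hypothetical; the status codes are the ones the talk describes observing):

    # Upload an object; the proxy confirms once the local copies are written
    curl -i -X PUT -T photo.jpg -H "X-Auth-Token: $TOKEN" \
        https://swift.example.com/v1/AUTH_tenantA/container1/photo.jpg
    # -> 201 Created

    # Delete it immediately
    curl -i -X DELETE -H "X-Auth-Token: $TOKEN" \
        https://swift.example.com/v1/AUTH_tenantA/container1/photo.jpg
    # -> 204 No Content

    # Read it back right away: we expect 404, but we may still get the data
    curl -i -H "X-Auth-Token: $TOKEN" \
        https://swift.example.com/v1/AUTH_tenantA/container1/photo.jpg
    # -> 200 OK  (stale copy served from a hand-off device)

    # Wait some seconds for the background replicators, then read again
    curl -i -H "X-Auth-Token: $TOKEN" \
        https://swift.example.com/v1/AUTH_tenantA/container1/photo.jpg
    # -> 404 Not Found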
Here is why. When we store one piece of data in this cluster with four replicas, Swift, according to the ring, picks four devices, four disks, to store the four replicas. But Swift also selects additional devices, equal in number to the replica count: the hand-off devices. As the name suggests, if one of the four primary devices fails, Swift writes the data to a hand-off device instead. So there are four primary devices to store the data, and four hand-off devices to take the data when a primary device fails, in each region respectively.

When write affinity is enabled, Swift actually writes like this: it writes two replicas to the two primary devices in region one, and two replicas to hand-off devices in region one. Reads behave accordingly: read from the primary devices, and if that fails, read from the hand-off devices. So when we write data into this cluster and read it back immediately, we can get the data from either region; in region two, if Swift cannot find the data locally, it reads the data remotely from region one and serves it to the user, while moving the data over in the background.

But now a delete request comes into region one, and it deletes the data on the four primary devices. Please be careful here: it deletes the data on the primary devices, but it does not delete the data on the hand-off devices. Deleting data on hand-off devices relies on the replication services: the replication service notices that the data on a hand-off device should be deleted and removes it in the background, not as part of the user request coming into the system. So when you write data into the multi-region cluster with write affinity and delete it immediately, there are replicas left on the hand-off devices in region one, and when you perform a read now, whether from region one or region two, you can get the stale data which should have been deleted but has not been yet.

So we get the strange behavior from the sketch above: with write affinity enabled, we PUT an object into region one and then immediately DELETE it, and both requests succeed. Then we GET it, expecting 404 Not Found to confirm the object is gone, but we read the data successfully. We wait for several seconds or a minute, read again, and now we get 404. This is how eventual consistency can behave, and we should be careful about it.

We should also point out that read affinity affects all read operations, on accounts, containers, and objects, but write affinity affects only object PUTs. If you delete an object, the delete is not affected by write affinity, as we mentioned before; and if you update the metadata of an object, that is not affected by write affinity either, so it will take more time than you might expect. Pay attention to this when you enable affinity. So sometimes we do not enable write affinity, but read affinity is almost always what you want.

So this is what we wanted to present about the multi-region Swift cluster. Thank you. Maybe we have two minutes. Any questions? One question.
Q: So with the copies going to the hand-off locations, is it the object replicator process that's responsible for delivering those to the second region? When it makes its rounds, is the Swift object replicator the process that does the eventual move to the remote region?

A: So the question is: in a multi-region cluster, is it the replicator that writes to the other region?

Q: If you've got the affinity set up and it all goes into one region, two of the copies are in hand-off locations. Is that handled the same as in a single region with a drive failure, where data goes to the hand-off location and then the object replicator process eventually crawls the directory structure and says: oh, you're in a hand-off, you need to move to region two?

A: Yes.

Q: Okay, so it's that process doing the relocation. Thank you.

A: You're welcome.