Hello, and welcome to the presentation. We'll be talking about development that has been happening in Swift over the past few years, and is still continuing, on how we can extend Swift to support third-party storage systems. My name is Luis Pabón and I work in Red Hat Storage, and I'm here with my colleagues: Prashanth Pai from Red Hat Storage and Pete Zaitcev, also from Red Hat OpenStack.

First, if you're new to the Swift architecture, let me take you through it a little, because we're going to be talking about each of these servers. There are four servers in the Swift architecture. One is the proxy server, which listens for requests coming in from the outside; its job is to decide which other server it's going to talk to. We have the account server, which is really a metadata server; its job is to hold a collection of containers. The account in Swift translates to what OpenStack calls the tenant or the project. Then we have the container server, which is another metadata server; its job is to hold the collection of all the objects located inside that container. If you're coming from S3, you can think of a container as a bucket. And then we have the object server. The object server's job is to take the data coming through the proxy, sent by the user, and actually place that data on some file system and store it.

This architecture works really well for Swift. But what if we want to extend it to other, third-party systems? What can we do? Back in 2013, a new class was added to the Swift project called DiskFile. Now, don't get confused; the first time I heard the name, when I was new to Swift, I wondered why there would be a class called DiskFile in an object store. But anyway, it's called DiskFile, and what it lets you do is create your own DiskFile class and plug it into the object server. With that, you can decide how you're going to store data on the file system itself. This has been around since 2013, so for almost two years now.

So that was great, and a few companies started using it. But then we realized that the entire Swift cluster has to use the same DiskFile, so we cannot segregate; maybe we want to put some objects in one type of storage system and others elsewhere. We needed something else. So back in the Juno cycle, a new technology called storage policies was added to Swift. What storage policies do is allow segregation of objects, decided per container. For example, you can store objects with two-times or three-times replication, or you can have objects on faster storage systems with SSDs. Depending on the container, you can decide where to actually place the data. This works really well for third-party systems, because now you have DiskFile and you have storage policies, and you can decide per container where your objects are going to live. This has been available since Juno.

So we're going to take an example of a storage policy that uses DiskFile, and it's called Swift-on-File. If you were at my other talk with IBM on Monday, this will seem a little familiar. Swift-on-File is a storage policy with a plugin for DiskFile.
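To give a feel for that plug point, here is a rough sketch of the shape of a DiskFile implementation. This is a simplified illustration, not the actual Swift API: the real DiskFile classes in swift.obj.diskfile have many more methods, and the names and signatures below are assumptions made for the example.

```python
import json
import os

# Schematic sketch only: the real DiskFile API in swift.obj.diskfile is
# larger (open(), delete(), context managers, ...) and its signatures
# differ. The names below are illustrative, not Swift's actual ones.

class SketchDiskFileWriter(object):
    """Receives an object's data stream and commits it to storage."""

    def __init__(self, path):
        self._path = path
        self._fp = open(path + '.tmp', 'wb')

    def write(self, chunk):
        # Called repeatedly with chunks of the incoming PUT body.
        self._fp.write(chunk)

    def put(self, metadata):
        # Commit: close, move into place, persist metadata alongside.
        self._fp.close()
        os.rename(self._path + '.tmp', self._path)
        with open(self._path + '.meta', 'w') as f:
            json.dump(metadata, f)

class SketchDiskFile(object):
    """Decides how an account/container/object maps onto real storage."""

    def __init__(self, root, account, container, obj):
        self._path = os.path.join(root, account, container, obj)

    def create(self):
        # The object server calls something like this to stream a PUT.
        directory = os.path.dirname(self._path)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        return SketchDiskFileWriter(self._path)

    def reader(self):
        # Something iterable over the object's bytes, used for GETs.
        return open(self._path, 'rb')

    def get_metadata(self):
        with open(self._path + '.meta') as f:
            return json.load(f)
```

The point is simply that the object server drives whatever class you hand it, so your class decides where and how the bytes land.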
What Swift-on-File does is take objects and place them directly on a clustered file system. It takes the URL path and maps it verbatim onto the file system, creating the directories it needs on the clustered file system; there's a sketch of the idea after the use-case examples below. For example, here's an object being sent by a user, with the account, container, and object in the path. What Swift does today is lay it out on the file system as you can see here, which makes it kind of complicated to find. What Swift-on-File does instead is take that path and lay it down on the file system directly, as you can see here, so it's very easy to find on the clustered file system.

Why would you want to do that? What is the benefit? Let's take an example; again, this comes from Monday's presentation. Here we have an analytics example. We ingest into our clustered file system using the Swift API. Then an event happens, and Hadoop comes up and runs queries right on the clustered file system using a Hadoop connector. There's no more copying out of and back into Swift; you're running right on the same file system. Once the results are produced, they're available on the file system to be served by GETs over Swift.

Let's look at another example, something all of us do today: we take pictures with our phones and post them over an object interface to some cloud system. In this model, the pictures come in over the Swift interface and go into a clustered file system with a unified namespace. They're then accessible, for example, to new VMs that can transcode that content into other bit rates. So if you had, say, a 4K camera and took a movie, you could post the movie over Swift and then have many VMs act on that movie just by accessing it over the scale-out file system.
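Coming back to the verbatim URL-to-path mapping, here is a minimal sketch of the idea. The mount point and the exact layout are assumptions for illustration (the account is assumed to correspond to the mounted volume); this is not Swift-on-File's literal code.

```python
import os

def object_path(mount_point, url_path):
    """Map a Swift object URL verbatim onto a clustered file system."""
    # Swift URL paths look like /v1/<account>/<container>/<object...>
    _version, _account, container, obj = url_path.lstrip('/').split('/', 3)
    path = os.path.join(mount_point, container, obj)
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)   # create the directory hierarchy verbatim
    return path

# object_path('/mnt/gluster-vol', '/v1/AUTH_test/photos/2015/cat.jpg')
# would return '/mnt/gluster-vol/photos/2015/cat.jpg', which any NFS or
# Hadoop client mounted on the same volume can then read directly.
```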
Okay. So that's great. But there are two other servers we need to talk about, if you remember the architecture. We've covered the object server, and that's working very well. What about the container and account servers? What if you have other methods of representing what objects you have in your clustered file system, and of representing the collection of objects in your file system? What we have in development for that is something called Pluggable Backends, and Pete is going to talk about it.

Okay. All right. So Pluggable Backends are an API that has existed for a while; it's probably the oldest still-open review, and in a sense it's an experiment in having code that is part of Swift but not actually in Swift. I don't want to say that this is a fork; the ultimate goal is to get it into Swift. It was intended to support Luis's work on GlusterFS, and it was a companion to Peter Portante's DiskFile that Luis just explained; we worked on them together. And you can see that we are not very inventive at naming, but it is what it is: a backend API, so that's why it's called that. Unfortunately, it was a bit more challenging. That's not to say DiskFile was trivial, not at all, but this was different, because there were many more methods in these classes that had to be abstracted and redefined. So it took a while, and it has this long history. And while this was going on, changes kept occurring in Swift.

In particular, storage policies were, I think, the biggest learning experience for Pluggable Backends, because they introduced changes to exactly these classes. And it became abundantly clear that the initial idea, to create a stable API that we would publish somewhere once and then just have everyone plug into it, is probably not going to happen, because Swift is a living thing; that's the nature of it. So it's still an API, and still an important one, but it's not going to be cast in stone forever. I'll just show a picture; here it is. That's where it goes, and PBE is the acronym for Pluggable Backends. This is the equivalent of the picture everyone has seen before, just with all the extra stuff removed, so you only see the proxy and the account and container servers. It plugs in at the same place where DiskFile plugs in right now, except DiskFile is for the object server and this is for account and container. And you can see that this is still kind of inside Swift; we try to push it as low as possible. There are two reasons. First, if you're some vendor, Seagate for example, and you want to implement this interface, the lower I push it, the less you have to code; that's why it sits so low. Second, all the churn from changes like storage policies: the further I get away from them, the less churn happens to this interface. So I'm not making the claim of a cast-in-stone public interface anymore, but I want to be friendly to implementers, and the lower the interface sits, the more stable it becomes, because then it changes with the technology, not with the user requirements.

One thing that a lot of people noticed here is that, because of this design, for the reasons I just explained, the backend is on the other side of a network hop from the proxy. So whenever you use this thing, you incur that network hop. It wasn't quite a point of contention, but it was an observation a number of people made: you're proposing an API that has this network hop baked into it, and that's probably not a good thing. Fortunately, some very clever guy, or actually two of them, came up with a way to deal with it, and Pai is going to talk about that later.

So let me move on quickly to what we're up to. There's a bunch of things that still need to be done. In particular, Swift-on-File uses the same classes that PBE changes, so it's effectively a version behind PBE; that's probably the biggest item. Another item I wrote down is adapting to erasure codes, which in my mind was the biggest thing, but I learned about half an hour ago (so it's not in this presentation) that the biggest challenge is probably going to be container sharding, because Matt Oliver, who invented that sharding work, put the sharding into the same broker classes; basically, I have to recreate for PBE the same method he uses, and that method creates the shard tree. So yes, there is a lot of work. But the good thing is that it's useful and usable today: you can check out the tree, or the review that I mentioned in the presentation, write code against it, and expect it to work. And I guess the rest is just an appeal to people to start using it, maybe without waiting until I get it merged; just use that review as a patch to your Swift. It's almost always kept up to date with current Swift.
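To give a rough idea of the shape of such a backend, here is a sketch of a container "broker" that answers listings and stats straight from a clustered file system, bypassing the container's SQLite database. The class and method names are invented for this sketch; they are not the actual classes in the PBE review.

```python
import os

# Illustration only: a container backend that computes listings and stats
# on demand from the file system instead of reading a container database.

class FSContainerBroker(object):
    def __init__(self, mount_point, container):
        self._dir = os.path.join(mount_point, container)

    def list_objects(self, marker='', prefix='', limit=10000):
        """Build the container listing by walking the directory tree."""
        names = []
        for root, _, files in os.walk(self._dir):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), self._dir)
                if rel > marker and rel.startswith(prefix):
                    names.append(rel)
        return sorted(names)[:limit]

    def get_info(self):
        """Container stats computed on demand instead of read from a DB."""
        names = self.list_objects(limit=1000000)
        used = sum(os.path.getsize(os.path.join(self._dir, n))
                   for n in names)
        return {'object_count': len(names), 'bytes_used': used}
```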
And now I think Pai is next, talking about that network hop I just mentioned and how he gets rid of it. Here you go.

Thank you, Pete. Okay, I'll talk about the single-process work that's been going on for some time. Single process is really a very simple optimization. In scenarios where Swift is backed by a third-party clustered file system such as GPFS, GlusterFS, or even CephFS, a lot of the heavy lifting, the real meat, is done by the clustered file system. When you have Swift backed by a clustered file system, distribution and replication are usually done by the clustered file system. So you have Swift, which is capable of doing replication and distribution in the proxy server, and you have the clustered file system, which is also capable of doing the same thing, and there's no easy knob to turn off Swift's distribution and replication. As of today, the way we turn off, or rather suppress, Swift's distribution is through storage policies, as Luis explained: storage policies allow a user request to be routed to a particular object server that talks to the clustered file system backend. And how do we suppress replication? As of today, we hard-code the replica count to one when we build the ring files; there's a concrete example below.

So if you look at this setup, where you have Swift in front and the clustered file system talking to the disks, the hop between the proxy and the object server is really not necessary. As of today, the only way you can reduce that latency is to collocate the proxy and the object server on the same node, but they still run as two different processes. The client request that the proxy accepts is still forwarded to the object server over the network, which involves the network overhead and the TCP overhead. So the proposal is very simple: move the object server functionality into the proxy and run them as a single process.

Here's an example. As of today, when a user sends a request, say a PUT, to the proxy server, the proxy refers to the storage policy, which could be the default storage policy or one set on a particular container, and routes the request to the right object server based on that policy. That is how we sort of override Swift's distribution at the proxy level. And once the request reaches the object server, the object server has this cool thing called the DiskFile API, and Swift-on-File is a DiskFile API implementation. Swift-on-File can talk to any POSIX-based clustered file system; in this example, the Swift-on-File DiskFile talks to the disks through GlusterFS. So when the request is forwarded from the proxy to the object server, the object server talks to GlusterFS, and GlusterFS is what does the distribution among the Gluster nodes, and also the replication among the Gluster nodes. So if you look here, the hop between the proxy and the object server is not really doing anything relevant or useful. What we are getting at is this: we remove that hop between the proxy and object servers, and they just run as a single process. The client request accepted by the proxy is sent directly to the clustered file system in the backend.

We did some ssbench benchmarks recently with GlusterFS, an older version of Swift, and Swift-on-File. This was done on an older change, so a lot of it is obsolete by now, but as you can see, there's a lot of improvement for small-object PUTs. And you can leverage this work even further when you have multiple proxy workers and multiple users accessing the cluster simultaneously.
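About hard-coding the replica count: that is just a matter of how the object ring is built, something like the following. The device name, IP address, port, and weight here are made-up example values.

```sh
# Build an object ring with a single replica, so Swift itself does not
# replicate; the clustered file system underneath provides redundancy.
swift-ring-builder object.builder create 10 1 1   # part_power, replicas, min_part_hours
swift-ring-builder object.builder add r1z1-192.168.0.10:6000/glustervol 100
swift-ring-builder object.builder rebalance
```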
So how are we going to do this in terms of implementation? Erasure coding went beta in the Kilo release of Swift, and it brings in some new concepts, such as policy types and multiple object controllers. The object controller is the piece that talks to the DiskFile API, and as of today we have two policy types: one is replication and the other is erasure coding. The proposal is to introduce a single-replica policy type that clustered file systems can make use of. And although in the reference implementation the single-process optimization is tied to the single-replica policy type, it need not be; you can still have three replicas with the single-process optimization. Right now there's a patch up for review, sent by Chia-Go, which combines the proxy and object servers into a single process. And once Pete's work on Pluggable Backends is done, maybe we can get to a sort of PACO process, where you have all the servers running as a single process talking through DiskFile to the clustered file system backend.
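In swift.conf terms, the idea looks roughly like this. The replication and erasure_coding policy types exist today; the single_replica type shown here is the proposal being discussed, so treat this as a sketch.

```ini
# swift.conf sketch. Policy 0 is the stock Swift backend; policy 1 routes
# containers to the clustered file system. The policy_type value
# "single_replica" is the proposed addition, not something in Swift today.
[storage-policy:0]
name = gold
policy_type = replication
default = yes

[storage-policy:1]
name = swiftonfile
policy_type = single_replica
```

A client then opts in per container, for example by setting the X-Storage-Policy header when the container is created.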
So how do all these efforts, Pluggable Backends and the single-process optimization, fit together as pieces of a puzzle, and how do they allow third-party file systems to plug better into Swift? I think Luis is going to take that. Thank you.

So you've seen the technology, and you've seen what we're trying to do. What is the ultimate goal of these projects we've been working on for the last few years? What we're really looking for is a well-defined storage interface for Swift. That would allow third-party systems to benefit from all the middleware that's out there for the proxy and then send the data on to your third-party storage system. That is the goal. This is just the start of a conversation; I don't want to say that this is the way it's going to be. This is the beginning; this is where we'd like to go. It allows things like archival, for example: you may have your own methods of storing the data, your own methods of making sure the data can't be deleted, and your own methods of determining how the container and the account are set up for that. For example, in GlusterFS, a container may not actually be what a container is today, which is a database we query to find out how many objects are inside the container; we may just go into the file system and determine what objects are there, completely bypassing the container database. In Ceph, there's a lot of work to be done to decide what to do, but there are possibilities, maybe using the RADOS Gateway to benefit from the middleware on the proxy. We also have tape: you could use something like a tape file system on the backend and decide how the container will determine what objects are there. And then we have the Kinetic drives from Seagate; those can be plugged in here as well. But again, this is community work. I want to bring an idea, not say this is the way it's going to be. This is just an idea; we want to bring it to the community, work with the community, and decide how we're going to go forward from here, because really the only way to innovate together is to work through the community. So I look forward to working with the Swift community on these ideas. That's what we have. Any questions? Please use the mic. Yeah.

You said that you're going from a replication factor of three to a replication factor of one with your backend. The question that I have is: how is this not introducing a single point of failure?

Speaking specifically to Swift-on-File and its method: what it does is leverage the storage backend. GlusterFS has two or three replicas itself, so we're just leveraging that; we pass that responsibility over to the clustered file system. The same goes, for example, for GPFS or CephFS: you pass that over to the file system, and it keeps the data safe. And if you're talking about a single point of failure in terms of the network, you can have multiple proxy-plus-object processes behind a load balancer; I think that would also help.

But don't we introduce a pretty big dependency on the redundancy inside the plugged-in storage subsystem? I mean, one methodology you can use right now is to simply mount the file system, or file systems, from, let's say, a SAN or from whatever backend you want, as the devices. But if you start integrating the proxy server and the storage server, you try to move in the direction of essentially getting rid of the whole core...

If I can just interrupt: I think it all depends on what you want to do. For example, Swift-on-File is going after a certain set of use cases, and those use cases require that the objects placed on the clustered file system are also accessible from other NAS protocols. So it depends. We don't want to replace Swift completely; that is not the goal of this project. We just want to enable other use cases with this technology. Does that make sense? Okay, thank you.

And you can actually use both of them in the same cluster with storage policies: the default policy zero can go to the legacy backend, which is super safe.

Going back to your ultimate goal here: you're trying to basically provide a POSIX interface in front of an object interface, but I see you going in the backward direction, where you're actually using the object interface to pipe the data in the backend to a clustered file system. Why would you want to do that? I mean, the whole idea I thought behind building an object interface is innovation, right? To allow people and legacy applications to come in and put their data onto objects. So I'm a little confused here.

Yeah, you're right, these are two different things. The plugin that we have today, the Swift-on-File one, is very specific to clustered file systems. But from another point of view, it's all just an interface for how the data gets written somewhere; it doesn't define that as POSIX. Swift-on-File does that because it is a specific use case, but the APIs that we're trying to work on do not define a POSIX interface. They define a class that you can subclass and implement your own way. For example, Kinetic drives are not POSIX at all, and they are another object store. So it all depends on how you want to implement it. Does that make sense? Yeah.

And if you saw the keynote by Digital Film Tree, with the nice demo on Monday: what they did was take 4K video of the audience and upload it to Swift. What I'm assuming is that the VFX software the other guy was running there doesn't talk object, so I'm guessing he gets it from Swift, edits it, because the VFX software talks to files, and then puts it back into Swift.
So if he had used something like this, he could just have opened it through the file system interface, without incurring the GET and the PUT again.

Hi. I very much like what you're trying to do, enabling the Swift API to be consumed ubiquitously in front of any file system. My question is: how do you prevent Swift from doing bad things? For example, if a file comes in through the NAS side and then I want to pull it out through the Swift side, it's obviously going to be missing some metadata, like the ETag. That could have the auditor erasing files that we don't want it to erase. And client software that expects an ETag, or sees that the ETag has changed because the file went in through Swift and then got modified by Hadoop, might reject the actual GET that's returned. So how do you handle that? What's your proposal?

Today it's not very optimized. What we do today is look at the file that has been placed there, and if the ETag is not there (please let me know if I'm wrong), we calculate it. So it's two reads: one read for the ETag, and then the read to send the data back.

And then do you disable the auditor so that it doesn't interfere, or do you calculate every time on the fly?

Yes, that's what the replica count of one does: you just don't start the auditor at all. Very good. Thank you.
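A minimal sketch of that on-the-fly ETag calculation, assuming a file that arrived over NFS and so has no Swift metadata stored alongside it:

```python
import hashlib

def compute_etag(path, chunk_size=64 * 1024):
    """First read: compute the MD5 ETag for a file that was written over
    NFS and therefore has no Swift metadata stored with it."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()

# A GET then needs a second pass over the file to stream the body back
# to the client, which is the two-reads cost mentioned above.
```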
Hi, guys. I'd just like to amplify, and ask a very similar question with a slightly different flavor to the one that Mono asked slightly earlier. You seem to be having difficulty differentiating between file systems and file protocols. A file system tells you how to store your data; a file protocol tells you how to get access to your data. And I'm seeing a slight blurring of the lines here, which may be desirable, but it leaves you open to all sorts of problems, because you're no longer playing in one camp or the other. There are two questions to this. First of all, how are you going to resolve this? Because nobody really wants a file system that's a file protocol, or vice versa; we've been there, done that, and it really doesn't fly very well. And the second point is that we're now mixing up POSIX compliance with object storage. I was always taught that if you mention locking and an object in the same sentence, you've got a sentence that doesn't make sense. How are you going to fix those two problems?

The second one is actually kind of easy, because Swift-on-File is just an application sitting on top of GlusterFS in our implementation. So it handles locking just as any other application does alongside other applications in the same cluster. The first one is a little bit harder to answer, and the only way I can answer it is with this. So the guy on the left is a file protocol and the guy on the right is a file system? Exactly. Mainly what it means is that this is about community work. We don't have all the answers; we are looking for help with this technology.

Yeah, I have a somewhat different take on it; not contradictory, but different. The goal here is to allow non-POSIX backends. So the pieces of POSIX that fall outside of this are probably just not going to be supported. Full stop. That includes locking. For example, there is no such thing on Kinetic, and you can plug Kinetic in; actually, it is already plugged into DiskFile, so all that's left is PBE. As long as that is a goal, this interface cannot be too specific to POSIX; parts of it are probably going to return an error if you try. But there is an implicit limitation here, at least in my limited thinking: basically, when you need locking, you cannot place, for example, a mail spool onto this thing and then have object access to it. That's probably not going to work.

But you are allowing us, for instance, to ingest via Swift, Swift-on-File being an example of that, and then read via NFS? Yeah, so there is a specific use case that actually does exist; it's not hypothetical, people are already doing this kind of thing. A specific example I have: I know for sure that people use this with FFmpeg, which just ingests from Swift because it needs this POSIX interface. But it's not locking anything; it just reads. And we made those reads work, so it works fine.

I'm asking too many questions, and I apologize to everybody else; perhaps I'll catch you afterwards. The thing I'm having difficulty with is perhaps differentiating what you mean: when I hear you talk about files, you use the word object, and when you use the word object, I hear file. I'm beginning to misunderstand the position of Swift if you're trying to support others. Exactly; underneath, there are two kinds of things: there are file-based things and there are object-based things. So what is the storage interface going to be, object-based or file-based? It can be both, surely. You see what I mean? There are file systems and file protocols...

I just want to make sure that we don't confuse the example of Swift-on-File with what we're trying to do here. No, I understand that. All we're trying to say is that it's like a device driver in the kernel: it allows you to take data coming from the stream and move it in whatever direction you want to save it. That's it. We don't try to define how you're going to save the data; that is up to the developer, and the interface should not define it. It's only a stream of data coming through the proxy and being passed to your class, which decides how to save it. Does that make sense? We're not trying to say it's POSIX; we're not trying to say it's anything. It is a stream of data coming through, and it allows you to decide how you're going to save it. Okay. Thank you. Anything else? All right, thank you.