Good afternoon, and welcome to one of the three Swift sessions in this time slot. I'm glad you came. This session is on how the Swift community, Red Hat, and the Red Hat Storage community have been working together on putting Swift on top of GlusterFS. It's not so much a technical talk. We'll give some technical overviews of how that actually happens, but we're going to focus a lot on how you actually work together in the community and commit to really good community collaboration. Something that's been really nice leading up to this talk is that I probably had six conversations yesterday with people in the hallway asking, "How do we do this with Swift? Can we add this?" and all that kind of stuff. So it's a really good topic, and I think there are a lot of people here who are really interested in it.

By way of introduction, my name is John Dickinson. I'm the project technical lead for OpenStack Swift and the director of technology at SwiftStack. With me is Luis Pabon.

Hello, my name is Luis Pabon. I work for Red Hat Storage.

And he also works on something really cool. We'll talk about that. So I work for a company called SwiftStack. To clarify what we do: we are actively involved in Swift, but we don't have Swift itself. We have a management control plane that comes alongside Swift and integrates it into existing places.

So why are we here, why does this all matter, and what kind of problems are we solving? The reality of the world is that data has been exploding in its growth, at such a scale that you have to solve it. You can't get around it. It's not something you can just ignore, because unstructured data especially has been growing at a rapid rate.
You've got changes in the way applications are built, which means that people are deploying things on phones and mobile devices that are always connected and always generating new content. You've got video content at higher and higher resolution. What this means is that you've got a problem of storing your data, and you have to solve it effectively.

It's not just the applications that have changed. The IT infrastructure people who are deploying the storage solutions need something better than traditional storage, because they need to move into a world where they don't have these silos of storage. You want to be able to pool your storage. You want to take advantage of cost savings and things like that. But you also want to be able to respond very rapidly as new applications come online and as usage patterns change over time. So what you need is something that is going to abstract away the actual storage media and give you the agility and the cost that you need to do that. That is what systems like Swift and GlusterFS are doing: solving different problems for different use cases, but in the same common way of abstracting away the underlying storage media so that you can grow and deal with this massive problem of unstructured data.

So how do we do this? How do we actually solve these problems? To start with, I'm going to go over a little bit about how Swift works and some use cases there. Then we're going to switch over and talk about how Gluster is put together and some use cases there. And then we're going to show how the two have been married together a little bit and cooperate very well. So, to start with the fundamental pieces that I think are very important, let's talk about what Swift is. How does it actually solve this problem of distributed object storage?
So there are two big parts of Swift, and it's a really simple design. You've got a proxy server and you've got the storage server. The proxy server is responsible for implementing most of Swift's API and then coordinating all the communication with the storage nodes. The storage servers are responsible for actually storing the data on a hard drive someplace or, more generally, a storage volume. So the client talks to the proxy server, the proxy server talks to one or more storage servers, and the storage servers talk to a hard drive. That's basically how it works.

Just to make sure you're paying attention, pop quiz: does the client ever have to deal with what hard drive something is stored on? No. Good answer. This is my fourth time talking about this this week, so some of you probably got the cheat notes. That is absolutely right. The point of the solution we have is that it removes those hard problems of siloed storage and allows you to simply treat storage as a consumable resource.

So let's talk about a couple of use cases. I've talked about this one a lot this week because I think it's so exciting, but I'll tell you again, because not all of you have seen it, and if you have, you need to hear it again. There was a news story recently: an airliner went missing in the Indian Ocean, and apparently it was a big deal. They were never able to find it. But one way to track where it might be and where it went down is to say, okay, if it hit the water and broke apart, then maybe we need to figure out where the debris went, and from that we can figure out where the plane is. It turns out that there's a site in Australia that was built to do just this. You drop a little marker on the ocean, and you can see what the debris field looks like from now to 10 years out or something like that. It's a really cool thing.
It's a really fun little app. It's adrift.org.au, and you should just go play with it because it's fun. But it turns out that in the media storm of figuring out where the plane was, one of the newspapers got hold of this website and said, "Hey guys, you should go check this out," which of course meant that everybody went home and started clicking on it and playing with it, and the servers probably crashed. They just had a scaling problem.

It was a pretty simple architecture, and they realized, you know, we could just add more nginx servers and increase our web servers. It's a known problem. We know how to scale that; we've been doing it for years. But then they realized, wait a minute, we're already storing the data in Swift. They've got some CSV files that load into the client-side application, so you click on a point, it loads the right CSV file, and renders it in the application. It's a pretty simple application, but loading all of that was just overwhelming their, I'm sorry, not their storage servers, their web servers. So what they realized is that they could just point the browser, the actual end-user client, directly at the Swift cluster and load the data directly from there. Boom, problem solved. They don't have to worry about scaling out their web application servers just to deal with a concurrency problem on their storage. That's the kind of use case that Swift is designed for. So think of content on the web, stuff that has a wide range of access across it.
That's the kind of thing that we need to solve for.

So, a couple of other use cases, just to show you a few different things. Pac-12 is a sports broadcasting network. They're filming hundreds of sporting events every year, and as you know, the resolutions on those are going up, so storage requirements are going up as well. They were using a traditional SAN. So number one, they have a migration problem they have to deal with. And number two, they've got to figure out how to take advantage of something that's going to be more available. Rather than throwing stuff into a tape archive, let's put it in active storage, and then we can not only lower our costs but also get something that's more available and potentially increase revenue. Say somebody wants to go look at a football game from 1994 or something like that. Well, it's not unavailable. It's not sitting on a shelf someplace. It's already actively stored in the Swift cluster, which means that somebody can load it right then, as soon as it's needed. And I want you to remember the migration story.
We're going to come back to that as we move into the Gluster world.

Another one that I want to talk about is somebody who was here this week, at kind of the other end of the spectrum. It's not necessarily web content. You've got a lot of data centers on a single campus studying genomics, with a lot of research data, solving hard problems around curing cancer and curing AIDS. You've got this file sharing problem of massive data sets that have to be stored durably, because you can't recreate them, and you've got to do a lot of computation on them, so you've got to be able to read any part of them at any time. There was a whole use case session earlier this week on Fred Hutchinson and how they were building out their Swift cluster for this. But again, one of their problems is a migration problem: dealing with file access protocols, and how you deal with that in an object storage world.

So remember those two things. We've got stuff that needs to be highly available, highly durable, and highly scalable, but we also have the problem of migration from legacy storage.

Other users around the world that are using Swift include service providers and major companies. The reason I really want to point this out is that it demonstrates how Swift is being used not just for some toy hobby website or an experiment someplace, but for implementing storage for crucial lines of business at major companies all around the world.

Swift is obviously part of the OpenStack project, and that's why we're here this week, of course. There are lots of ways that we work with the rest of OpenStack, whether that's being a target for backups from Cinder, loading VM images via Glance, or integrating with Keystone, the metrics in Ceilometer, and all of those other projects that are in
the OpenStack project. It's about playing well together with all of those. Now, the fun thing about Swift is that it is not intrinsically dependent upon your particular compute infrastructure, so you can say, "You know what, I've got a storage problem," and deploy this independently without necessarily needing to set up a Neutron network or something like that. It's something that works very well on its own, and then it cooperates very well with the rest of the cloud infrastructure, so that you can build out your applications exactly to what you need, on both the compute and the storage side.

This is where we get into something that I think is very important, and how we're transitioning to work very well with the Gluster community. We've got something in progress in the Swift community right now called storage policies. Storage policies allow you to do three things. First, given the global set of hardware in your Swift cluster, you're able to choose what subset of hardware you want to store data on. Second, once you choose the subset of hardware, you can choose how to store data across it. Is it going to be replicated? Is it going to be erasure coded? What are the configuration parameters: how many replicas, how many parity bits, and so on? And then finally:
This is the important piece that Gluster has really been contributing back to and taking advantage of: the abstraction of the volume itself. This is where I want to talk about the extensibility of Swift. Remember, these are the parts of Swift and how it's put together. One of the major extensibility points within Swift is at that last, bottom layer. You've got the object servers, or storage servers, talking to hard drives. That abstraction is something you can extend by bringing in your own implementation of a particular storage volume. This is what we've seen happen time and time again in the community, and it really kicked off with a lot of the work in the Gluster-Swift project.

I'll mention a few of the implementations that are out there. We've got the normal one that comes out of the box with Swift, which just talks to a local hard drive with a normal POSIX file system. We've seen other people write one for Seagate's Kinetic drives, which are non-POSIX, non-block devices, so that's a fundamentally different protocol. The ZeroVM team has been using this to tie together compute and storage, so that you can do your compute exactly where your storage is. They had a presentation earlier today about that; it's really cool technology. And the last one, the reason we're talking here today, is running Swift on top of GlusterFS volumes. That abstraction is what allows Swift to ingest different sorts of storage volumes, and it gives users freedom, flexibility, and choice in deploying on what they actually want.

Thank you very much, John. So now I'll be talking about the integration between GlusterFS and Swift. So, before we start talking about that:
Let me first give a little introduction to GlusterFS. GlusterFS is a distributed file system, normally with file access. Being distributed, it has its own way of placing data on different nodes. What it does is export a set of directories on different nodes, combine them, and present them to the user as what's called a volume. That volume can be accessed through different types of protocols. We have the CIFS protocol for Windows systems. We have NFS. We have a native Gluster interface, which we use over FUSE, and we have other methods through APIs and things like that. But the GlusterFS community wanted another method of accessing the same data on their cluster, and that was an object interface. They decided to use Swift as a method of accessing the same files that you access through NFS or CIFS.

One of the use cases they were trying to accomplish was something like this. Let's say, for example, you have a set of Swift systems where you can use the object interface to place files, for example video files that you take on your camera or your phone. You put them on your Gluster system, and then you have a set of VMs, for example, that can take those files and transcode them using a file interface, NFS or CIFS. You can then take those transcoded video files, place them back on the GlusterFS system, and export them again through the object interface. This works very well for something like taking a video with Facebook or Google: you upload it through your phone, and then it's available to others at different bit rates. This is the main use case that they wanted to accomplish.

So let's go back to how this was accomplished back in 2012. The very first thing the GlusterFS community did was require the user to install OpenStack Swift on their system, and then
they would install a diff file on top of it. Okay, so there was no automated testing for this. There were zero developers really working in the OpenStack Swift community, and there was really only one developer in the GlusterFS community working on this patch, this diff file. Now, this was really a great idea, to combine those two worlds, but really it was bad execution.

All right, so let's revisit a year later. We at Red Hat Storage took it and said, let's refocus what we're trying to do here. We took the work that was done in the GlusterFS community, took it out of that repo, and focused it more on being integrated with OpenStack Swift. We then built a full CI system, because we wanted to make sure that we supported all their interfaces. But more importantly, we made sure that we brought developers into the OpenStack Swift community, and we had some developers still work on the Gluster-Swift bridge, the glue that exports this object interface to GlusterFS.

Go ahead. Good.

One of the things that we did at Red Hat, as we participated in the OpenStack Swift community, is that we wanted a better way to extend the volume interface in OpenStack Swift, so that OpenStack Swift could use GlusterFS as its volume. There was a set of people who worked very closely with Red Hat in the project and were able to actually accomplish this. It's available now in OpenStack Swift, and not only does Gluster-Swift use it, but there are also others in the community who are leveraging this technology.

Yeah, and so, for example, like I mentioned earlier, it's the same abstraction that was able to be used by other people in the community, and not just one vendor's proprietary extensions onto something. So what this ended up being was a really great idea, and one thing I've
told many people today and before, and I'm not at all ashamed to say it, is that Red Hat Storage in this instance has, in my mind, set the bar for how to integrate with an upstream open source community. What they did is identify the extension points that they needed to take advantage of and start using those. Then, where they found those extension points lacking, they contributed back upstream to improve the project for everyone. This is absolutely the right way to interface with Swift and to take advantage of the things that Swift offers. It is a completely different approach from what people were doing in the past, which was to say, "I'm just going to re-implement the Swift API with my own proprietary code behind it." This way, you're working with the upstream community, and that's absolutely the right way to do it.

Thank you, John. But we can do better than that. As of a few months ago, what we want is to increase participation. We want to increase the involvement between Red Hat and OpenStack Swift. So one of the things that we are doing is stepping back and looking at the Gluster-Swift technology itself. We stepped back and said, well, this technology is not really based on GlusterFS. It's really just talking to a POSIX file system. So how can we increase community involvement and community collaboration? Well, let's rename the project for what it's really trying to do.
We're trying to make Swift run on top of file systems. As we do that, we also want to increase the amount of development from Red Hat in the community. So we are adding more developers to the OpenStack Swift community, and some of those developers are now responsible for what we call Swift-on-File, which is the replacement for Gluster-Swift. As John said when he was talking about storage policies, one of the next goals for Swift-on-File as it transitions from Gluster-Swift is to become a storage policy. Once it becomes a storage policy, we can extend a current OpenStack Swift deployment with file-system-based storage like GlusterFS and, in the near future, NFS and any other type of POSIX file-based storage system.

The thing that I love about this, and one of the reasons I'm very excited about storage policies and this being a particular application of them, is that it helps people who have those migration problems. Remember back to Pac-12; remember back to Fred Hutch. They had this migration problem, both with existing storage and with having to deal with different protocol access to their new storage. What this means is that you would be able to use Swift-on-File, that connector project, to ingest an existing GlusterFS system today, and potentially tomorrow an arbitrary NFS system, and say, "This is my traditional storage." You can still take advantage of the storage you've already purchased. But as your storage requirements grow, you can build that out in the normal way in the Swift world, using commodity servers, and take advantage of the global dispersion and low cost of deploying Swift clusters.
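To make the migration idea above a little more concrete, here is a minimal sketch of how a deployer might model two policies and pick one when creating a container. It assumes a Swift release that includes storage policies, where the policy is selected per container at creation time with the `X-Storage-Policy` header. The policy names, the endpoint URL, the `Policy` class, and the copy counts are hypothetical illustrations, not taken from the talk or from any real cluster; the single Swift-side copy for the file-backed policy reflects what the speakers describe later in the Q&A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Illustrative description of one storage policy (names are hypothetical)."""
    name: str
    backend: str       # what actually holds the bytes
    swift_copies: int  # copies Swift itself keeps before handing off


# Assumed example cluster: one normal replicated policy plus one policy
# backed by an existing GlusterFS volume through Swift-on-File.
POLICIES = {
    "replicated": Policy("replicated", "commodity servers with local disks", 3),
    "swiftonfile": Policy("swiftonfile", "existing GlusterFS volume", 1),
}


def container_put_request(storage_url, container, policy_name):
    """Build (method, url, headers) for creating a container under a policy."""
    if policy_name not in POLICIES:
        raise ValueError(f"unknown storage policy: {policy_name}")
    url = f"{storage_url.rstrip('/')}/{container}"
    return "PUT", url, {"X-Storage-Policy": policy_name}


if __name__ == "__main__":
    base = "https://swift.example.com/v1/AUTH_demo"  # hypothetical endpoint
    # Legacy data stays on the storage that was already purchased...
    print(container_put_request(base, "legacy-video", "swiftonfile"))
    # ...while new data lands on ordinary replicated Swift storage.
    print(container_put_request(base, "new-uploads", "replicated"))
```

The sketch only builds the request rather than sending it, so it runs without a cluster. The point is that the client-facing API is the same in both cases; only the policy named at container creation differs.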
So what I love about it, and I keep saying this all week, is that Swift gives you the freedom you need to exactly match your infrastructure to your use case.

Exactly. Thank you. One of the things that we also wanted to bring up is the collaboration in the project. One of the things that made Swift-on-File possible, and made storage policies possible, is the ability to collaborate in the project together. We have different innovative ideas from different companies and different developers. The only way to bring innovation into an open source project is to bring in those ideas and collaborate together. We really need to start embracing that. And not only that, but this presentation is actually a collaboration between the two of us. This is the type of collaboration that we really need in this project. Collaboration is really... do you want to say something else?

I'm just agreeing with you. It's incredibly important, because it's not just that new ideas sharpen one another; it's actually about meeting the new use cases. If I'm sitting at my office somewhere typing Swift code, but I don't hear about your use case, then I'm not going to be able to make sure that Swift includes those sorts of things and meets your needs. So having that collaboration, as demonstrated by this particular example but expanded to all of the people who are here this week, is the kind of thing that makes Swift better and OpenStack better.

Exactly. Thank you. Collaboration is very important. It's hard to explain, but when everybody comes together and starts innovating together, and those ideas start coming together and people start talking, it's really, really awesome. How awesome is it? It's as awesome as a shark high-fiving a gorilla. Is that awesome?

So please come join us.
We've got to take advantage of that: use the existing extensibility, and contribute back where the extensibility is lacking and you need additional functionality. That way we can all take advantage of what's going on there.

Now, I think we have about five to seven more minutes, so we could take some questions. If you can, please line up behind the mic. If you're not able to do that, I will repeat your question. I see one right here in the front.

Right. So, what's the work involved in actually making this happen? The question is: Swift is talking to local POSIX storage systems anyway, just XFS formatted on a drive, and GlusterFS presents a POSIX file system, so why was this so hard?

It is ongoing work. We've come a very long way, and we've still got a little way to go, and that's part of the collaboration. But the main focus of the functionality here is flexibility: giving people the freedom to choose what kind of back ends they use for storage volumes behind Swift. So instead of writing the Swift code in such a way that we mix up all of the layers of abstraction and just assume that this is going to be a local POSIX file system, we can now push that out to a pluggable interface. Now, the gentleman standing up right here actually did a lot of work on this, and you should ask him all of your questions. And he's going to ask me one now.

Yeah, there's no question, but just a quick answer. The difference between Swift-on-File...

Closer to the mic. Get really, very close to the mic. Yes, better.

...is that in Swift-on-File, the URL for a Swift object maps directly onto your file system, whereas with the current one, what is stored in the POSIX XFS file system is not a direct mapping to the URL.

Right. Over here. Yes? Do you want to repeat the question first?
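The contrast drawn in that answer, a direct URL-to-path mapping versus a layout that does not mirror the URL, can be sketched roughly as follows. This is a simplified illustration, not Swift's or Swift-on-File's actual code: the function names, mount points, and `part_power` value are assumptions made for the example, and stock Swift also salts the hash and stores timestamped files inside the hash directory, which is left out here.

```python
import hashlib
import posixpath


def direct_path(mount, account, container, obj):
    """Swift-on-File style (assumed layout): the path mirrors the object URL."""
    return posixpath.join(mount, account, container, obj)


def hashed_path(device_root, account, container, obj, part_power=10):
    """Simplified hash-based layout: the name is hashed, so the on-disk
    location no longer resembles the URL. Salting is omitted."""
    name = f"/{account}/{container}/{obj}"
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Use the top bits of the hash to pick one of 2**part_power partitions.
    partition = int(digest[:8], 16) >> (32 - part_power)
    return posixpath.join(
        device_root, "objects", str(partition), digest[-3:], digest
    )


if __name__ == "__main__":
    args = ("AUTH_demo", "videos", "raw/clip.mp4")
    # A file another tool could open directly over NFS, CIFS, or FUSE.
    print(direct_path("/mnt/gluster", *args))
    # An opaque location that only the object server knows how to find.
    print(hashed_path("/srv/node/sdb1", *args))
```

With the direct mapping, a file written through the object API is visible under a predictable name to file-based tools. With the hashed layout, the location is only meaningful to the object server.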
Yeah, I'll repeat the question here. The question was: does it mean that every file on your POSIX file system is now exposed as a Swift object directly? The answer is yes, and here's the difference. The follow-up is, well, if we just have a local HTTP server with, you know, /etc/www or something like that, why can't we just throw Apache at it and serve up those files directly? Which you could absolutely do, and if you show me the file system that can do that with 10 petabytes of data and a trillion objects, go for it. But that's why you need something more. It's a matter of availability. It's a matter of scale. It's a matter of making sure that it's durable against individual hardware failures and things like that. So that's a great question: do the availability and durability promises translate through Swift-on-File? The important piece here is that you get the same advantages that you had with GlusterFS, if you're using GlusterFS, and you also get the full Swift API, because they're using actual upstream code. Is that right?

Yeah, Swift-on-File just leverages the underlying file system, so that technology is just passed through.

I have a concern here. Swift sacrifices consistency to get high availability, but underneath you now have a distributed file system, which has stricter consistency. How can we get the same level of high availability when building Swift on top of the distributed file system?

So that's a great question. I don't know if you want to answer; I've got an answer to it.

I'm trying to understand the question again, but go ahead, you answer.

So basically, we've got an eventually consistent object storage system, and now I've got a strictly consistent distributed file system. So did I just sacrifice all my availability needs? How do those
differences in abstractions and understandings of the world actually affect the end user?

There is the CAP theorem.

Right, exactly. You've got the CAP theorem, so we didn't solve physics, sort of thing. So what do we do here? One of the important things is that in a world where you have a Swift-on-File storage policy that allows you to talk to GlusterFS, Swift is not going to be doing your replication for you. Sorry about that. Swift is only going to do the API implementation; it would store only one copy and offload all those durability problems to GlusterFS in this particular instance. But another storage policy may not be using Swift-on-File, just the normal stuff, and would still be using traditional replication or, in the new world of storage policies, erasure codes.

So if that is true, does the new Swift cluster provide a stricter consistency model to the client?

I guess what I'm trying to say is this. If you're using a storage policy that is putting the data into GlusterFS, then you have the semantics that are associated with GlusterFS. If half of your GlusterFS cluster is unavailable, if it fails, then you're going to sacrifice your availability, because it's a strongly consistent system. That won't affect the other storage policies within Swift, because those are logically independent pieces.

I was just going to say that it all depends. That's what I meant by "depends": on how you deploy it, and what your needs are, and what the customer's needs are. If you deploy it knowing that your storage policy defines that this set of data is going to go to a GlusterFS system, that's its expectation. So you have to know where your data is going, in other words.
That's what I mean by "it depends." The deployer of the storage policies has to understand that. So it's a great question. Does that make sense?

Another question over here. Is there somebody who hasn't asked one yet? Just making sure. Go ahead.

So the question is capacity management: how do you add capacity in this sort of world?

It's the same model, really. It's just that we can now extend that model. For example, if you have OpenStack Swift by itself, you would add nodes to that cluster, right? But now you also have the ability to add GlusterFS or NFS or whatever FS, not just nodes, to OpenStack Swift. For example, if you already had some set of data on some device that's exported over NFS, you can now add that to the cluster as well. So it's not just about adding nodes to Swift; now you have more choices.

So in this case, what are you adding? Are you adding multiple NFS namespaces, or would everything be one volume?

There's something very important that I want to reiterate, just to make sure we're very clear. When you're adding a durable storage system with Swift-on-File underneath Swift, you may still access it exactly as you were, with multi-protocol access, exactly the use case Luis was talking about. But at the same time, it is not exposed through Swift as a storage volume. It's not that you're going to be able to mount something.
You're not going to get a POSIX interface into Swift itself. This is a way for Swift to add capacity. It's the migration story: I've got existing capacity, I need to ingest that capacity, and then I need to continue to expand it with the flexibility that I need. What happens in the current incarnation of the Gluster-Swift project is that it's mapped to Gluster volumes, so it's managed exactly like you would manage Gluster volumes today. In the brave new world of Swift-on-File, it's really just that connector to whatever volumes are presented by that file system. That's what you would be adding, and Swift would then see this as a storage volume that it's going to place data on according to the same rules that it uses for everything else.

Hey, it's Trevor RMS. I had a question about replication, and I feel like Elvis right now. Are the two technologies aware of each other's replication? Because both can do it. And if you wanted to replicate to a different data center, would you turn one off and leave the other one on, or are they aware of each other?

That's actually a great question. And I think it's true that Swift-on-File right now is in the process of transitioning, so that would be a great question to bring to the community as well. We can answer it, but I think that we don't know the complete answer yet.

In general, I would say that in this case you would configure Swift to put just one replica, so to speak, inside GlusterFS, and then Gluster is going to do its own encoding of the data, replicated or not, and do what it needs to. Now, I think it's interesting that you brought up the concept of global clusters and how this works in that world, because we know Swift has supported that for a while now and people are using it. So
if you've got, as somebody else mentioned, this strongly consistent system, then you're not really able to distribute that in a global way and keep it available. Those are the trade-offs that you get with that sort of design. So in this case what you would end up with is a GlusterFS system that is in one location, and then you may have something in another location as well, but you've got Swift that's spanning the world there. So you've got one logical Swift cluster that's able to say: look, here in my Texas region is GlusterFS, and over here in my Berlin region is normal, traditional storage. I'll call it traditional storage. It's Swift; I've been working on it for five years now, so it's traditional to me. Just basically white-box servers with hard drives; that would be normal replicated storage. And then Swift can deal with: I want to distribute something globally, or I want to make sure that it's put just on the GlusterFS part via the Swift on File project. So in that sense, we are not sharing information today. In other words, the deployer is choosing where the replication happens.

Last, yeah, let's say last question over here. We'll be available afterwards too, if you want.

Traditionally, with a POSIX file system, applications access the file system and understand how to lock and all those things. So once you add this Swift interface, which is great, and it's not a POSIX API, let's say you do I/O normally through your traditional POSIX file system and now Swift uploads something, which changes things on the fly. How do you handle that?

So that's a great question. I think the best answer that I can come up with is really just going back to the use case, and remembering the use case that Luis was talking about, the one they were originally trying to solve. I actually really like it.
It's the video transcoding use case. I've got a bunch of data. It's unstructured data. It grows without bound. We've got to solve this. But I've got a tool chain that needs to figure out how to edit the video, how to transcode the video, and things like that, and that requires CIFS, NFS, POSIX access to the data. And I can't go rewrite all of those applications, mostly because the guy who wrote it left the company six years ago, and I bought the other one and I don't see the code to it. So you've got to be able to deal with this migration strategy: the client application talking to that piece of video content. But what's really, really nice about this is being able to say that now I've got the exact same storage system, and I can talk to it and use the standard POSIX locking semantics if I'm going through GlusterFS. But now, using the Swift on File stuff, I can expose that same object, and the new objects created as a result of transcoding, directly through to the clients via the Swift API, just like the adrift org site, in saying that I'm going to give them a URL, a Swift URL, the Swift URL being the standard Swift API: REST-based, non-locking object blob storage. So does that clarify that a little bit?

So, right, if you've got concurrent writes, I'm writing an object in POSIX and I'm writing it in Swift: don't do that. Swift on File, or Gluster Swift, is really a user of the file system.
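
As a rough illustration of the same data being reachable both ways, here is a minimal Python sketch of how a Swift object URL could line up with a file on the shared file system. The directory layout (`<mount_root>/<account>/<container>/<object>`), the function name `swift_url_to_posix_path`, the host `swift.example.com`, and the mount point `/mnt/gluster` are all assumptions made for the example; they are not taken from the talk or from the Swift on File code.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def swift_url_to_posix_path(url: str, mount_root: str) -> PurePosixPath:
    """Map a Swift object URL to the file path it might correspond to.

    Assumed (hypothetical) layout: <mount_root>/<account>/<container>/<object>.
    This sketch does no validation of '..' segments or symlinks.
    """
    # Swift object URLs have the shape /v1/<account>/<container>/<object>,
    # where the object name itself may contain further slashes.
    parts = urlsplit(url).path.lstrip("/").split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected /v1/<account>/<container>/<object>")
    _version, account, container, obj = (unquote(p) for p in parts)
    return PurePosixPath(mount_root, account, container, obj)


if __name__ == "__main__":
    url = "https://swift.example.com/v1/AUTH_video/raw/2014/talk.mp4"
    print(swift_url_to_posix_path(url, "/mnt/gluster"))
```

Under that assumed layout, the transcoding tool chain would open the file path with ordinary POSIX calls and locking, while clients would fetch the same bytes with a plain HTTP GET on the Swift URL. Nothing in the sketch coordinates the two, which is why concurrent writes through both interfaces are best avoided.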
So, yes. The comment was that the understanding here is that the Swift interface would be more heavily used for the read use case, and POSIX may be more used for the write use case. I think that is one very valid use case. I certainly wouldn't put it into that box alone, but in this particular case that we're talking about, yes, that's a very good point to make.

So I want to thank you for coming. I want to specifically thank Luis for presenting with me, and if you have any questions, we will be here at the front and around in the halls this week. Thanks a lot.