So I love this little box here, because it makes me feel like I'm one half of Statler and Waldorf as I'm speaking to you. "That stinks!" Anyway. So yes, I'm John Mark Walker. I'm the Gluster community leader. I try to expand the Gluster community for users and developers all around the world, and I go to events like this and talk to people like you.

What my talk is about today is, first of all, the beginnings of Gluster, where we saw it fitting in back in the early days, and how it fits in with all the trends you see right now going towards cloud and automation and all that stuff. So right off the bat: what is this cloud? I'm actually not going to explain the cloud to you, because you already know what it is. But I do like to tell the story about how one time I went to an AWS meetup at a cloud expo, and some older gentleman crashed our table, looked us in the eye, and said, "What is this cloud?" And I was like, why are you here? It's always an amusing anecdote to start with.

But getting to why we have a cloud, why we're here talking about OpenStack and the related technologies that go along with it: for the last dozen or so years there have been these guiding trends that all fit within the same framework. The trends towards open source, towards commoditization, towards virtualization, towards anything that gives you more agility in the data center are all part of the same basic trend, and open source is an intrinsic part of it. From the storage perspective, it took a little longer for us to go full speed down that same path, and that's because people get conservative about data. Compute is one thing: I can spin up a VM and trash it, and it doesn't matter, because I can just spin it back up again. Trash your data and you're going to be fired. So people were loath to virtualize their storage systems, because nobody ever got fired for buying EMC. But as things have progressed, we've gotten to the point where storage is definitely following the same trends as everything else: the same trends toward virtualization, toward agility, toward open source, toward automation. It's becoming just one more application you install alongside everything else in a scale-out architecture.

You may be surprised to know that when Gluster started, we weren't a storage system per se. Full disclosure: I was actually part of the Gluster acquisition by Red Hat. I was there a full five months before we were acquired, so I got filled in a little bit on the backstory while I was there. We started off as cluster management; we didn't do storage. Our engineers had a different perspective on the data center, and you can see the way it ties into the Gluster project and community today, for better or worse. They had a very unique perspective on storage. They didn't have the baggage of previous storage or distributed file system experiments, so they really approached it from a new direction, and you can see that manifested in the architecture. I'll get into the specifics in a minute. What happened was, they had this cluster management piece, and they had this project down in South America with an energy company.
And this energy company said: your cluster management stuff is nice and all well and good, and thanks for putting it in this cluster. But what we really need is help with our scale-out storage piece. We don't have anything that works with our tape backup system so that we can grow to multiple petabytes and scale out to the point where performance doesn't suffer just because we've reached a certain amount of stored data. So we thought about it. We were a cash-strapped startup, this being 2006, and we thought, well, we need the money, so we're going to say yes. We figured that within six months we could use something off the shelf, it would fit into whatever else we had put together, it would just work, and we'd be done.

Come to find out, all the storage solutions at the time were either too expensive, didn't scale out enough, were proprietary and required signing very pernicious agreements with proprietary software vendors, or just didn't work as advertised. And so, as most open source projects begin, ours is similar: we said, well, we'll write it ourselves. And as with many open source projects, we didn't really understand what that meant at the time. We didn't understand what we had set out to do. But we made it. We built, essentially, a Lego toolkit for building file systems on a scale-out architecture. That design decision has yielded many benefits, even now. Because we're so flexible, so hackable, as I like to put it (I think we're the most hackable storage system in the world), it's very easy to add and remove features. It's very easy to bend GlusterFS to whatever you need to do, and in fact you can make it look like a completely different beast than what you normally see when you unpack it.

And so as we were working on this project, we came upon the idea that storage should be simple. Why can't you just install software in your data center and, bam, you have distributed storage? Why can't you aggregate and pool your data silos into one shared space? As we thought about that problem, we thought, okay, this is the beginning of something new. We're going to take the work we did for this project and turn it into an open source project. It's going to be about scalability, it's going to be about scale-out, and it's going to be about installing it just like any other application. You should be able to do yum install or apt install, and then you have a distributed storage cluster. And thus began GlusterFS.

So GlusterFS at its core is a unified distributed storage system. The word unified there is very specific, and we chose it for a specific reason: everything is part of a single namespace, and we do not have data silos. Whether you are accessing a Gluster volume via the Swift API, the NFS mount, the Gluster client, or the libgfapi client library, you're accessing the same data pool. There are many storage systems that claim to be unified, but what they really offer is a set of data silos, and it's up to you to manage those silos. We remove the silos entirely. And yeah, there's some performance give-and-take there, but for the most part it works pretty well. Also very unique to GlusterFS: it's completely in user space. We don't have any kernel modules of our own; the Gluster client does use the FUSE kernel module.
Sometimes that incites jeers from the audience, but recent versions of FUSE are actually quite good. We also feature a global namespace and a stackable architecture. How many of you have heard of the GNU Hurd project? Yay, okay. One of the co-founders of Gluster was a contributor to the GNU Hurd project, and so a lot of the architectural ideas and even some of the nomenclature were borrowed from that project. The idea of implementing features within stackable user-space translators came from the Hurd, and so did the idea of everything being treated as a file. You could say the Hurd lives on in GlusterFS.

This just gives you a bird's-eye view of the GlusterFS architecture, and I have another slide following this that shows specifically how the clients interact with the servers. At its core, GlusterFS is a data aggregator. It works in conjunction with disk file systems that support extended attributes: XFS, ZFS (if you can get it working well on Linux), as well as ext4; the recommended file system is XFS. There are some people using Btrfs, but anything that supports extended attributes works, and there's a very specific reason for that, which I'll get into in a minute. But it's a data aggregator. It sits on top of the disk file system and lets you distribute the data across multiple systems, creating a single namespace and a single volume. You can replicate that volume to other servers as well. Across the network we have multiple access methods. We have the client, which again uses the FUSE module in the Linux kernel. But we also implemented an NFSv3 server on the GlusterFS back end, so any NFSv3 client can connect. And then we have a new client library, which I'll get into in more detail later, but this is how we do integrations. This is how we did the QEMU integration, so now you can directly manipulate and manage Gluster volumes from QEMU. This is how we did the Samba integration that was recently pushed upstream into Samba 4.1. And in the future, we're probably going to move our Swift API to that architecture as well; right now, the Swift API actually goes through the FUSE module.

This is a similar slide, except it shows a bit more detail about client access. The GlusterFS client understands where all the servers are. In other words, when the GlusterFS client mounts a Gluster volume over the network, it loads all the translators that let it distribute the data, either replicated or distributed. From the GlusterFS client, if it's accessing a distributed volume, it knows which server to go to to find the data. If it's going through a replicated set and it's reading, it will take the first responder and read from that replica; if it's writing, it writes to both replicas before it returns, so we put a very high premium on consistency. The same goes for libgfapi, the client library: it loads the same translators as the GlusterFS client, so it follows the same protocol. The only one that's different here is the NFSv3 client, which only attaches to one server on the back end; the HA among the servers is handled on the back end, but it also means that if the connection between the client and that server is severed, you have to work out client-side HA yourself. These just give you an overview of all the different interface possibilities with a GlusterFS volume.
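Just to make a couple of those access paths concrete, here's a minimal sketch; the hostnames, volume name, and mount points are placeholders, not anything from the slides.

```sh
# Native client: goes through the FUSE module, loads the translator stack,
# and talks to all the servers in the volume directly.
mount -t glusterfs server1:/myvol /mnt/gluster

# NFSv3: any stock NFS client can connect, but it attaches to a single server,
# so failover for that connection has to be handled separately.
mount -t nfs -o vers=3 server1:/myvol /mnt/gluster-nfs
```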
On the back end, you can see the block device and you can see files. These are virtual block devices, so ultimately they're files loopback-mounted on the server; we don't have any sort of iSCSI integration yet. On the transport layer, you can do IP or RDMA, and there are many deployments in the wild that use InfiniBand. On the file side, there are many client methods for accessing data: NFSv3, the FUSE module, Samba (whatever SMB versions Samba supports), or HDFS. On the block side, with the recent release of GlusterFS 3.4, we have the QEMU integration, which I referenced before. There's also a Cinder integration that went into Grizzly, and a revamped version that is going into Havana for OpenStack. On the object side, we have the Swift API integration; we're actually collaborating with the Swift project upstream so that the Swift API becomes more pluggable and other storage systems can be used on the back end of Swift. And then we have libgfapi, our client library, which should be able to integrate with anything.

Some features. Remember when I said that our engineers started off as cluster management people? They were not storage engineers; they had a different approach and different ideas. One of those was that they did not think a metadata server functions well in a scale-out environment: there seem to be limits to how far you can scale if you're using a metadata server or metadata services. So we bypass that entirely. We implemented what we call the elastic hash, which is a DHT algorithm. By doing that, we remove the round trip to poll a metadata server, and it's a very fast calculation to determine exactly where the data is located: we basically compute a hash value, that hash value is consistent across the entire cluster, and it allows you to find data for reading and writing. There is some magic that happens underneath if you move files from one place to another, but relatively speaking, it stays consistent across the entire cluster.

One of the key design points was multi-protocol access, because we always thought you should be able to access data your way, not have it dictated by your vendor. We have both synchronous and asynchronous replication. Out of the box it's synchronous replication, but we also have something called geo-replication, which is master-slave, asynchronous, and eventually consistent. Proactive self-healing: if one side of a replicated volume goes down, when it comes back it can pull from the other replica the data that needs to be healed. And again, no silos. We don't believe in data silos. We think that if you're accessing data via an object protocol, some other application should be able to access the same data via another protocol. So you'll see, for example (and I'll have slides on this later), that via the Swift API you can do object storage, and our core belief is that just because you're doing object storage on one end doesn't mean you shouldn't be able to manipulate that data with something else over NFS. And generally, we try to make hard stuff easier. We have glusterd, a daemon for managing Gluster volumes. It allows you to add servers on the fly or remove them, that sort of thing. Unlike some storage systems, we don't require you to pre-allocate the amount of data you're going to be storing; you can add to that amount as you go along, on the fly, using glusterd.
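To give you a feel for what that looks like, here's a minimal sketch with the gluster CLI; the server names, brick paths, and volume name are placeholders.

```sh
# Build a small replicated volume from two nodes.
gluster peer probe server2                      # add a second node to the trusted pool
gluster volume create myvol replica 2 \
    server1:/export/brick1 server2:/export/brick1
gluster volume start myvol

# Later, grow the same volume on the fly.
gluster peer probe server3
gluster peer probe server4
gluster volume add-brick myvol server3:/export/brick1 server4:/export/brick1
gluster volume rebalance myvol start
```

The rebalance step at the end is what spreads the hash layout, and the existing data, over the new bricks.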
I know that with some systems you pretty much have to determine beforehand how much space you need. You don't need to do that with Gluster, and you can have a cluster up and running in about four commands.

So how do we do all this? Modularity. When we implement the translator stacks in user space, each of these boxes or rectangles here represents a translator. On the client side, you've got the Gluster client that goes through the FUSE module, and you have the libgfapi client. In both cases, they query the Gluster server to determine which translators are needed for the data path. You can see here it's going through a distributed-replicated stack, and at the bottom of that stack it goes to the RPC client, which then talks to the RPC server on the Gluster server side, and then down into local storage and the disk file system.

What can you do with it? You can store pretty much whatever you want. One of the main challenges in the data center is that the amount of storage space needed is almost doubling every year, so you need something that can expand and grow with you, and we think scale-out is the way to do that; we have a pretty good solution for it. For all sorts of unstructured data, GlusterFS is ideal. I think I saw a slide one time comparing unstructured data growth to things like block storage growth, and by a very wide margin, unstructured data is growing faster than anything else. The other design principle was that it should work in any environment. Whether you're deploying on AWS, on an OpenStack cluster, on your KVM hypervisor, or on bare metal, it should look the same, act the same, and be consistent across all platforms. It should behave the same way whenever you're trying to interact with it, using whatever toolkits you have. So as you can see here, the overriding themes are access, availability, and consistency. Those themes had a great impact on the type of features we implemented, because that was our vision for what a storage system should be.

This shows you an overview of what proactive self-healing looks like. It was implemented in the release prior to 3.4; it was in 3.3. It means that if one side of a replicated set goes down, we keep a change log of everything that has changed since that replica went down, and when it comes back online, it pulls from the good replica and the healing commences. There's a mistake on this slide. I didn't correct it, because I always like to ask the audience: where's the mistake? If you can see it, let me know. Anyone? Yeah, the distribute and replicate translators are backwards; it should actually be going the other way.

Also with the 3.3 release, we implemented something we call unified file and object. Again, this gets back to the core concept of multi-protocol access and the ability to reach the same data regardless of the protocol or access method you're using. In this case, I'm giving just a dumb example of object storage over an HTTP request, and that same data can be accessed via NFS. We map the Swift account, container, and object to, on the Gluster side, a volume, a directory, and a file. That's actually going to change: the account-to-volume mapping works well if you're talking about a single-tenant architecture.
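Here's a dumb but concrete version of that mapping; the endpoint, port, token, and names are all hypothetical, with the Swift account standing in for a Gluster volume, the container for a directory, and the object for a file.

```sh
# PUT an object through the Swift API (a gluster-swift proxy assumed on port 8080):
curl -T report.pdf \
     -H "X-Auth-Token: $TOKEN" \
     http://gluster-host:8080/v1/AUTH_myvol/docs/report.pdf

# The same data is now just a file in the volume, reachable over NFS or the native client:
ls /mnt/myvol/docs/report.pdf
```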
When you start going multi-tenant, it requires changing that mapping a bit, but conceptually it's still very much the same. And as I mentioned, we are working with the upstream Swift project to make the Swift API more pluggable so that other file systems can plug into the back end as well.

Conceptually similar, but for a different use case: Hadoop and HDFS. We created a plugin that you can drop onto the Hadoop servers so that you can write the results of your MapReduce jobs to GlusterFS; it essentially mimics the HDFS API. In order to do this, we had to create the concept of data locality within GlusterFS, which didn't exist before. Once we started working with Hadoop, we realized it was essential, and it started us thinking along the lines of: GlusterFS can be the backbone of any kind of data analytics system, especially one where you're converging compute and storage and you need to move your applications closer to your storage, or in fact run your applications on your storage servers. That's one of the things enabled by scale-out, software-only cloud storage.

Now, specifically what we released with 3.4, which came out in July, a couple of months ago: the QEMU integration, so you no longer have to do the FUSE mount if you want to manage virtual machines on a Gluster volume; something called enhanced quorum, so you can have entirely quorum-based configuration management; and a rewrite of parts of our glusterd daemon so that it could be multithreaded and scale out further.

First, the QEMU integration. There's actually a very interesting story here. This is something we'd been thinking about for a while, and we hadn't quite gotten around to it because it wasn't the highest priority for us. But about a year ago, we noticed some engineers showing up from IBM's Linux Technology Center, and they said, hey, we'd like to do this. And we said sure, because you're not going to say no, right? So they started writing the QEMU protocol piece, and we realized we should probably have a client library in between. Because when you're talking about latency while managing VM images, going through the FUSE mount involves too much context switching, and the latency becomes too high, especially with many VMs on a single volume. So we revived an older project that we now call libgfapi, and that sits in between. On the back side, the Linux Technology Center engineers contributed a block device translator, so you can now complete the circle: the QEMU side, the libgfapi client library, and the block device translator on the back end. That became the basis for what later became the Nova integration, which I'll talk about later, so it has some very key OpenStack ramifications. This is a diagram actually submitted by the main engineer who did the integration. Any questions so far?

The middle piece of this is the libgfapi client. As we were working on this and realized that it could actually work and would be successful, we came to realize that we should follow the same methodology for future integrations. So it became the basis for the Samba integration, for example, the one that was contributed upstream into Samba 4.1, and it's going to be the basis for a future implementation of the Swift API.
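Just to show what that looks like in practice, here's a rough sketch of the QEMU integration from the command line. It assumes GlusterFS 3.4 and a QEMU built with the GlusterFS block driver; the server, volume, and image names are placeholders.

```sh
# Create a VM image directly on a Gluster volume, no FUSE mount involved:
qemu-img create -f qcow2 gluster://server1/myvol/vm1.qcow2 20G

# Boot a guest straight off that image over libgfapi:
qemu-system-x86_64 -m 2048 \
    -drive file=gluster://server1/myvol/vm1.qcow2,if=virtio
```

The gluster:// URL is the point: everything stays inside the QEMU process via libgfapi instead of bouncing through a FUSE mount, which is where the latency win comes from.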
It was also the basis for the Nova integration, which I referenced, which in turn means it's the basis for the Cinder integration, at least the one for Havana.

And then finally, we revamped quorum for 3.4, so you can now do quorum-based configuration across the entire cluster. Prior to 3.3, we didn't have a concept of quorum at all. With 3.3, we had a very simple version of quorum, which still required you to have an odd number of nodes in the cluster. With 3.4, we implemented a new translator that we're calling an arbiter node. It implements what could really be described as a fake Gluster server whose only purpose is to determine quorum, so that even if you only have a replica-2 volume set up, you can still determine whether or not you've reached quorum before making changes to the configuration. And then of course there's the concept of quorum to avoid split-brain, so the healing goes in the right direction when you have a split-brain and so you can recover from one. How many of you face split-brain in your storage systems? Anyone? OK, sometimes. In some cases admins prefer recovery to be a manual process, and in some cases they prefer it automated like this. We make it a configurable option, and by default it's turned off. So if you want quorum management, you can have it; if you don't, you don't have to worry about it. This slide just goes through some of the changes we've made to quorum management in 3.4.

And finally, one of the main new features that has come along recently, as a result of our collaboration with the oVirt project, lets you manage, instantiate, and import existing Gluster storage domains using the oVirt interface. In fact, when you deploy oVirt, you can set it up either for generalized virtualization management or just for Gluster-specific management. There's also in-flight encryption; it's an OpenSSL integration, and there's not much more to be said about it. And then there are things we're working on down the road. So that gives you an overview of what we've been working on in GlusterFS and how it can be the core distributed storage piece for a cloud deployment.

Specifically on OpenStack, we've made a lot of progress in the space of one year. If you rewind back to last year, we had zero integration except for the Swift piece, and that was a hacked-up version of the Swift API that we were using for our Swift integration; it wasn't blessed by the upstream Swift folks. In the course of one year, we've managed to have our changes accepted upstream in Swift, we then did a Cinder integration, and afterwards a Nova integration, to the point where, whereas a year ago we had nothing, now we have integration points with every layer of OpenStack. Whether you're talking about Cinder, Glance, or Swift, there's a Gluster integration there. If you go by specific releases, a lot of the integration work started with Grizzly. That's where we started the upstream collaboration with the Swift project, and that's where we did the initial Cinder integration. However, with the Grizzly release, the Cinder integration still required a file system mounted through the FUSE module, with VMs deployed on a Gluster volume that way, which is frankly not the best-performing way to do it. With Havana, we now have the Nova integration working.
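To give a flavor of how that Cinder piece gets wired up, here's a hedged sketch from the Grizzly/Havana era; option names can differ between releases, and the server and volume names are placeholders.

```sh
# In /etc/cinder/cinder.conf (under [DEFAULT]):
#   volume_driver = cinder.volume.drivers.glusterfs.GlusterfsDriver
#   glusterfs_shares_config = /etc/cinder/glusterfs_shares

# Each line in the shares file is a Gluster volume that Cinder will FUSE-mount
# and carve block volumes out of as files:
echo "server1:/cinder-vol" > /etc/cinder/glusterfs_shares
```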
The Nova integration goes through the libgfapi client library and is able to use the QEMU integration work that we put in over the last year, so it will be a much better-performing integration. Glance is kind of an interesting story. If you define the access as file-based, Glance will just write to a Gluster volume mounted as a file system, and you still get the same GlusterFS back-end benefits. Because Glance doesn't require high performance, we're still trying to figure out whether we really want to spend more time integrating with that piece. But when it comes to the Nova integration, we have a pretty good story now that will be released with Havana: live migration, synchronous replication to make sure your data consistency is maintained, all these different things that add up to some pretty good benefits. As I mentioned, we are collaborating with the upstream Swift project on our Swift API integration and on making Swift more pluggable for other file systems on the back end. I'm sorry? So, whatever changes were needed to make the Swift API pluggable have already been accepted upstream in the Swift project. Yeah, it's there. The pieces that are Gluster-specific are in our code base, and we maintain those.

Then we can talk about what's coming up in the future. We've got Swift, Glance, Nova, Cinder; what's next? One of the things people like to talk about is files as a service, or file systems as a service. When you're talking about all this unstructured data that needs to be housed in these scale-out architectures, how do you make it available in a multi-tenant environment in a way that tenants don't clobber each other's data? That's what Project Manila is designed to solve. It's a joint project between NetApp, Red Hat, and the Gluster community, and I think a few other vendors are joining in. If you want to take a look at the initial plans, you can go to the Launchpad page, which I have linked here. We're planning to have it incubated for the Icehouse release, so if you're going to Hong Kong, you should see a session about it, or you can catch up with it later. It's very exciting, and for Gluster especially it really highlights some of our core strengths. When you're talking about treating everything as a file (and for Gluster, everything really comes down to file-based semantics, specifically POSIX file semantics), it's right up our alley. So we're very excited about this project, and we're also very excited to be working with a lot of big names in the industry. It seems to be important to a lot of different people, so there's a lot of help pushing this project forward.

Another integration project we're working on with multiple vendors is the whole Savanna thing: the ability to distribute your Hadoop workloads across an OpenStack cloud, basically helping you scale out your Hadoop MapReduce jobs. I don't know how far along the project is, but I think there's a Savanna page on openstack.org if you want to take a closer look. Does anyone actually use this? Is anyone familiar with it? Yeah? OK. Does it work? Fair enough.

So why would you want to use GlusterFS with OpenStack? Like I said, no silos for data: block, file, or object, it all comes back to the same place.
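One concrete note on the Glance piece I mentioned: the simplest wiring is just to mount a Gluster volume where Glance keeps its file-based image store. This is only a sketch; the paths and names are placeholders you'd adapt to your own deployment.

```sh
# Mount a Gluster volume at the Glance image store location:
mkdir -p /var/lib/glance/images
mount -t glusterfs server1:/glance-vol /var/lib/glance/images

# glance-api.conf then points its filesystem store at that directory:
#   filesystem_store_datadir = /var/lib/glance/images
```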
And you can use your existing toolkits to access the same data that you make accessible via OpenStack, or really anything else. The modular, extensible, user-space translator architecture is really designed to function well in a scale-out architecture in the cloud; it seems made for something like an OpenStack deployment. We give you freedom of access, and in addition we give you choice of transport, and the other things I've mentioned, which makes it a pretty comprehensive solution.

And now I can talk about some of the adjunct projects that you can find at forge.gluster.org. The Forge is something we released recently, I think around May of this past year. We started noticing (and I'll come to the first example) these projects out there on the internet that were related to Gluster or made use of Gluster features, but weren't really available on gluster.org or really anywhere else for Gluster users. So we thought, well, we should bring it all together, and thus the idea for the Forge was born. Now we have on the Forge some really interesting adjunct projects that can help you complete your deployment.

One of these is PMUX. If you think Hadoop is heavyweight and more than you need, I'd recommend you take a look at this. PMUX actually makes use of GlusterFS extended attributes to do its MapReduce jobs. So if you're looking for a log-management solution, or some other way to grep through a lot of data distributed over many volumes, this is one way to do it: you create mapper jobs, they run, and they return values. As part of the project there's also a RESTful gateway, as well as a log viewer that goes through the API gateway. It's a really interesting project, and I was really happy to find it; one day I saw a tweet from the RubyGems account that mentioned it, and I thought, oh, I should take a look.

Also on the Forge is the project home for the Gluster implementation of the Swift API, so if you want to take a look at that, you can do it there. I think with every release of GlusterFS there's an accompanying Gluster-Swift piece, so if you want to check it out, it's pretty easy to do so. The HDFS plugin is there too; you can find all the information about it on the Forge under /hadoop. Note that where I say "per HCFS guidelines": Hadoop Compatible File System is actually a project incubating within the Hadoop community at the Apache Software Foundation, to make HDFS more pluggable for different storage back ends like GlusterFS.

And that's basically all I have for today. Thank you.