OK, thanks everyone for coming. I'm going to give a pretty high-level overview of a few scenarios of how GlusterFS and Red Hat Storage have been used in some enterprise situations. Hopefully it'll give you some interesting insight into some use cases. I'm not going to get super technical, but if you have interesting technical questions, I'm happy to entertain them. We also have a Gluster developer right here in the room with us who I can defer to if you hit me with a development question that I can't answer. And we also have a Gluster workshop going on tomorrow for those of you who are still around, so if you really want to get into some serious technical details, we'll be doing a lot more of that then. Feel free to interrupt me if you have a question; we can always take new questions at the end, but I don't mind jumping in and answering questions in the middle of the flow. Just a real quick bit about me: my name is Dustin Black, I'm a senior technical account manager, I work for Red Hat Global Support Services, and I'm a Red Hat Certified Architect. If you'd like my contact information, I've got business cards for anybody who would like one. I always like to give a little review of what it is I do, because I'm not a developer. I'm not doing code work, and I'm not even really doing hands-on technical work in a lot of cases anymore. As a technical account manager, I am a named resource assigned to some of Red Hat's largest customers, and in particular, on the storage side, to big Gluster customers who are doing some very interesting stuff. So the scenarios I'm going to tell you about today are things that I've actually seen in the works and have been working on directly with the customers. Names have been changed to protect the innocent; really, there are no names. On a day-to-day basis I work with these customers and their technical and strategic contacts to help them move along their enterprise direction, mostly related to storage. So, really quick for anybody who might be totally new to Gluster: any hands, anybody who is interested in Gluster but doesn't really know what it is yet? OK, good, so this will be useful. I'm going to touch on this really quickly and at a really high level. Again, we're going to have some good technical detail tomorrow, and I'm happy to answer questions. But just to give you the quick view, the basic idea is: take your commodity hardware, the stuff that you're used to using, your Dell servers, your HP servers, Supermicro, whatever it is you might be deploying, models that you're used to using. Take advantage of the storage that's in them, or direct-attached storage, and be able to scale out horizontally. Basically, abstract that layer on the back end, combine all of those storage and compute resources together, and present to the end user a global namespace of storage that's being handled on the back end by Gluster. So: all commodity hardware, easy to scale, very flexible, no metadata server. A lot of other distributed storage environments include a metadata server; we don't use one, which cuts out a single point of failure and a performance bottleneck. Another thing that's nice to point out is heterogeneous commodity hardware. In most situations you're probably going to use pretty much the same models, but Gluster doesn't really care.
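To give a feel for how simple that aggregation is, here's a minimal sketch of pooling two servers and presenting their bricks as one volume. The host names, brick paths, and volume name are made up for illustration, not from any customer deployment.

```bash
# From one server, add the other into the trusted storage pool
gluster peer probe server2

# Combine a brick (a directory on a local XFS file system) from each server
# into a single volume
gluster volume create demo-vol server1:/bricks/brick1 server2:/bricks/brick1
gluster volume start demo-vol

# A client then mounts a single global namespace backed by both servers
mount -t glusterfs server1:/demo-vol /mnt/demo
```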
And you kind of see, if you get into the details of it, that all it really wants is a back-end file system. You're just presenting it XFS or ext or even Btrfs, something that has the right functionality, and it globs it all together and presents that to the end user. This slide is for the visually inclined; it kind of puts it all together for you. You've got all of your Gluster nodes down here. These are just servers; we're just taking advantage of the local storage, with some Gluster services running on them and a network interconnect. And then on the other side of that, you have clients of some kind that are connecting to the services being provided by Gluster. The lines here indicate what we refer to as the native client, or the FUSE client, which uses Filesystem in Userspace (FUSE) technology to talk directly to the Gluster volumes. In this case, the native client is actually a participating member in the cluster. It's very aware of the volume, and when it goes to request a file, it will go directly to the node or nodes containing that file. A lot of the other access methods are a bit more of a dumb interface. NFS, for example, is going to go to one server, and that server is then going to decide where the data actually is and funnel it back to the end user. libgfapi is a layer that we've included recently in Gluster that bypasses some of this, the FUSE piece that can add a little bit of weight to the transactions and slow things down. libgfapi cuts down to a lower level, gets rid of that interface, and allows very speedy and diverse access for other services. We've moved Samba on top of the libgfapi stack; it would normally have been down here with the other clients, but we've integrated it such that in the new releases Samba actually has direct access through libgfapi, so we get much improved performance there, and the same goes for QEMU and your own application. The diversity of libgfapi and what you can do with it, the sky's the limit, right? It is an API to tie into the storage system, so you could actually build your application to talk directly to the storage that way. So quickly, that's GlusterFS. What is Red Hat Storage? Well, if you're familiar with how Red Hat does Red Hat Enterprise Linux, we have an upstream that is Fedora. Fedora has all the fun stuff going on, all the new developments, and eventually Red Hat says, okay, we're going to take the best pieces of that, we're going to QA and QE them really well, package them together, and call that an enterprise product. We limit a little bit what you can deploy it on, but it's all in the name of stability and supportability. We're effectively doing the same thing with Red Hat Storage. We're taking the upstream GlusterFS pieces, taking the stable versions and releases, combining them together with certain constraints on how we expect you to run them, and then selling that as an enterprise product. So the model typically looks like this, and it's basically the same for Linux or Red Hat Storage. That's my fancy piece of 1984 clip art there; it's kind of nice, it's got a snazzy suit. So Red Hat has a product and releases it to the customer. I'm shortcutting a bit by starting at Red Hat, but you'll see how it flows: the product releases to the customer, and at the same time we're making innovations upstream and contributing to the upstream community project, which is Gluster.
And then in the community we're involved in all of the work that goes on to make this a better product. We iterate, we debug, we enhance until we get to something that is release-ready for the community. So now, hey, GlusterFS 3.4 is out. Once we've got that nice and stable, Red Hat is going to make some decisions at some point and say, hey, let's include this in a product. In this case, as a real example, GlusterFS 3.4, which is the current community release, was included in Red Hat Storage 2.1, which is the current product release, and then that obviously makes it back to the customer eventually. This is a pretty fast process right now with Red Hat Storage. It's a little bit more latent with Linux just because it's more mature; with Red Hat Storage we're going through this cycle pretty quickly. In fact, we actually have a couple of features in Red Hat Storage that are not yet completely upstream and are being worked on for the next release. So what I have are a few use cases, and I'll tell you about them one at a time as we go through them. You'll see three specific cases where GlusterFS has been used in the Red Hat Storage enterprise packaging to accomplish certain goals that people are trying to achieve. So this one is, what did I call it? Media storage via an object interface. This is a customer who has photo, video, and music data being uploaded by their end users, and they need a place where they can scale that out. This is the kind of environment that can grow very rapidly, and it has a very media-centric nature because of the way the users are using the service. So what they need to accomplish is storing the media for this customer-facing app. A customer is going to go to this application and upload their data, and it's got to go somewhere on the back end. It needs to be a drop-in replacement for a legacy infrastructure that was based on object access, so that was an important piece of this: you couldn't just put some NAS behind this and expect it to work; they need the object interface. They're going to have a pretty large deployment here: a petabyte, with a projected growth of a terabyte a day. So this thing is getting pretty big pretty quick, and because of that they need minimal resistance to scaling out. With that kind of growth pattern, the ability to just plug in something else and scale horizontally without interruption is very important. Multi-protocol capable: they need the object access, but for future purposes, for other back-end services, batch processing, and different things they might need to do, multi-protocol access to the same data is important. And they also need fast transactions for fingerprinting and transcoding. This piece has to do with the fact that they're taking this media data that's coming in, and they're not only storing it, they're also fingerprinting it to make sure it's not copyrighted material and things like that, and they're transcoding it into multiple formats and then dumping it back into the storage system. So every upload actually results in several transactions in and out of the storage system, and we need to be able to handle that well. So the implementation for these guys looks like this. This is hardware you're familiar with, I'm sure: the Dell R710 server with a stack of direct-attached storage devices attached to it. They started with six nodes, they went to 10, they went to 12.
At 12 nodes, they really got to that scalable point of the petabyte of storage they were looking for. That's a petabyte after RAID 6; RAID 6 is a support requirement for Red Hat Storage. For the object-based interface, what sits above the base Gluster layers is the Swift integration, which allows us to do object-level access to the files on the back end. On the back end they're still being stored as files; they're just in a directory structure, they're just normal files. And this is really what gives us that multi-protocol capability, but for their purposes, what they needed was to access this through the object interface, and that's a feature we offer in Gluster. And as I said, it's built in. So what I was trying to describe there is that no matter how you're accessing this data, no matter how you've uploaded it to the system, it's going to be sitting on a file system, and if I go and do an ls on that file system, I'm going to see the file, whether it was uploaded from the object interface or not. We have some mappings that we do to sort of trick this into working, because the object interface has its own terminology: what's just a file in a directory structure on the back end is an object in a container on the object side. (I had to get a little help from the audience on that terminology; obviously I'm not terribly familiar with the object world or I wouldn't have stumbled over it.) But there are some interesting things going on there with how we map that out. The other thing we've done here is a multi-gigabit network, so we're doing bonded network interfaces and also segregating the back end. That's not technically a requirement; Gluster really just needs a network interface to talk amongst the nodes. But for the kind of transactions these guys are doing, putting data in, pulling it out, putting it in, pulling it out, plus the transcoding, segregating the back-end traffic that Gluster communicates on from the front-end traffic that all of the data rides on is important to get the performance they want. And again, for the visually inclined, this is kind of what it looks like. You've got your Gluster nodes scaling out to your petabyte-plus of storage and continuing to scale. You've got your back-end network to keep the Gluster transactions nice and speedy, and your front-end network. So we have UFO, what we call Unified File and Object, the Swift interface, going down through the layers to the Gluster nodes, and they put an external load balancer in place that all of this traffic passes through. As data comes in over the user-facing application, it comes through the load balancer; the load balancer picks a Gluster node to communicate with for the transaction. Then, when the data is pulled out for fingerprinting work, it goes back out through the load balancer, over to the fingerprinting and transcoding infrastructure, and back in. So you can see there are a lot of network hops going on there. Any questions on that implementation?
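To make the object side a little more concrete, here's a rough sketch of what an upload through the UFO/Swift layer can look like, and how the same object then shows up as a plain file on the volume. The endpoint, port, token handling, and all names are illustrative assumptions based on a typical gluster-swift setup, where the Swift account maps to the volume name; they're not this customer's actual configuration.

```bash
# PUT an object through the Swift-compatible interface
# ("AUTH_mediavol" is the account mapped to the Gluster volume, "uploads" is
#  the container, and the object name becomes the file name on the back end)
curl -X PUT -T cat-video.mp4 \
     -H "X-Auth-Token: $TOKEN" \
     http://gluster-lb.example.com:8080/v1/AUTH_mediavol/uploads/cat-video.mp4

# The same object is just a regular file on the volume, visible to any other
# protocol, for example a native mount:
mount -t glusterfs gluster01:/mediavol /mnt/mediavol
ls -l /mnt/mediavol/uploads/cat-video.mp4
```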
Okay, let's move on to the next one. The next use case is about self-service provisioning. This is a customer that has an existing environment they put together for themselves for push-button deployment of servers. They're basically abstracting the virtualization back end so they can say, hey, I need this much CPU and this much memory, drag it into place, and that way the end user has simple access to it. For whatever reason they built their own rather than using something existing, but that's what they already have. Currently they really only have an object-based storage interface, kind of like Amazon S3, and what they're looking for is a file-based storage interface that can tie into these nodes as part of that push-button deployment, and Gluster is what they're really looking at. So what they want to do, again, is add file storage provisioning to an existing self-service environment. They need to automate this as much as possible, because it can't be a case where a customer drags things into place, pushes the deploy button, and that sends an email to an administrator to go do the work. That would be ridiculous; these tasks have to be automated. Another thing they're looking for, and an interesting challenge, is multi-tenancy. This actually ties into the automation piece: it's about pre-configuring as much as possible and having as few steps as possible to allocate the resources the customer is looking for. They need to be able to subdivide existing resources, and they need to be able to over-provision and charge back. This is obviously a big thing that's coming up in a lot of environments today, right? We have this big bucket of storage, it's really great, we want to distribute it to our users as a service, but in the end we need to be able to say, hey, that department used this amount of storage, or has a quota for some amount of storage, and we need a charge-back model for that. They also need, again, simple and transparent scaling. You'll see this as a common goal for anybody deploying Gluster, because it's really one of the key points: the idea that we can scale the back end horizontally, or even vertically, without disrupting the end user. So this implementation: again, some Dell R-series servers like you're probably familiar with, R510s in this case, 30 terabytes per node. Here's what these guys have done. If you take a look at the Gluster architecture, a brick on the back end is a back-end file system, and bricks are combined together across the servers to create the total storage; in most implementations you will carve out a separate file system per brick. For their multi-tenancy structure, they don't want to be creating new file systems every time a user requests storage; they need to cut out that overhead. So what they do instead is have one big XFS file system on each node, just one, and never have to create new file systems over time. In this case, you have one primary directory that is the XFS mount point, and each brick is actually a subdirectory of that. And therefore you're getting your over-provisioning already, because now each brick believes, at least on the surface, that it has all of the storage available. So if I create 20 bricks and I only have 20 terabytes of storage, I have 20 bricks that each think they have 20 terabytes available. So I've already over-provisioned, but I need a method to control that. And that's where the other piece comes into play: quotas. Now, quotas are a new feature; I put a little asterisk there just to make sure everybody's aware. This is not yet in GlusterFS.
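Here's a minimal sketch of what that pattern can look like on the command line, using the quota commands as they appear in Red Hat Storage 2.1 and newer upstream releases. The volume name, brick paths, and quota size are illustrative; the point is that a tenant's bricks are just subdirectories of the one XFS mount, and the quota caps what the tenant can actually use.

```bash
# Each brick is just a subdirectory under the single XFS mount on each node
gluster volume create bu1-vol replica 2 \
    server1:/rhs/xfs1/bu1 server2:/rhs/xfs1/bu1
gluster volume start bu1-vol

# Enable quota on the volume and cap it at the size this tenant was allocated
gluster volume quota bu1-vol enable
gluster volume quota bu1-vol limit-usage / 5TB
```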
Quotas are one of the rare cases of a downstream-first implementation, but it's on its way. It should be in 3.5, right, Vijay? (Vijay confirms that an improved version of quotas is in fact already available upstream.) So it's getting better all the time. But to solve the customer case and be able to handle this from an enterprise standpoint, we've got this already in Red Hat Storage 2.1. It's tech preview until update 1 comes out; don't take that as official, but that's the word right now, and update 1 is supposed to be out at the end of this month, at which point this feature should be supported. So by placing quotas on the volumes, we can over-provision, but we can also limit each one of those exports to a certain amount of storage. And when you put all this together, what you end up with is only four Gluster commands required, and they're easily scripted and automated. So this one looks a bit like this. It's kind of hard to put this together totally visually, so I apologize for all the text on the slide; it flows from the bottom up. Just to show you what's going on, zooming in here: we've got one XFS file system, and that XFS file system has our individual bricks being carved out of it for the volumes we're going to create. Our volumes are being created for different groups within the organization. So if I have one business unit that wants a certain number of terabytes available, I can create them a volume, put a quota around it, and say: here's your volume and here's the storage that's available to you. On the back end, all it's really doing is creating a new subdirectory of the XFS file system. On the front end, it's presenting a new volume that's using the same underlying space. The layer above that is the quota: as I apply the quota, the quota is really determining what the user sees. So what I'm trying to show here is that volume one has quota one applied to it, which is some size X. And as you see how these scale out, you'll notice that we've actually over-provisioned in this case; we've allotted more storage than is actually available, and that's fine, we want to be able to do that. I think that covers all of that. The four commands you end up having to run are down here (the same ones sketched above), so you can see it really is that simple to create a new volume with a new quota on it and have it ready to go: I do a volume create, I do a volume start to get it going, I enable quota on it, and I set the quota limit. That's it, it's done. Any questions on this one? Okay, feel free to ask at the end too, or interrupt me. So the third use case I want to look at here is also a pretty interesting and innovative approach the customer had. Interestingly, they were trying to solve a problem that we weren't quite ready to solve, and they kind of found their own way to do it. What they needed was multi-data-center, two-way replication for active-active sites, with an SLA around the replication. The problem, when this one came up, was that our Gluster geo-replication infrastructure was really only designed for archive and disaster recovery. The idea of wrapping an SLA around it, particularly with this customer's small-file use case, just wasn't going to work very well.
So it took a lot of work with the customer. They had already solved a big chunk of the problem with some interesting architecture of their own, and then we came in and worked with them on the engineering side to fill in the rest of the holes. In this case, what they had was a legacy system that was basically an Oracle database using a key/blob configuration. When data was being uploaded, a key was put in the Oracle database, and the chunk of data, which was effectively a PDF file, was also being stored in the database. Not a very efficient thing to do, but it's what they had built up over time. Obviously, when you see this and realize you're using an Oracle database for something that's not terribly relational, this thing screams NoSQL, right? All we have is a key and some data. Great, it's NoSQL. So they really wanted a divide-and-conquer approach to replace this legacy infrastructure: take out the simple part, the NoSQL part, put it in its own layer that will easily replicate with some good, known technology, and take that blob and put it in something that makes sense. Let's put it in a file system instead of storing it in the database. So again, what they're looking for is an active-active setup with a 30-minute replication SLA, and I'll tell you, in the beginning we weren't getting anywhere close to 30 minutes for that replication based on their data patterns. And the last piece of this is performance tuning for small-file WORM (write once, read many) patterns. This is also pretty unique. Small files are not a sweet spot for Gluster; we have traditionally performed a lot better with larger files and sequential reads and writes. When you get to these small-file workloads, and the way their application was designed around small-file random reads and writes, it was causing additional issues with the performance they were looking for. So their implementation: again, this time HP servers, but models you're familiar with, simple commodity hardware. The amount of storage per node is really not that much; these guys are actually using smaller, faster disks rather than needing a ton of storage, because again, it's just hundreds of thousands of small files. The NoSQL layer is Cassandra. If anybody's familiar with Cassandra, you know it really does its job quite well. It also handles the replication really well; in fact, it's quite fast at that piece of it. And the Gluster piece of this is what we're referring to as parallel geo-replication. This is another one where, yes, I did put my asterisk. Again, this is something we have in Red Hat Storage 2.1; it's still working its way upstream and getting improved, and there will actually be an improved version of it in GlusterFS 3.5. This is very interesting to look at, because the traditional way geo-replication happened was that you would go to a particular Gluster node to set up the geo-replication and point it at a remote Gluster node, and those two nodes were the only ones that participated in the replication process. If one of those ever went down, say for maintenance, you have no replication; it just doesn't happen. But again, it was only designed for disaster recovery and archival situations.
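For reference, setting up a geo-replication session between a master volume and a remote slave volume is only a couple of commands. This is a rough sketch with made-up host and volume names, and the exact syntax (the push-pem step, for example) varies a bit between releases.

```bash
# Create and start a geo-replication session from a local master volume
# to a volume on the remote site
gluster volume geo-replication mastervol remote-node1::slavevol create push-pem
gluster volume geo-replication mastervol remote-node1::slavevol start

# Check on the session
gluster volume geo-replication mastervol remote-node1::slavevol status
```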
So the new code really does two things. Number one, it parallelizes the geo-replication so that all nodes on each side are participating. Now we're able to funnel the replication through a lot more pipes instead of getting bottlenecked on just two nodes. The other thing is the scanning side of it: the piece that was effectively just rsync in the previous version was optimized to better handle a small-file environment. So the scanning is not as heavy, and the replication now happens in a very parallelized way. On top of that, we worked with the customer in this case to take their patterns and change the way their application wrote these files so they would be more sequential. They had a very random pattern originally with their application; once they understood the performance improvement they would get from a more sequential pattern in the way they wrote into the directories, they got a drastic improvement out of that. This is another one that's a little bit hard to visualize, so if anybody's confused about it, it's probably because my graphics are terrible, but I'll try to explain it a bit. Cassandra is there on the top, doing its two-way replication. It's keeping its keys, the pointers to the locations of the files, very well in sync; I think their SLA on that piece is minutes if not seconds. It's very fast to get that data replicated across. What they've done is they have a Gluster client, which is really their file server, acting on each side. The Gluster client mounts the Gluster volumes exported from each side. So we have a volume on each side; I'll call this side local and that side remote. On the local side we have a master volume that's replicating to a slave volume on the remote side, and vice versa. The way they're mounted on the local side, you'll see gluster1 is from the master volume and gluster2 is from the slave volume. So in the directory structure, gluster1 is always going to point to this volume, and gluster2 to that volume. On the flip side, what we've done to sort of trick Cassandra is flip the way they're mounted. (My lines on the diagram didn't work out very well there, not well at all.) On that side, gluster1 is actually mounted to the slave volume and gluster2 is mounted to the master volume. If you think about what that does, it's flipped around in such a way that no matter which side you're on, you're looking at the same structure. So Cassandra doesn't really have to know whether one of these is writable and the other is not. Because we can't do two-way replication, the slave volume on either side cannot be written to; it can only be read from. So we're only ever replicating one way on each volume; we're just doing it from each side. There does have to be a little bit of logic in their application that says: if I'm in this data center, only write to this directory; if I'm in that data center, only write to that directory. But when it goes to read, the application on both sides is basically reading from the same structure. It looks exactly the same to it. So again, it's a little bit of a difficult one to visualize, and I'm happy to explain it further for anybody who's interested, but it was certainly a novel approach.
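Roughly, the mount layout on the two application file servers might look like this. The volume and host names are invented for illustration; the point is that the same paths exist on both sides, with the writable (master) volume swapped.

```bash
# Site A file server: gluster1 holds site A's data (writable here),
# gluster2 is the read-only replica of site B's data
mount -t glusterfs siteA-node1:/dcA-master /gluster1
mount -t glusterfs siteA-node1:/dcB-slave  /gluster2

# Site B file server: same paths, volumes flipped, so the application
# sees an identical directory structure on both sides
mount -t glusterfs siteB-node1:/dcA-slave  /gluster1
mount -t glusterfs siteB-node1:/dcB-master /gluster2
```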
And like I said, they had really solved a lot of this problem with that structure before we even got involved; we came in and improved the product to fill in those last gaps and get to the performance levels they were really looking for. And in the end, the 30-minute SLA that the customer was trying to achieve for the data replication, we were able to beat comfortably. With the combination of enhancements that were put into place, we saw a lot of improvement. All right. Not having actually timed this before, it seems to have run pretty short, but I'm happy to talk more about it and get into any detailed questions anybody might have about the particular scenarios or anything else. The first question is about small-file performance. So, small files, again, are definitely not the best use case, but we've had pretty drastic improvements over recent releases. There have been a number of things happening in the code to improve that. Vijay, do you have any particular input on that? Vijay points out that in some cases a very large number of entries in a directory can be slow, but there's a new translator coming (a translator is a functional unit in Gluster). It's called readdir-ahead, it will be available in 3.5, and it has shown a good improvement over the existing behavior. Next question: in this case, with the 30-minute replication SLA, how did you monitor that the SLA was being achieved within that time frame, and how did you communicate that to the customer? It's definitely tricky, because we don't have good tools for actually checking both sides to see whether the data exists. You're basically stuck with file-system-level tools to look for the existence of the files being replicated. We know now that they're being written in a sequential manner, and we can expect a certain pattern in how they get replicated, so we can limit the scope of where we're looking for the files to determine that we're getting them. Basically, the customer runs fairly manual checks on occasion just to confirm they're getting what they want. There's some feedback from the tooling too: you can run some Gluster commands that will give you the status of the replication. It just won't let you drill down, so you can't see exactly what has or hasn't replicated. If you want a good idea of which files have made it, you really have to do that at a fairly manual level. A lot of times we ended up looking at it on the brick side instead of at the total file system, because of the size of their file system; if you were going to try to run a scan from the client mount, it could be a pretty daunting task. Normally we try not to do operations on the brick. If they're only read operations, this should be safe, but the typical direction we give people is: don't touch the brick, the brick is sacred ground. Still, checking at the brick level for something like this is a little bit less heavy, and you can check on a one-to-one basis. We know that with the parallel geo-replication, nodes actually get paired up. Previously, the replication worked through the normal translators as a single stream: the replication would go to the remote host, and the remote host would use what's called the distributed hash algorithm to decide where to place the file on the remote side.
It's possible that doesn't match the local side, because there could be a different number of nodes, or something else changing the algorithm's decisions. With the parallel geo-replication, nodes get paired up one-to-one with each other, or sometimes one-to-two depending on what the architecture looks like. So we can actually say: if the file started from this node, we know which node it was destined for, and that makes looking for it a little bit easier. But the simple answer is that it's not simple. There's not a good tool for that yet, so it's a little bit of a manual process. Yeah? Next question: would you say it's a good idea to run your Gluster servers, or bricks, on virtual machines? Absolutely terrible idea. No, it absolutely can be done, and there are a lot of good places for it. When you start looking at a complete cloud infrastructure and what you might be trying to accomplish with it, we have some pretty interesting scenarios for how to deploy this on virtual machines very purposefully. So imagine that we would do this in a RHEV environment, a virtualization environment. We would take the storage from the local machine and basically allocate that storage to a virtual machine running Gluster, and you would do that on each of the nodes. Then the Gluster virtual machines become your virtualized storage layer, presenting storage back to the hosts, back to the hypervisors, in order to distribute it to the other virtual machines. It's kind of an interesting recursive configuration, but it's certainly supported to run on virtual machines, and in the right use case, in the right environment, it's a good fit. One of the things there is that the performance requirements of Gluster itself are pretty minimal when it comes to the total resources in the machine. There are a few operations that can be a little bit memory- and CPU-heavy, but normal steady-state operations are pretty light on system resources. That's a good question; I imagine there are some steps to take. Do you have any insight on that, Vijay? To restate the question: if you were going to run Gluster on a virtual machine, are there any particular tunings you might do at the operating system level or within Gluster to account for the fact that you're running on a VM? (Vijay's answer here wasn't captured clearly.) Okay. Other questions? What kind of programmatic interface do you have for doing reads and writes to and from Gluster? Are there good Python and Java APIs where you can just import a library and say: Gluster, write this file, read this file, and so on? So that's in the works with libgfapi. libgfapi is the API way to access Gluster. We have C bindings and Python bindings, and Java bindings are emerging, so you can use that to write your own custom application that talks to Gluster. That's certainly a pretty significant improvement, as I was talking about before, over the FUSE interface. FUSE is really useful for getting the job done simply, without having to actually write a file system in kernel space, but it has its limitations, especially when it comes to performance and the additional transitions you have crossing the line between user space and kernel space. libgfapi basically gets us around that: it cuts out the FUSE layer and gives us that ability. It's a nice improvement. The follow-up: the reason I ask is more for application development; at the moment we're using MongoDB's GridFS as a sort of blob store for terabytes of data.
And the upside of that is we get to use just the Python drivers to read JSON metadata about things and to read and write files, so it's a really convenient way to write applications, and it would be good to have the same kind of thing here rather than needing to do a lot of system operations to get at files. Right, yeah. So it's emerging for us right now. As he said, we've got C and Python bindings, and Java is in the works. Is there another question? Yes: how many volumes can I create in a Gluster cluster? A lot. Do you know what the number is, Vijay? Is there a hard limit to it? I don't know. Vijay explains that there are some practical limits we run into, because every brick you create leads to a process being created to export that brick; if you create thousands of bricks on a single node, that translates to thousands of processes, and that can impact the behavior of your system. That's the practical consideration, but there is no hard limit in the code on the maximum number of volumes you can create. Another question: what is the limit on the maximum number of group IDs for a user, and what about POSIX ACLs? Vijay again: in terms of auxiliary GIDs, until very recently we supported 32 through the FUSE access and a little bit more through the other access methods; with GlusterFS 3.5, the plan is to move to 64K GIDs. We also support POSIX ACLs, which translate to extended attributes, and there is a translator within Gluster that handles that. Another question: is there some additional API to configure the cluster from outside, for example a provisioning server that wants to configure my Gluster cluster? Yes; we've also integrated with oVirt, which provides you REST APIs for performing cluster management operations, as well as a nice GUI for managing the cluster. Yeah, so just for the sake of anybody who maybe didn't hear that, the basic question is how far can we scale these things, right? How many volumes and bricks? And it really becomes a limitation of other fairly normal system things: how many processes can I run, how much memory does each one of these take up, when do I hit limitations on my system? For most practical use cases, it's just a matter of tuning for what you expect to use. Think ahead of time: are you really going to run hundreds or thousands of volumes and bricks? If you have a case for that, build the systems and tune them accordingly. But Gluster itself is not going to impose a real limit. Do we have to rebalance every time we add new servers? There is a rebalance operation that happens, yes. Let me give a little technical detail about the basic hash algorithm that Gluster uses. It's called the distributed hash algorithm, and it's effectively hashing on a file name to decide which node a file goes to. So the decision it makes about what node to put a file on is based on how many nodes are in the system and what the file name is. If I add an additional node, then that's going to change the decisions the distributed hash algorithm would make, and thus put files in a different location.
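In practice, growing a volume and rebalancing is just a couple of commands; the volume and brick names here are made up. The fix-layout option corresponds to the first, lighter activity I'm about to describe, and a full rebalance also migrates existing data.

```bash
# Add a new node's brick to an existing volume
gluster volume add-brick myvol server5:/rhs/brick1/myvol

# Option 1: only update the layout so new files can land on the new brick
gluster volume rebalance myvol fix-layout start

# Option 2: full rebalance, which also migrates existing files to where the
# distributed hash algorithm now expects them to live
gluster volume rebalance myvol start
gluster volume rebalance myvol status
```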
So that's the long answer to "effectively, yes": if I scale out, I should run a rebalance operation each time, and the rebalance operation really has two activities. One of them is basically to let the distributed hash algorithm know that the architecture has changed, so that new files can get written to the new location. The other piece is to take existing files that are now sort of wrongly located from the perspective of the distributed hash algorithm and move them to a location that makes a little more sense. It used to be a lot more heavy lifting, too; the newer version of it is pretty lightweight. You can also still ensure that all writes to existing directories go to the old set of nodes, and if you create a new directory, data created in it can potentially land on the new node as well, so you don't necessarily have to run a rebalance the moment you bring in a new node. So, when the rebalance actually moves the files, is there any downtime? No; there's obviously a performance impact of some kind, and in a lot of enterprise situations it'll be smart to do these during a maintenance window, but there is no required downtime for scaling out or back or up or down. All of these operations can be done transparently, though of course each one involves some lifting of files and moving them around. More questions? We have plenty of time. We're happy to talk in more detail about the architecture of Gluster, since we didn't really touch on that much today, if anybody's interested; I have some slides from another presentation I could pull up if anybody wants to see that. But also, as I said, we have a full-day Gluster community day tomorrow at the Sheridan Hotel. That will have a full spread of everything: a sort of state of the union, system administration, development, and a lot of other aspects of Gluster. I always like to close out with a bit of motivation to go do this. This stuff is really, really easy to go get and install. If you're already running a recent version of Fedora, the bits are already there; just do a yum install and, hey, now you have a Gluster server. The Red Hat bits are available on access.redhat.com if you've got an account, or if you sign up for a demo you can download our ISO build. Red Hat packages it up and distributes it as what we would call a software appliance, so the only real supported version of Gluster that we have is an ISO that is a combination of RHEL 6 plus XFS plus Gluster, with a bunch of the fluff cut out so it acts as an appliance. There's also an Amazon image that you can build from and test in the cloud very easily, or you could obviously go upstream and pull the latest bits down; it's all on Git. So get out there and do it. I always tell people: if you get bored in the hotel tonight, you can have a Gluster cluster up and running on virtual machines in like 20 minutes, and that includes download time. It really is quite quick to get this up and going. Contact information: feel free to take a photo, or you can go and get the slides. That URL, people.redhat.com/dblack, will be a good one; I have not put the slides there yet, but I will. My contact information, for getting information about Red Hat Storage or Gluster or communicating with us on social media, is all there.
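Coming back to that 20-minute claim for a second: if you do want to try it tonight, the install side on a recent Fedora is roughly the following. The package and service names are what I'd expect on current Fedora releases, so adjust for yours, and the rest of the steps are the same as in the sketch near the beginning of the talk.

```bash
# On each VM: install the server bits and start the management daemon
yum install -y glusterfs-server
systemctl start glusterd

# Then, from one node, build the pool and a volume as sketched earlier:
#   gluster peer probe <other-vm>
#   gluster volume create testvol <vm1>:/bricks/b1 <vm2>:/bricks/b1
#   gluster volume start testvol
#   mount -t glusterfs <vm1>:/testvol /mnt/testvol
```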
So if you want one of my business cards, I'll dig them out of my backpack for you too. Any other questions or thoughts before we wrap it up? Great, thanks everybody for coming. Appreciate it.