Yeah, we're running all over the place. Okay. We've got a very small community outside of Red Hat. It was originally written by Dominique — I can't remember his last name — who is at the CEA, the French Atomic Energy Commission. He's the original author, and other people at IBM have picked it up; IBM runs it and ships it. Thanks.

Let's start off with the last presentation of the day. After this, in this room, we're pretty much done; you can go to the lightning talks. Last reminder for the feedback: please give feedback for the presenters — there are some prizes if it's good. And I want to introduce Caleb, who will talk to us about how to improve NFS, which has a lot of flaws, and hopefully with NFS-Ganesha we're going to fix that.

Hi, so I'm Caleb. Thanks for coming. It is late in the day, I recognize that, and I also know from personal experience that I have a speaking voice that puts people to sleep. So take a deep breath, get some oxygen going, and I will try not to put you to sleep. I'm going to talk about NFS-Ganesha — if I'm pronouncing that correctly. NFS-Ganesha is an NFSv4 server; actually, it's a lot more than that. The outline of my talk: first I'll explain what NFS-Ganesha is, because probably not very many people have heard about it so far, and a bit about how it works. Then I'll talk about the current state of what we're shipping in Red Hat Gluster Storage, Fedora, RHEL, EPEL, the CentOS Storage SIG, and so on. I'll tell you what's coming in the next releases, which are being worked on right now. And finally, I'll do a demo. Based on Louise's experience, I've actually recorded my demo, so I will just play the recording and pretend that I'm typing — that way you won't have to watch me fat-finger all my typing.

So let's move on. NFS-Ganesha is a user-space NFS server. A lot of people react to that — historically we've even seen people react to Gluster the same way: a file server in user space, isn't the performance really bad? So let me head that off at the pass, if I can use an Americanism — the cowboys and the Indians are coming and we want to head them off at the pass. The performance is quite good. It's not as good as kernel NFS, but it's quite good. But there are other reasons why people ask "why user space?", and the answers are many. One: it has a much faster development cycle. We have a very small community and a very small development team, and we can get stuff done fast. I've watched Linux kernel development for many, many years — I was using open source operating systems back in 1993, when the first versions of Linux started shipping — and we all know what the kernel development cycle is. Getting patches reviewed and accepted upstream in the kernel can be a very daunting process. Developing outside the kernel, in user space, is a much lighter-weight process and a much faster development cycle. And when things crash, we all know it's easier to debug user-space programs. If somebody wants to disagree with me, raise your hand and we can arm-wrestle out in the corridor later.
One thing that's definitely true: Ganesha in user space is easier to scale out. In case anybody's unclear about the difference between scale up and scale out: when we talk about scale up, we talk about a single box — we put more disks in it, or more network cards, or more CPU or more RAM — and there's only so much you can put in a box. Eventually you run out of room to add things, and then you're done; you've scaled up as far as you can go, and that's as much as you're going to get out of that box. When we talk about scale out, we're talking about adding more machines. You can still scale each of those boxes up, but by scaling out you can add many, many more resources than you possibly could if you're limited to a single box. And with kernel NFS, exporting a volume or a set of volumes becomes very difficult to scale out with the current kernel NFS servers.

It's also easier to access other services. For things like Kerberos authentication and LDAP, kernel NFS jumps through some real hoops. I work indirectly with Steve Dickson, who is in Westford, and the rest of the NFS team, and when you have to run LDAP servers and krb5 agents, there's this whole chain of things going up and down, back and forth, between kernel space and user space. It gets racy, and there are genuine issues with how complicated it is to interface with these other services. That becomes much, much less of an issue when everything runs in user space; it's very easy to talk to these external agents. And it's easier to integrate with things like Gluster — I'll show you more about that in a minute.

NFS-Ganesha is in a lot of ways analogous to the Samba server, and nobody ever says — well, Michael and Günter might disagree with me — nobody ever says "hey, we should put Samba in the kernel." We apply the same logic and say there's no reason to put Ganesha in the kernel. We do have a kernel NFS server, but like I said, it has these issues. One of the ways we interface with things like GlusterFS and Ceph is a notion, which I'll show you in a minute, called the FSAL. If you've been watching Samba for any amount of time, the Samba implementation has something they call the VFS, a virtual file system layer, which lets Samba talk to any kind of back-end storage besides the local disk. Ganesha has the same thing; we call it an FSAL, a file system abstraction layer. It's a plug-in architecture that lets us write bindings to things that aren't just a POSIX file system underneath, and I'll show you more about that in a minute. Am I talking too fast? All right. Okay.

So here's a block diagram of what the Ganesha server looks like — I stole this slide from a colleague. We've got a network channel; this is just Ganesha listening on the network for the NFS protocol. We get the RPCs — this is SunRPC — we dispatch them, and they go down through the different layers, depending on which protocol we're speaking, through a cache-inode layer and then into the FSAL. In our case the Gluster FSAL knows how to speak the Gluster protocol to the Gluster servers.
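To make that pluggable-FSAL idea a bit more concrete, here is a rough sketch of what an export stanza looks like in the Ganesha config. The volume name, paths and export id are made up, and option spellings can vary a little between Ganesha releases, so treat it as illustrative rather than copy-paste:

```
# Hypothetical export appended to the Ganesha config. The FSAL sub-block is
# what selects the back end (GLUSTER here; VFS, CEPH, GPFS, PROXY elsewhere).
cat >> /etc/ganesha/ganesha.conf <<'EOF'
EXPORT {
    Export_Id = 1;            # unique id for this export
    Path = "/demo";           # what is being exported
    Pseudo = "/demo";         # where it appears in the NFSv4 pseudo-filesystem
    Access_Type = RW;
    SecType = "sys";          # plain AUTH_SYS; krb5 flavors also possible
    FSAL {
        Name = GLUSTER;       # plug in the Gluster FSAL instead of a local fs
        Hostname = "localhost";
        Volume = "demo";      # Gluster volume to export
    }
}
EOF
```

Swapping that FSAL sub-block is essentially all it takes to point the same server at a different kind of back end.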
That actually gives me a teeny little entrée, for a second, back to a presentation about three talks ago in this room — now I'm going to draw a blank on his name — where the presenter said, oh well, NFS is a file system, and all these other things like Gluster, they're not file systems. I'm an old protocol guy, going back to the X protocol and some other things, and I would say that all of these things look like file systems. Gluster has a protocol, SMB is a protocol, and NFS, despite the name, is a wire protocol; you can use the wire protocol to implement things that look kind of like file systems, but it's not a file system. Gluster is not a file system, SMB is not a file system, and NFS is not a file system. I don't want to be too contentious with that presenter, but that's just how I see it. And the reason a Gluster mount looks like a file system is that we have a FUSE bridge that talks through the kernel VFS to FUSE, and then we turn that into a wire protocol to talk to the Gluster servers. So we fake it.

Okay, so, features that are in Ganesha. We try to be precisely protocol compliant with the published RFCs. We have NFSv3, we have NFSv4, we have NFSv4.1, we even have pNFS. The things that are in 4.2 are coming soon, but they're not there yet. Some of you are thinking: if you're already using Gluster, we've already got an NFS server in Gluster — that server is only NFSv3, and I'll say more about that on another slide. As I showed you in the block diagram, we have a pluggable file system abstraction layer, and there are many FSALs besides Gluster. The one most people will use, and the one I'll show you in the demo, is the XFS FSAL, which just talks to the file system on the back end. We have Gluster, there's CephFS, IBM ships a GPFS FSAL, and there's a proxy FSAL that you can use to proxy NFS — you can even aggregate multiple NFS servers into a single namespace with the proxy. Ganesha uses D-Bus, so you can control exports dynamically with a simple D-Bus command (there's a rough sketch of what that looks like a little further down). Ganesha will use as much memory as you want to throw at it; it can have huge metadata and data caches. I said better security and auth — well, that's a bit of a lie, because it's not really any better or worse than kernel NFS, but we have all the same auth that you can do with kernel NFS, so LDAP, Kerberos, all of those things, and as I was alluding to earlier, we have much simpler access to those services. We have a very active community. The original author is Dominique, whose surname I forget; he's at the French Atomic Energy Commission, the CEA. IBM is actively involved, and of course here at Red Hat we have, what, seven-plus active developers.

So that's Ganesha all by itself. We're also doing things in the Gluster space specifically to enhance what Ganesha does. One of the big things we've built, both upstream and in the downstream product, is highly available clustered NFS. If you deploy a cluster of Ganesha servers — let's say four — each of those four servers has one or more virtual IPs, and with that we can eliminate any single point of failure. We're doing active-active. If you're not familiar with the difference between active-passive and active-active: in a cluster with four nodes we've got four Ganesha servers, and they're all actively used to serve NFS. If any one of them fails, we move its virtual IP to another machine, and the three surviving machines carry on and continue to serve all the clients seamlessly — most clients won't even notice that one of the servers has died.
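Here's a rough sketch of that dynamic export control over D-Bus. The config file path and export id are made up, and the object path and method names below follow recent Ganesha releases, so older versions may differ:

```
# Hypothetical: add an export at runtime from a config snippet, then remove
# it again by export id (needs a running nfs-ganesha with D-Bus enabled).
dbus-send --system --print-reply \
  --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
  org.ganesha.nfsd.exportmgr.AddExport \
  string:/etc/ganesha/extra-export.conf \
  string:'EXPORT(Export_Id = 77)'

dbus-send --system --print-reply \
  --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
  org.ganesha.nfsd.exportmgr.RemoveExport \
  uint16:77
```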
We use Pacemaker and Corosync to do that failover. These are actively developed high-availability technologies that Red Hat ships; they're available in Fedora and CentOS and a number of other Linux distributions. One of the things we've accomplished by doing this is eliminating the single point of failure. In a single-server setup, if that machine dies, or any of its processes dies, you have a single point of failure and you're dead in the water until you can get that machine back online. With clustered NFS using Ganesha, we've eliminated that single point of failure. We manage the shared state of the volume with D-Bus and the NFS grace period, and when we do a failover we do things like migrate the locks from the failed server over to the other servers. This is part of the whole seamless, invisible failover you get: if an application on a client held locks, those locks get migrated to the surviving machines and the lock state is preserved across the cluster — the locks are basically reclaimed by the surviving servers.

There's more functionality we're building right now; some preliminary pieces, called upcall, are in the current versions. With it we can do things like invalidate caches on the clients cluster-wide, across all the nodes. Some of the strange behavior you sometimes see: if you delete a file through one client talking to one server, another client talking to another server might think that file still exists, because the cache hasn't been invalidated across the whole cluster. Using Gluster's upcall between all the Ganesha servers, we get a much more consistent state across the whole cluster. We also have cluster-wide leases — lease locks, basically a short-term locking mechanism that times out. In SMB we call these oplocks; in NFS I believe they're called reservations. You take a lock, you're granted it for a limited amount of time, and after that time elapses the lock is either automatically released or the application requests a new lease on it. We do this through Gluster's upcall mechanism. We'll also have — it's not there yet, it's coming in the next version — the ability for clients that are using pNFS to have their pNFS layouts recalled, and that's all done through the lease mechanism; we've overloaded the lease mechanism for it.

So what are we shipping today? In Fedora 22 we've got Ganesha 2.2, with its accompanying RPC library bundled. If you're in the Fedora community and you see the word "bundled", bells and whistles go off, flags go up, and everybody gets upset; we were granted a bundling exception through Fedora 23. We actually beat that: in Fedora 23 we're now shipping Ganesha 2.3 and the unbundled version of the library. We've got a new version in the pipeline — it's in the testing repo right now. It's also available in EPEL and in the CentOS Storage SIG; the whole world isn't just Fedora, CentOS and RHEL, so we are providing community binaries for other distributions as well. How am I doing on time? Seems like I might be going really fast. That's all right, we'll slow down when we get to the demo. Any questions so far? No? Okay.
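If you want to try those community builds on CentOS 7, the install is roughly the following; the package names are the ones the Storage SIG was using around this time and may have changed since:

```
# Hypothetical install from the CentOS Storage SIG.
yum -y install centos-release-gluster        # enables the Storage SIG repo
yum -y install glusterfs-server nfs-ganesha nfs-ganesha-gluster
systemctl enable nfs-ganesha
systemctl start nfs-ganesha
```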
We're not sitting still; upstream, Ganesha continues to drive forward. 2.3.0 was released a couple of months ago, and it was primarily focused on being a stability and performance improvement release: we got the new TI-RPC library, unbundled; we got ACLs; we got the upcall infrastructure; we put some performance improvements into our FSAL implementation; and there's multiple data-server support, things like lock reclamation, and so on. Currently we're actively working on Ganesha 2.4, which will probably be released shortly before Fedora 24, so we hope to have Ganesha 2.4 in Fedora 24. The FSAL interface is being revamped, we're doing more performance work, we've got a new version of the RPC library, and we're trying very hard to get RichACL support in. There are a lot of reasons that's held up: some of it is in Samba, some of it is in the kernel, some of it is in some of the kernel file systems. Hopefully that all gets resolved and we can have RichACL support along with everybody else.

That's over on the Ganesha side. As for what we're doing in Gluster to accommodate a lot of that: we have work ongoing between our Samba team, half of which is sitting right here, and our Gluster team in Bangalore and elsewhere, on what we call converged multi-protocol. What that means is that you'll be able to have a single Gluster back end fronted by a Ganesha server or Ganesha cluster and a Samba cluster, and you'll be able to read, write, modify and delete files through NFS and through Samba, and the whole world will stay consistent. If you try to do that today, it's actually a no-no — we tell you not to do it; a lot of people do it anyway, and then things crash and burn, and then we say, well, we told you. Some commercial servers claim to support it — either they've somehow solved it or they lie, I don't know — but we're looking to make it work for real with the next releases of Gluster, Ganesha and Samba. Part and parcel of that is that the HA we have now, which is just for Ganesha, will be expanded to encompass the Samba servers as well, so there will be no single point of failure in Samba or Gluster or Ganesha, and it will all come under one roof, under one implementation. We're also getting improved upcall and lease support, so the preliminary, early implementations of delegations and oplocks will get better and faster. And we'll have pNFS layout recall — you can use pNFS today, it works today, but there's no layout recall yet.

Another thing we're getting, largely through work that's going on in Gluster, is xattrs (extended attributes) over NFS. Currently no NFS implementation that I know of has the ability to set xattrs. This gives us things like labeled NFS, so SELinux on your NFS volumes. We have a question: what does the NFS specification say about xattrs? Niels is over there grinning, so maybe I'll let him answer that — go ahead. I just remembered I'm supposed to repeat the question, and I'll also repeat the answer; maybe I'll even repeat the answer correctly. The question is: with NFSv4.1 there's part of the specification for setting xattrs on NFS volumes, so how do we handle that? And Niels' answer — say that again? — for NFSv4.1 there is labeled NFS, which is a specific extended attribute to support SELinux. That covers labeled NFS; general extended attributes for NFS volumes are an addition to the base NFS protocol — I think that's 4.3, coming out, maybe 4.4 — and it's currently under review, so this is a new feature. Okay, I'm not actually going to try to repeat all of that.
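As a rough sketch of what labeled NFS looks like from the client side — the hostname and export are placeholders, and it assumes a client, server and export that all support NFSv4.2 security labels:

```
# Hypothetical: labeled NFS rides on NFSv4.2. With label support end to end,
# per-file SELinux contexts come from the server rather than from a single
# mount-wide context option.
mount -t nfs -o vers=4.2 server:/export /mnt
ls -Z /mnt        # shows the per-file SELinux labels served by the export
```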
And of course we're very busy fixing lots of bugs and adding new features for 3.8. Trying to go to the next slide... there is stuff ongoing in Ceph too. I haven't talked a lot about Ceph, but Ceph is a big part of our storage strategy here at Red Hat, and there is stuff going on there — not the most brilliant slide in the world.

Beyond GlusterFS 3.8, all as it relates to Ganesha and to SMB/Samba as well: we're doing more work with compound operations, which NFSv4 allows — things like an open combined with other operations, and so on. Rich ACLs, above and beyond the RichACL work we already have ongoing. More work with share reservations. For pNFS, we're starting to lay the groundwork now for multiple metadata servers, and for the layout recall I talked about on an earlier slide. Originally we were going to do some stuff for OpenStack Manila, but I've been told that was incorrect, and rather than delete the whole line I just marked it "nope, not going to happen." An example of a compound operation is a special one called server-side copy: if you want to copy multi-gigabyte VM images or your video files, rather than downloading ten gigabytes across the network just to send them right back, you can let the server do it all server-side and not burn up your network. Atomic open is another flavor of compound operations — things like open-and-share, or open-and-lease, or open-write-close, all bundled up into a single atomic operation. There are a couple of little things in 4.2 that haven't been implemented yet; we're going to start working on those.

And then there's one that scares some people, like our friends at Facebook: we don't really care to maintain both NFS servers. The old legacy Gluster NFS — to differentiate, we call it gNFS, or Gluster NFS; it's the legacy NFSv3 server — we want to get rid of it and just switch to Ganesha for all of our NFS. The code isn't ever going to disappear; there's plenty of legacy code sitting in git repos and whatnot. But ultimately we're going to start turning it off: initially we'll probably just say that when you start a Gluster volume, the Gluster NFS server won't start anymore; then maybe in the next release we'll do more to disable it; and eventually we'll prune it out of the source tree (there's a rough sketch below of what switching a volume over looks like). Any questions about that? Did I turn it off somehow? Yeah, it's green now, it wants to work. Yes — well, that's on our roadmap. As a flavor of it: I always get referrals and reservations twisted around in my head, and now I'm having a senior moment — because I'm only 30 years old, in base 19 — but send me an email and I can tell you what we've got going on there. How am I doing on time? Ten minutes left? Eleven? Plenty of time.
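A rough sketch of what that switch-over might look like on a single volume — the volume name is made up, and the commands follow the Gluster 3.7-era CLI, so spellings may differ in other releases:

```
# Hypothetical: stop exporting a volume through the legacy gNFS server and
# export it through Ganesha instead.
gluster volume set demo nfs.disable on       # turn off gNFS for this volume
gluster nfs-ganesha enable                   # bring up the Ganesha setup
gluster volume set demo ganesha.enable on    # export the volume via Ganesha
```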
Okay, so now I'm going to show you the demo, and I'm going to play a recording — maybe I'll have to pause it. I recorded this the other day; it's just better to play the recording than to have you watch me try to type, because I can't really type when I'm under pressure. These demos are all running on CentOS 7 VMs, and the binaries we're using come straight from the CentOS Storage SIG, which Niels really does all the heavy lifting on — I'm supposed to do some of it, but Niels usually gets it done before I do. And in these demos I'm actually going to cheat — nobody ever cheats in demos — but the way I'm going to cheat is that I'm not going to bore you by showing you installing Gluster and creating volumes; that part is already done. I'm going to show you three things, based on a blog article I wrote called "Scaling Scale-out NFS: Crawl, Walk and Run." In the crawl demo I'll show a very simple Ganesha server serving a regular file system export. In the walk demo I'll show pretty much the same thing, but with a Gluster volume. And in the third one, where we run, I'll show you a full four-node high-availability setup with Pacemaker: I'm going to kill a Ganesha server, you'll see the virtual IP fail over seamlessly, and the client will continue to read and write from the volume. So bear with me for a second while I switch over, find my mouse, find my window; I'll try to keep up, and I might have to pause.

This is the crawl demo, and so far I'm able to keep up. Remember, all the software packages have already been installed. We're looking at the Ganesha export: you can see the path we're going to export, and we use the XFS FSAL to export that particular path. I start the Ganesha server, then I watch the Ganesha log and wait for it to come out of grace. Certain fops (file operations) in NFS are blocked during the grace period, and there are a number of things that will put an NFS server into grace when it starts like this. It's actually sort of unnecessary here, but it's the way Ganesha is, so we kind of have to live with it, and the default grace period is fairly short — about 60 seconds — so we'll all sit here and watch it come out of grace. There's a TV show in America called Jeopardy, and I would whistle the music they play while the contestants are waiting for the answer; if you know the Jeopardy game show, you can just whistle it along in your head. It came out of grace, so now we mount the NFS export, the path that was specified in the config file, and I'll show you that it's mounted — and for good measure I'll show you that it's an NFSv4 mount, that we didn't accidentally get NFSv3. We do a couple of file operations and we tear it down. I was worried this was going to run faster than I could talk.

Now we switch to another virtual machine for the walk demo. It's going to look amazingly like the one you just saw, so I apologize if you're bored, but what's different is that this is a Gluster volume. I've already got the Gluster volume set up — I should be able to make five minutes — so I start the Gluster volume; the volume is started. I'll show you the config file again, and what's different this time is the path the clients will use to reference the export: we're using the Gluster FSAL to export this time, so there is no actual path anywhere in the file system called /walk on the server. It's purely a virtual path provided by the NFS server. Now we have to wait for grace again, and you can sort of hum the Jeopardy theme along in your mind — if you don't know the Jeopardy theme, your homework is to go look it up on YouTube and have somebody play it for you. I don't know, maybe five minutes isn't enough if I have to wait for the Jeopardy music... oops, okay — it emerged from grace. So we mount the volume, and — let's skip ahead, because you've seen this. Let me show you... did I skip too far ahead? Now, for the run demo, this is a 2x2 distributed-plus-replicated Gluster volume running across four different nodes, and I'll just briefly show you the Ganesha config file — this is actually pretty much the default. Moving ahead, I'll show you briefly what the HA config setup looks like: we have a cluster name, there's a server which is used to set up shared state across the cluster, there are the names of the nodes that are participating in the cluster, and there are the four virtual IP addresses, one of which each of those nodes will be assigned by Pacemaker.
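That HA config file looks roughly like this; the hostnames and addresses are made up, and the variable names follow the Gluster 3.7-era ganesha-ha scripts, so they may differ elsewhere:

```
# Hypothetical /etc/ganesha/ganesha-ha.conf for a four-node cluster.
cat > /etc/ganesha/ganesha-ha.conf <<'EOF'
HA_NAME="ganesha-ha-demo"
HA_VOL_SERVER="node1"                        # node used to set up shared state
HA_CLUSTER_NODES="node1,node2,node3,node4"
VIP_node1="192.168.122.201"                  # one virtual IP per node;
VIP_node2="192.168.122.202"                  # Pacemaker moves a VIP to a
VIP_node3="192.168.122.203"                  # surviving node when its node
VIP_node4="192.168.122.204"                  # fails
EOF
```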
Let's jump ahead — come on, fingers. So I used the Gluster CLI to create the export, and... okay, the important thing here is that the first virtual IP is on node 1, the second virtual IP is on node 2, the third is on node 3, and so on. Let's jump ahead: we've now mounted this volume. I've got one minute left, I think, but I'm also the last presentation. Again, this is just proving that it's really mounted. Now, what I'm going to show you: I'm going to switch to another window here in a moment and kill the Ganesha server, and then we'll come back to this window — and now you can see that virtual IP 1 is still running on node 1, and virtual IP 2 has moved over to node 1 as well. We'll come back down here and — let's jump ahead, unless it starts typing — we wait for it to come out of grace. It came out of grace; I think recording this was a brilliant move, because I can fast-forward. Here I write a file again: because this client mounted using the virtual IP, and the virtual IP moved, the client has no idea, no notion, that anything failed, and it continues to write. Unmount, and that's pretty much the end of the demo. I'm a minute over time; this last part is just tearing down, unexporting the volume and shutting everything down — that's normal, we expected to see that. So let's wrap with that. I didn't see anybody fall asleep from my sleep-inducing voice. And — come on, where are you... there we go — the word on the last slide is Kannada for "let's go." Hopefully I have encouraged you all to believe that this is exciting technology, that we're doing lots of stuff, and that you should give it a try. Thank you, everyone.