So now Arjun will speak about NFS Ganesha. Arjun works for Red Hat in the Bangalore office in India. I'm very happy to have him here.

Thank you. Thank you for the introduction, Niels. I'm Arjun Sharma. I've been with Red Hat for about one and a half years now, and I'm part of the NFS Ganesha team at Red Hat in the Bangalore office in India. So I'll start right away.

The outline of my talk: I'll start with an introduction to NFS Ganesha and explain its architecture. I'll talk about one component of NFS Ganesha, its configuration file, known as ganesha.conf. Then I'll move on to NFS Ganesha features, and I'll also compare NFS Ganesha with kernel NFS. Since I mostly work with GlusterFS and not Ceph — I don't touch Ceph or RGW at all — I'll talk about Gluster with NFS Ganesha, then about some recent or ongoing developments in NFS Ganesha, and finally I hope to get some feedback regarding transport layer security; it's one of the hot topics in NFS, I guess. So let me move on.

Right, so NFS Ganesha. Most of you here probably know what NFS Ganesha is: it's basically an NFS server in user space. It supports NFS versions 3 and 4, pNFS, and also the 9P protocol from the Plan 9 operating system. Version 3 of NFS was stateless, and from version 4 it is stateful. Most of you probably know what stateless and stateful mean, but let me rephrase what stateless is: a stateless server stores no client information, such as the file descriptor or the next byte to read. Instead it relies on the LOOKUP procedure, which converts a file name into a file handle, a unique identifier that is mostly derived from the file's inode number. NFS Ganesha also has the FSAL, the File System Abstraction Layer, and I'll be coming back to the FSAL a couple of times in my presentation.

So, the architecture. At the top we have the network layer. It actively listens for and captures network requests from the client, and the request then passes through an RPC dispatcher, which decodes the RPC request. The RPC dispatcher uses a sub-project known as libntirpc to implement RPC. The request then passes through a duplicate request layer, and NFS Ganesha also supports RPCSEC_GSS for user-space security mechanisms such as Kerberos (krb5). From there the request passes to the protocol layer, depending on which protocol you're using, version 3 or whichever it is. From the protocol layer it passes to the SAL layer — I was half asleep when I made this diagram, so just imagine the SAL layer sitting on top of the MDCACHE layer. The SAL layer converts the request into an operation, for example a write request into a write operation and a read request into a read operation. The MDCACHE layer then stores the metadata cache for that request.
And then finally we have the FSAL layer, the File System Abstraction Layer. Depending on which backend file system we're using — Gluster, Ceph, whatever it is — the FSAL communicates with it, and the request goes on to your backend file system. It makes sense to have the FSAL here, since it's easier for NFS Ganesha to also interact with other user-space applications such as Kerberos. Then we have D-Bus, which in a way controls the entire NFS Ganesha process: you can send D-Bus messages to dynamically export and unexport shares and do other things. And we also have logging; depending on the log levels, you can log the progress of the server.

Now, the configuration file. Although the configuration file has a lot more in it, I'm only touching on the main components here. First we have the NFS_CORE_PARAM block; if this block is not defined, the default values are used. It's basically used to define the ports the server uses. Then we have the EXPORT block, where you define the file system, the share, you want to export; you can have multiple EXPORT blocks and export multiple file systems. Then there is the NFS_KRB5 block: if I'm using Kerberos for authentication, this block is used to set the options for krb5. Then we have NFS_V4, which, as the name suggests, is used, for example, to limit the server to NFS 4.1 and not 4.2; it's mainly for that. Then there is the cache inode block, which is used to set options for metadata caching. And finally we have the LOG block, where you can modify log levels for the individual components, including the FSALs. (A minimal sketch of such a configuration appears a little further down.)

Okay, so on to features. Let me start with the FSAL, the File System Abstraction Layer. The FSAL is basically a plugin written in NFS Ganesha for your backend file system. If you have a file system, and you write a library for it that can be integrated with NFS, you can write an FSAL for it in NFS Ganesha, and then you can use your file system with NFS Ganesha straight away. That's where the FSAL is handy. NFS Ganesha also supports dynamic exports using the D-Bus mechanism, which I mentioned while describing the architecture: you can issue D-Bus commands to dynamically export and unexport file systems. You don't have to shut down the entire server and restart it; you can dynamically export or unexport one or a few shares rather than all of the shares you're serving. And since it runs in user space, I think that's why it's well suited for huge metadata caching.
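To make the configuration blocks described above concrete, here is a minimal, hedged sketch of what a ganesha.conf might look like. The paths, the Kerberos principal and the chosen log levels are made up for illustration, and exact parameter names differ between NFS Ganesha versions, so the ganesha-config man pages for the version in use are the authority:

    # Illustrative ganesha.conf -- all values hypothetical.
    NFS_CORE_PARAM {
        NFS_Port = 2049;            # defaults apply if this block is omitted
    }

    EXPORT {
        Export_Id = 1;              # unique id for this share
        Path = "/export/share1";    # backend path being exported (made up)
        Pseudo = "/share1";         # position in the NFSv4 pseudo filesystem
        Access_Type = RW;
        Squash = No_Root_Squash;
        FSAL {
            Name = VFS;             # which FSAL plugin serves this export
        }
    }

    NFS_KRB5 {
        PrincipalName = "nfs";      # only relevant when Kerberos is used
        Active_krb5 = true;
    }

    NFSV4 {
        Minor_Versions = 1;         # e.g. offer 4.1 but not 4.2
    }

    CACHEINODE {
        Entries_HWMark = 100000;    # upper bound on cached metadata entries
    }

    LOG {
        Default_Log_Level = EVENT;
        COMPONENTS {
            FSAL = DEBUG;           # raise verbosity for a single component
        }
    }

And since dynamic exports over D-Bus were just described: removing a share at runtime is done by calling a method on the running server. A hedged example of the kind of command involved — the interface and argument details can vary by version — would be:

    dbus-send --system --print-reply \
        --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
        org.ganesha.nfsd.exportmgr.RemoveExport uint16:1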
Another feature is high availability. I'm not 100% sure about kernel NFS, but I think implementing high availability there is, if not impossible, at least fairly complex. High availability, as most of you know, is there to handle failovers: I have multiple servers actively serving my clients, and if one server goes down, the other servers take over its service. That's what HA is for. We use Pacemaker and Corosync, which are actively developed, by Red Hat in particular.

If I have to touch on Pacemaker and Corosync: Pacemaker, in essence, decides how your cluster behaves, and Corosync is the messaging layer that the nodes in the cluster use to communicate with each other. So this supports an active-active configuration. There is a difference between active-passive and active-active: active-active is basically where all your servers are up and running and serving your clients, which is also good for load balancing, depending on the situation; whereas with active-passive, although there is redundancy, you have just one server serving, and if it goes down, the other server takes its place.

Then, security and auth mechanisms. NFS Ganesha communicates well with other user-level applications such as Kerberos, so you have auth mechanisms similar to kernel NFS, but I think it's much easier. And it's in user space, so it's easier to work with and easier to debug. And of course we are a small team, so we have a faster development cycle — "chaotic" is not the right word here, but yes, we have a fast development cycle.

Integration with GlusterFS. GlusterFS has something called libgfapi to communicate with NFS Ganesha; it's basically a wrapper around the Gluster protocol, and it interfaces with FSAL_GLUSTER, which is written in NFS Ganesha. So NFS Ganesha uses gfapi to talk directly to GlusterFS instead of going through the traditional Gluster FUSE layer, and performance is better that way. (A hypothetical export configuration for a Gluster volume is sketched below, after the HA discussion.) And we have the HA implementation for NFS Ganesha, where, as I said, you use Pacemaker and Corosync for an active-active configuration. You need, I think, at least four servers to carry out this HA implementation, and it fails over seamlessly: if a server dies, most clients don't even realize it died, and the administrator can go and bring that server back up without any problem.
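Here is the Gluster-backed export sketch referred to above: a minimal, hedged example of what such an export might look like in ganesha.conf. The host and volume names are made up, and parameter spellings for FSAL_GLUSTER can differ between versions:

    EXPORT {
        Export_Id = 2;
        Path = "/gvol0";                # hypothetical path for this export
        Pseudo = "/gvol0";
        Access_Type = RW;
        FSAL {
            Name = GLUSTER;             # selects FSAL_GLUSTER, which uses libgfapi
            Hostname = "gluster-node1"; # any node of the trusted pool (made up)
            Volume = "gvol0";           # Gluster volume to export (made up)
        }
    }

Because FSAL_GLUSTER goes through libgfapi, I/O for such an export bypasses the Gluster FUSE mount entirely, which is where the performance benefit mentioned above comes from.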
So, NFS Ganesha versus kernel NFS. Firstly, performance-wise, it is of course kernel NFS that provides better performance, but NFS Ganesha is not bad; it still gives you good performance. Other than that, it's easier to debug: we all know how daunting it can be to debug kernel NFS or kernel code in general. It's also easier to scale out compared to kernel NFS. There is a difference between scaling out and scaling up, so let me just touch on that: scaling up is where you take one box, one machine, and add more CPU power, more RAM, more storage, but there is only so much you can put in, and eventually you run out of room and you've scaled up as far as you can. With scaling out, you add more boxes instead, and of course you can still scale those boxes up as well. It's much easier to scale out with NFS Ganesha than with kernel NFS. And it's easy to access: since NFS Ganesha is in user space, it's easier for it to communicate with other user-space applications. It's not as complex as kernel NFS — it is a bit complex, but not that complex. And we are a much smaller community; we all know how daunting it can be to get patches merged into any kernel project, but in NFS Ganesha it's not that bad, and it's much faster.

Now some recent or ongoing developments. First, labeled NFS. Labeled NFS, in short, lets you carry SELinux security contexts over NFS so that processes can be restricted from interacting with each other: if a client requests access to a file on a server, the client-side and server-side processes touching that same file are actually different processes, so you don't want their contexts to get mixed up, and you get better security that way. I personally worked on this feature, and it has been merged upstream.

Then we have delegations. Delegations, as far as I understand, are mostly about performance: instead of every access having to go through the server — which may in turn have to go and locate where the file actually lives — the server can delegate the file data to the client, so that the client can access the file directly.

Then we have the sticky grace period. Let me touch on what the grace period is: when an NFS Ganesha server starts up, it takes a while to get going, and during that initial period some file operations are not permitted, so that clients that were talking to the server before can reclaim any locks from file operations that were in progress. Now, some in-flight file operations require the server to be in the grace period and some require it not to be, and the sticky grace period improvement addresses this problem for in-flight operations, so that the server does not keep flipping in and out of grace. It's mainly suited to a clustered NFS Ganesha setup.

Then we have async I/O; I think most of you know what async I/O is. And finally, transport layer security. There is nothing as such for it right now, not even a proof of concept, and this is a project where my team has asked me to gauge the reaction of the community, of NFS enthusiasts like you. So I urge all of you to weigh in on transport layer security: we have Kerberos, krb5, but it's a little complex, and you might as well have TLS. I'm hoping to get some reaction, and depending on the reaction and any suggestions the community may have, we may start working on TLS. So please feel free to get in touch with me after the talk, or just raise your hand and say anything you like about it.
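To show how two of the features mentioned above, delegations and the grace period, surface in configuration, here is a hedged ganesha.conf fragment. The values are arbitrary and the parameter names should be checked against the documentation of the Ganesha version actually deployed:

    NFSV4 {
        Grace_Period = 90;        # seconds the server stays in grace after start-up
        Delegations = true;       # allow the server to hand out NFSv4 delegations
    }

    EXPORT_DEFAULTS {
        Delegations = readwrite;  # which delegation types exports may grant
    }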
Yeah, that's all — and questions?

Thanks, Arjun, for the talk. I have one question regarding the async I/O you mentioned with Gluster. I have briefly worked at the NFS Ganesha layer to develop our own proprietary file system, and I used sync I/O with version 2.5 of NFS Ganesha. So I wanted to know what the state of async I/O is: have you already implemented the FSAL support for async on the Gluster side, or are you still using sync, with async still in the development phase and not production-ready in the case of Gluster? I know the APIs are available on the NFS Ganesha side, but how robustly have they been integrated with Gluster? I just want to know your view and how the team is doing it.

Okay, thank you for the question. I'll repeat it: the question basically was how robust async I/O is with Gluster. As far as I know the development is still going on, but it is in good shape with Gluster; it's there and it works well. I say this also because you still have the option of going just with sync I/O and not async. So I think async is in a pretty good state right now, but I'm not sure exactly where it stands, so I'd be happy to shoot you a mail after I research it a little more and definitely follow up with you.

Yes, please. So we use CephFS with NFS Ganesha on top, and I'm curious about what you mentioned about the caching, the MDCACHE layer. It sounds like that could be a major contribution to a potential performance benefit, particularly with CephFS, in terms of metadata performance, in particular with small files. Is that correct? Should I focus on that and give it a lot of cache?

Okay, so the question is: since metadata is, I think, an issue with Ceph, and NFS Ganesha has an MDCACHE layer, should he just rely on NFS Ganesha's MDCACHE layer and cache more metadata there? Firstly, I'd say I'm not that familiar with Ceph, with the problems and limitations Ceph has, although I've heard that metadata is a big issue there. So this is just my understanding — please jump in if you have something to add, Niels: since it's in user space, it has the ability to store a lot of metadata, so if that's what you're looking for, storing a lot of metadata, then maybe, yes.

The answer is: it depends, right? The MDCACHE layer is great, but any time you're stacking caches like this — because libcephfs has its own caching and caches a lot of this data as well; it has to, for the caps, right?
So it has to do that for the caps, and any time you do that you always have cache coherency issues to work through — some things time out, some don't. The other issue is that you're double caching, so you're going to consume a lot more memory if you use the MDCACHE layer that way. Most of these are in-memory operations, all in the same process, so you're probably not going to get a whole lot of benefit from doing that. The one real problem with libcephfs is that it all sits under one giant mutex, so you can get contention on that mutex if you're doing a lot of threading. I can talk to you about it in a lot more detail later if you want, but depending on what you're doing it may help you, and it may harm you too.

Thank you, Jeff. The other part, too, is that we don't have any upcalls from libcephfs into Ganesha to invalidate things, so if a cap gets revoked and the metadata changes, it may be difficult for Ganesha to know about it. If you have one Ganesha and it's the only thing talking to the cluster, that's fine; but if you have two Ganeshas running independently, or some CephFS clients and some Ganesha, then that's bad. Thanks, guys.

Any plans for different HA options? For example, we run Ganesha in a container, and Corosync and Pacemaker don't really work great for that, so perhaps something like using Raft or such? Right, so the question is whether there is any other option for HA besides Pacemaker and Corosync, if I heard that correctly. Since I've only used Pacemaker and Corosync, I'm not sure about the other thing you mentioned — Raft, I think — I'm not sure about it, but as far as I know it's only Pacemaker and Corosync that we use for HA, at least with Gluster. Sorry — the good part of NFS Ganesha is that it does not force you to use anything in particular. We used ZooKeeper for HA, and we have written our own duplicate request cache, our own DRC, for our metadata purposes, so it is possible; it's quite a pluggable architecture, and that is the best part of NFS Ganesha, I would say. Is that fair enough? Yes, indeed. There's actually some native clustering in there too if you're using Ceph: for Ceph I built a way to do active-active HA using just Ceph, using RADOS basically, which maintains a shared state database between the Ganesha servers. I can point you to a couple of talks I've given on it if you want to see them.

So, any more questions? Yes — do you have a client in user space as well? A client in user space? I'm sorry, I'm not hearing that. A user-space client, a FUSE NFS client? Okay, what I know about is the Gluster FUSE client; as for a FUSE NFS client for NFS Ganesha — there are NFS client utilities and libraries that let you use the NFS protocol from the client side, but they're not part of the NFS Ganesha project. NFS Ganesha is really the NFS server; the NFS client is either the Linux client that you have in the kernel, or NFS utilities in user space. Thank you. Oh — any more questions?
I guess if there are no more questions, then — okay, we'll do one more. Sorry. Thank you for your talk. So, a question about the client: for example, if no single IP is used and there is no Pacemaker or anything like that, but you have two active-active servers, and you use something like autofs on the client side for multiple connections — how can that work with Ganesha to achieve HA in that kind of case? I'm sorry? We have, say, two servers, active-active; so you have multiple active nodes, for example five or ten, and every time you need to reconnect to a different IP. You don't have Pacemaker, you don't check the availability of the nodes; once a connection fails, you just need to reconnect, and you're using a kind of wrapper on the client to do multiple connections. Okay, I don't think I fully understand your question, but I think what they're asking is about the IP addresses the client uses, multiple IPs. The NFS Ganesha servers have virtual IPs, multiple IPs, one for every node among the servers, so on failover that's what points you to a different server.

One more question. We have used kernel NFS for that for quite some time, but we have these timeouts: once the connection is established, it's not so easy to do the switch, so we're always losing time. With Ganesha being in user space, will it be something easier, better, from the failover point of view? From the failover point of view — okay, I'm not sure about the performance, but the locks are transferred to the other servers, so your data is preserved, and the clients don't even notice the failure of a server. But I'm not sure about the performance, I'm sorry. Okay, thank you. Okay, that's it — thank you.