Let's welcome Supreeti and David and enjoy the talk.

Hello everyone. My name is Supreeti, I work as a software developer at SUSE, and my colleague David is also a software developer at SUSE. I have been working on NFS-Ganesha for around one year now, and David has been a Samba contributor for a long time. Today we want to talk about how Samba and NFS-Ganesha work together with Ceph.

Today's agenda is very simple: we are trying to answer three very simple questions. Why, what and how? Why are we talking about Samba and NFS-Ganesha with respect to CephFS? What do Samba and NFS-Ganesha look like, what is the architecture? And finally, how do these two gateways fare with respect to a native CephFS client?

I am sure most people are aware of the Ceph architecture. Ceph is basically a distributed storage system, and you can access the object store, RADOS, through different interfaces, one of which is CephFS. CephFS is a POSIX-compliant distributed file system, and there are multiple clients through which you can use it. One is the kernel client, there is also the FUSE client, which is not mentioned here, and then there are Samba and NFS-Ganesha.

You may ask why we need to export CephFS using Samba and NFS-Ganesha: because it makes CephFS available to an even wider variety of clients. Let us say you have a kernel that is not up to date, so you cannot use the latest kernel CephFS client and you do not get the advantage of the latest CephFS features; you can still use NFS, because almost all Linux clients support NFS, and with Samba you can re-export CephFS to Windows, macOS and, of course, Linux.

Now I will talk more about what NFS-Ganesha is. NFS-Ganesha is an open-source, user-space NFS server. It supports multiple file system backends at this point: there is CephFS, the RADOS Gateway, Red Hat's Gluster, IBM GPFS. If you were in this room earlier for the LizardFS talk, you may be aware that they are also planning to integrate with NFS-Ganesha, but today we just want to talk about CephFS.

The idea of what NFS-Ganesha is doing here is very simple: it is basically translating the NFS protocol into the language that the RADOS cluster understands, and it does so by using a shared library, libcephfs. So let us say you have two clients, one NFS, one kernel client; you can mount with both and you see the same file system (a small mount sketch follows after this passage).

Now we can go into more detail about the NFS-Ganesha architecture. It is a very modular architecture; when I said it supports multiple file system backends, that is possible because it is modular. We can start from the top and go down. At the top we have the network channel: NFS-Ganesha uses TI-RPC for handling RPC requests. So imagine your NFS client sends a request; TI-RPC handles it and passes it to the duplicate request layer. What this does is, for example, you have a non-idempotent request like creating a file, and you know that once a file has been created it cannot be recreated. Instead of talking to the lower layers again to decide whether the file should be created, you have a cache in between, which improves response time. NFS-Ganesha also supports RPCSEC_GSS, which is support for Kerberos. Going further down, there is support for NFS version 3 and version 4. It actually supports pNFS for Gluster as well, but since we are talking only about CephFS I have not included it.
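As a quick illustration of that "same file system" point, here is a minimal sketch of mounting the same CephFS once with the kernel client and once over NFS from a Ganesha gateway; the hostnames, secret file and export path are illustrative, not from the talk:

```
# Kernel CephFS client: talks to the monitors/OSDs/MDS directly.
mount -t ceph mon1:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# NFS client: talks to the NFS-Ganesha gateway, which re-exports CephFS.
mount -t nfs -o vers=4.0 ganesha1:/cephfs /mnt/nfs
```

Both mounts should then show the same directory tree, which is the whole point of the gateway.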
Once the NFS client has sent the request and the topmost layer has understood what the request is, it needs to be translated into a language that the file system understands, and that is taken care of by the FSAL layer. FSAL, the File System Abstraction Layer, is a common API; through this API all the file systems talk to the NFS layers above, and that is why you can have multiple backends: it is just a common API, you use it and you plug your backend in.

The most interesting part of NFS-Ganesha is the MD-cache, the metadata cache. With the latest Ganesha version we have chunked caching, which means that when reading a very large directory it will not cache all the metadata at once but in chunks, so that read responses are faster. Instead of always talking to the CephFS backend for every request, it first checks the cache, so again the response time should be better.

There are also two independent modules: the admin D-Bus interface and logging. Using D-Bus you can dynamically export and unexport from NFS-Ganesha, so if you have a production cluster running and you do not want to stop it, change something in the configuration file and restart, you can take advantage of D-Bus and change the configuration dynamically. And there is logging, which is for tracing and debugging. This slide just reiterates what I have said.

A few key features of NFS-Ganesha: from a single server you can export multiple file systems, over multiple transports, TCP and UDP, and multiple NFS versions, 3 and 4, at the same time; you do not need multiple instances of Ganesha running, you just modify a single config file and use the one node. It also supports Kerberos authentication, and, as I said, using D-Bus you can dynamically export and unexport entries (a configuration sketch follows at the end of this passage).

Those were the generic NFS-Ganesha features. When we talk about CephFS, there is also support for CephFS authorization, which gives you multiple layers of security: you have Kerberos security between NFS client and server, and towards CephFS, where NFS-Ganesha is itself a client, there is the CephFS layer of security. Another feature, recently implemented, is read delegations: the client does not need to talk to the server for every read request, because the server guarantees to the client that as long as nobody is writing it can continue reading, and if someone is going to write to the file there is a callback mechanism that makes sure you do not read stale data. It also supports exporting sub-directories, which I think is really important if you want load balancing or proper security.

But if we talk about a production setup, then of course a single server is never a good idea: you have a single point of failure, it is a bottleneck, and because we are re-exporting CephFS, a distributed file system, we are not taking advantage of that at all.
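To make the export and D-Bus discussion concrete, here is a minimal ganesha.conf sketch of a CephFS sub-directory export via the CEPH FSAL; the export ID, paths and cephx user are illustrative:

```
EXPORT {
    Export_Id = 77;
    Path = "/shared";          # CephFS sub-directory being exported
    Pseudo = "/cephfs";        # where it appears in the NFSv4 pseudo-fs
    Access_Type = RW;
    Protocols = 4;
    Transports = TCP;
    FSAL {
        Name = CEPH;           # the libcephfs-backed FSAL
        User_Id = "ganesha";   # cephx user Ganesha connects as
    }
}
```

Such an export can then be added at runtime over D-Bus, roughly like this, without restarting the daemon (the config file name is again illustrative):

```
dbus-send --system --print-reply \
    --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr \
    org.ganesha.nfsd.exportmgr.AddExport \
    string:/etc/ganesha/export.conf string:"EXPORT(Export_Id=77)"
```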
The solution is to use an HA setup that also provides load balancing. With Linux HA there are two possible configurations, active-passive and active-active. In active-passive, if there are n servers running, only one is actively serving clients; if the active server goes down, another server becomes active, but the new server does not know anything about the clients, so we provide availability but there is no guarantee of consistency. Linux HA provides a virtual IP: you assign a virtual IP to your cluster of NFS-Ganesha nodes, and the NFS client mounts using this virtual IP. If the active node goes down, the virtual IP migrates with it; for the client nothing changes, it is still connected to NFS-Ganesha and does not know that the Ganesha node has been migrated.

NFS version 3 was a stateless protocol, meaning the server had no idea about client state, but NFS version 4 is a stateful protocol: the NFS server has to know the locks taken by the client and the files opened by the client. So if you want an active-active setup, the server that takes over the clients of the failed server somehow needs the information about all of their state. To achieve that, NFS-Ganesha takes advantage of the clustering facilities already provided by Ceph: we use a RADOS key-value store to hold the client state. Let us say one server goes down; the other active server takes up all the existing clients of the failed server, reads the information from the RADOS KV store, and continues. Of course there will be some delay in the response time, but that is something we will have to live with.
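For reference, in later Ganesha releases (2.6 onwards) this RADOS-backed recovery state can be enabled in ganesha.conf roughly as follows; the pool and user names are illustrative, and the exact option names may differ between versions:

```
NFSv4 {
    # Store client recovery/grace state in RADOS instead of the
    # local file system, so another gateway can take over.
    RecoveryBackend = rados_kv;
}

RADOS_KV {
    ceph_conf = "/etc/ceph/ceph.conf";
    userid = "ganesha";      # cephx user (illustrative)
    pool = "nfs-ganesha";    # RADOS pool holding the KV objects (illustrative)
}
```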
Now I will hand over to David to talk more about Samba.

Okay, thanks Supreeti. Hopefully you are all also aware of Samba. In a nutshell, it does file and print sharing, for the most part to Windows clients, via the SMB protocol. It handles authentication with Windows and Active Directory, it does ID mapping, and it can act as an Active Directory domain member or, more recently, a domain controller.

The protocol itself is split between the old, ancient, crufty SMB1 dialect, which was quite complex, with a whole heap of commands and sub-commands; it did include, at least for Linux clients, the Unix extensions, which were quite helpful. But with Windows Vista, Microsoft made a clean break from the previous SMB dialect and brought in SMB2, which is much more simplified, at least initially, and also offered some nice things like larger IO sizes. The protocol is still evolving, and some impressive new features have been added with SMB 3.1.1: we now have SMB Direct, which is RDMA extensions for SMB; multi-channel, so something close to MPIO in other protocols like iSCSI; the witness protocol, which allows a client to monitor and receive notifications about the state of a scale-out, clustered SMB server; and also extensions for encryption.

On the client side, alongside Samba the most common one is of course Windows. One thing worth mentioning is that Microsoft nowadays does a great job of publishing the entire specifications for the protocol, so it is no longer such an arduous reverse-engineering process; it is all done in the open. macOS has used SMB by default since Mavericks, I think, so that is also a common client, and on Linux we have the in-kernel CIFS client and smbclient; there is also the more recent libsmb2 from Ronnie Sahlberg, which is interesting.

With scale-out Samba, which I think was initially implemented by IBM and SerNet, we had, or still have, CTDB. For storing session state, Samba uses a database referred to as TDB, the trivial database; in a clustered setup we obviously need to share this information across the nodes participating in the cluster, so we have CTDB, which handles that, and aside from the database it also has basically an HA stack bolted on to handle monitoring, election of masters, and failover within the cluster.

I think I have run through most of those points already, but this is how we then fit in alongside a Ceph cluster. We have our Ceph gateway which, similar to NFS-Ganesha, uses the Samba VFS, a file system abstraction, where we can plumb libcephfs directly in; Samba then just acts as a translator from the SMB IO packets coming in to OSD and MDS requests at the back end, and alongside that we have the database for storing persistent state.

At this stage, Samba has this VFS backend, which I think was added by Inktank quite a few years ago, and it plumbs into most of the stuff that we have in the Samba VFS. It uses static cephx credentials to authenticate with the Ceph cluster, so at this stage there is no mapping between the users and groups on the SMB side and what is used for the Ceph cluster. It supports POSIX ACLs, so Samba handles the mapping between Windows NT ACLs and POSIX ACLs, and we just stamp that attribute onto the files and directories.

With CTDB, we require what is called a cluster mutex helper, which is used for avoiding split brain within the cluster; we have a standalone utility that just uses RADOS locks on the Ceph cluster (a configuration sketch follows after this passage). There is also Ceph librados service integration, which is something I hope to push upstream soon; it simply lets Samba advertise the availability of the service to the Ceph manager daemon.

For testing, there is nothing Ceph-Samba-specific or integrated at the moment; for the most part I am just continuing to use smbtorture, which is quite a comprehensive protocol test suite. With the Linux kernel client there is the xfstests suite, and interop testing is mostly a manual process at the moment, but hopefully we will get something more automated.
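Putting David's description together, a CephFS-backed share and the RADOS cluster mutex helper might be wired up roughly like this; the share name, pool, lock object and cephx users are illustrative:

```
# smb.conf -- sketch of a CephFS share via the vfs_ceph module;
# share name and cephx user are illustrative
[cephfs]
    # path within CephFS, accessed via libcephfs rather than a local mount
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    # static cephx credentials used for the whole share
    ceph:user_id = samba
    # files do not live on a local file system
    kernel share modes = no
    read only = no
```

And the CTDB recovery lock pointed at the standalone RADOS mutex helper (arguments: cluster name, cephx user, pool, lock object; the helper's install path varies by distribution):

```
CTDB_RECOVERY_LOCK="!/usr/lib64/ctdb/ctdb_mutex_ceph_rados_helper ceph client.samba ctdb_pool ctdb_reclock"
```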
Now on to performance. We ran some benchmarks of how NFS-Ganesha and Samba perform with respect to a native CephFS kernel client. This is what our benchmark setup looks like: we have 5 OSD nodes, 3 monitors and 6 OSDs, and we are using BlueStore. We ran the benchmark over 10 clients, each with 16 cores and 16 gigabytes of RAM; the public network was 10 gigabit per second and the cluster network was 100 gigabit per second.

For benchmarking we are using fio: we read and write data to a file of a specific size for a specified time, and we run this across the 10 client nodes. One job type looks something like a number of worker threads, and for each worker thread we run each of the possible block sizes; at this stage we tested only mixed read/write. So if we have 10 clients and run only a single thread on each client, you have 10 jobs, and at a maximum of 16 threads per client we can have 160 jobs.
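A fio job for one such data point might look roughly like the following sketch, using the parameters from the talk (1 gigabyte file, roughly 2-minute runs); the mount point, job name and IO engine are assumptions:

```
; fio job sketch -- run on each of the 10 clients against the mount
; point under test (CephFS, NFS or SMB); paths and names illustrative
[global]
directory=/mnt/cephfs
; mixed sequential read/write, 1 GB file, time-based ~2 minute runs
rw=rw
size=1g
runtime=120
time_based=1
ioengine=libaio

; bs was varied across 4k, 1m and 4m in separate runs, and numjobs
; scaled from 1 up to 16 threads per client
[worker]
bs=4k
numjobs=1
```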
For NFS-Ganesha I used version 2.5.2; there is already a 2.6 tagged upstream, which has quite a few improvements over this version, so you should keep that in mind when looking at the output. The Ceph version was 12.2.1. We tested with only a single NFS-Ganesha server, mounted using NFS version 4.0, and for my test I was reading and writing a 1 gigabyte file for approximately 2 minutes.

First, full disclaimer: I did not disable caching, so the results are very high for both CephFS and Ganesha. In real-life applications I would guess people would be using caching, so in a way that is okay, but maybe we need to re-run the benchmark with caching disabled. Looking at the output, for a single thread Ganesha's bandwidth is approximately 80% of what the native CephFS kernel client provides, but as the number of threads increases, performance degrades for both CephFS and NFS-Ganesha, and more so for NFS-Ganesha.

This is a more comprehensive read/write-ratio bandwidth comparison between Ganesha and CephFS. If you look at the first data point on the x axis, that is a single thread, and there Ganesha is at approximately 80% of CephFS's bandwidth, but then it degrades. It is obvious that as worker threads increase, a single NFS-Ganesha server becomes a bottleneck, and in that case we need multiple Ganesha servers taking care of multiple clients. This is the latency graph: as you can see, as the number of threads increases, Ganesha's latency is of course higher. I will hand over to David for the Samba results.

I will just mention that these results are quite preliminary; I have been putting them together over the weekends and the graphs I am showing were generated this morning. I am running a setup similar to Supreeti's for NFS, just with a slightly more recent version of Ceph Luminous. I have Samba 4.6 with 3 Samba gateways set up, all exposing isolated parts of the CephFS directory tree. I have oplocks, or leases, disabled; that is currently one limitation, of many, of the Ceph-Samba setup. I am using the Linux CIFS kernel client, a relatively old kernel but with a bunch of backports on top, the SLE or openSUSE Leap kernel, and I am using the SMB 3.0 dialect.

These are the results I had for CephFS: aggregate bandwidth across all 10 clients of around 3.5 gigabits per second for read and write. With the Samba gateway, and I might have gone too far there, there is a considerable drop in throughput. I should mention these are all streaming IO tests, with a bunch of different IO sizes: 4K for the lower-throughput results, then 1 meg and 4 meg IOs, and then one worker thread per client across the 10 clients, 4 workers, 8, and 16 at the end there.

I now have to begin analysis and determine where the existing bottlenecks are, and I am speculating a little without going further into the detailed results, but at this stage Samba has a fully synchronous backend for dispatching libcephfs IOs. Converting that synchronous backend into something Samba already offers for local file system IO, a pthread pool, should give a significant improvement. The other one is client-side caching: the SMB protocol itself allows for considerable caching, with notifications of cache breaks, similar to CephFS's native capabilities flags, so having some mapping between these two caching mechanisms, to fully allow the CIFS or SMB client to perform caching, would I think also significantly improve the results.

So now on to the challenges, what we are currently working on and where we want to go with Samba and NFS-Ganesha as gateways for CephFS. At the moment we are focused on cross-protocol support: I think NFS-Ganesha is currently missing ACL support, so ideally we would have ACLs consistent across Samba, NFS and the native CephFS client. Coherent caching: this is mostly the Samba side, which needs to be improved to allow leases, or SMB oplocks, with CephFS. Unified authentication: I think upstream in Ceph there has been interest in adding Active Directory support as a replacement for, or something alongside, cephx; if that were to come in, we could look at using the SMB client's Active Directory or Kerberos tickets to authenticate as that specific user with the Ceph cluster, which would also be nice to have. Async IO at the Samba backend is the thing I mentioned earlier; I think we would see considerable performance improvements there. Multi-channel support, the ability to utilize multiple network interfaces on an SMB server and round-robin or fail over between those interfaces, is not currently fully supported in Samba, but hopefully that will come soon. Deployment: at the moment Samba is not integrated into DeepSea, which is the utility we use for deploying a Ceph cluster within SUSE. The witness protocol, again very SMB-specific, would allow the Samba/Ceph cluster to notify SMB clients and balance load across the SMB gateways, which would be good to have. And finally, a replacement for CTDB, potentially using the RADOS key-value store instead of a full HA stack, would I think be a nice simplification of the architecture of a clustered Samba gateway, and is something we are looking into.

Now the NFS future, the last slide. Upstream has already started working on clustering support in NFS-Ganesha without using Linux HA, which means there will be support in Ganesha itself to manage the clients; I think it is more targeted towards using Kubernetes with NFS-Ganesha. Another feature would be NFS version 4.1 support, which means pNFS, and librados service integration would also be nice. Thank you.
Can you repeat the question? So the question is whether pNFS is in the near future of NFS-Ganesha and Ceph. I can answer for NFS-Ganesha: I would say yes, the upstream community is looking into pNFS support. With Samba, or at least the SMB protocol itself, there is nothing similar to pNFS, at least at this stage that I am aware of; I think the closest would be multi-channel, just allowing clients to utilize multiple connections and round-robin or stripe across them, but it is still very different from pNFS.

You mentioned the challenges; wasn't locking missed there? I think locking on Samba and locking over NFS are not really compatible. That's true, that is another task which needs to be addressed. So the question was about coherent locking between both NFS and SMB clients: at this stage we support independent, isolated SMB and NFS setups, but yes, locking is something we would like to have.

The next question was how close this is to production use. I can say that within SUSE we currently provide it as a tech preview for our storage product. I think if you are purely using it to expose CephFS to SMB clients, it is absolutely usable; we found a few bugs with certain operations going through CephFS and hitting the local file system, but I think most of them are fixed, so it is getting there. Things like scalable performance are further out. Otherwise, thanks for coming.