Yeah, it's relatively easy. So my name is Mikhail Biritsky, and I'm responsible for the community work in LizardFS. We are a Polish project, and today I'm not going to introduce LizardFS itself; I want to tell you about what we did in the last 12 months. In case somebody doesn't know our project: we have a booth in Building K, and there are people running around the show in LizardFS t-shirts who can explain every detail possible. One of them is sitting right there; he actually wrote quite a lot of the stuff I will be talking about. So we have every answer you could imagine available. Last year was very important for us because we introduced many new features. We are a relatively small community, so getting all of that done in one year was quite a job. We introduced NFS support via Ganesha. We introduced new ACL support, with full support for rich ACLs; I will say more about that later. We introduced a new task engine, support for HDFS emulation, and a new C client library that allows you to write your own LizardFS clients. We also started supporting read-ahead caching; we had some problems with sequential performance, and those also vanished in the last year. We introduced support for massive recursive remove jobs. We created a whole new documentation set, so at least most of the administration and installation features are now pretty well documented. And we started supporting two new platforms, FreeBSD and Fedora. The FreeBSD port is published but still in the queue with the FreeBSD people; things take their time in the FreeBSD community. We also made some other changes.
We already had a Windows client, but we have extended Windows support so that you can now probably compile all of LizardFS in the new Linux subsystem on Windows. Whether you like the platform or not, one of the things we believe in is that the more platforms you support, the easier it is to find bugs. Every compiler compiles differently; every platform shows you different bugs. For bug squashing, this is the easiest way to see things you wouldn't see if you only built on one distro, one platform, one operating system. We also added some smaller changes. People were complaining that you could end up with chunk servers running on the same IP address but on different ports, so your replication would effectively land on the same machine; we added a feature to recognize that. The system now also recognizes how much load each chunk server has and tries to direct writes to the least loaded chunk server. We now have minimal goal configuration. We finally changed to a semantic versioning scheme; the old one was a bit confusing, even for me. And we did a lot of small fixes to clean out coding mistakes that still come from the project we forked off from four years ago. There were a lot of places where small parts of the code were hard to understand, and we seem to have reached the point where 90% of that is gone. We now have caching for faster directory lookups, and there are new whole-path lookup functions. If you want details on any of the more developer-oriented features, our main developer is sitting in this room and can answer any question you may have. We squashed quite a few bugs. We had a lot of places where things were still tuned for low-speed networks and low-speed disks, not prepared for newer, more performant hardware environments. We fixed most of that.
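The two scheduling changes just described (don't replicate onto two chunk servers that share one IP, and prefer the least loaded server for writes) can be illustrated with a toy sketch. This is not LizardFS's actual code, just the idea:

```python
# Toy sketch of the two scheduling ideas described above; this is NOT
# LizardFS's implementation, only an illustration of the logic.

def pick_write_target(chunk_servers, exclude_hosts=()):
    """Pick the least loaded chunk server, skipping servers whose host
    (IP address) already holds a replica - two chunk servers on the
    same IP but different ports must not both receive a copy."""
    candidates = [cs for cs in chunk_servers if cs["ip"] not in exclude_hosts]
    if not candidates:
        raise RuntimeError("no eligible chunk server")
    return min(candidates, key=lambda cs: cs["load"])

servers = [
    {"ip": "10.0.0.1", "port": 9422, "load": 0.9},
    {"ip": "10.0.0.1", "port": 9423, "load": 0.1},  # same machine, other port
    {"ip": "10.0.0.2", "port": 9422, "load": 0.4},
]

# A replica already lives on 10.0.0.1, so the low-load server on the
# same IP is skipped and the write goes to 10.0.0.2.
target = pick_write_target(servers, exclude_hosts={"10.0.0.1"})
print(target["ip"])  # -> 10.0.0.2
```

Without the exclusion, the same function simply returns the least loaded server overall.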
We had some problems with global locking; especially people using the Windows client ran into them. We fixed that. We now support ARM officially; before it was very theoretical, now we support it. There are probably still unfixed problems on ARM, because we don't have enough hardware for that, and we are waiting for some promised hardware from people who want to use the ARM platform. But it seems to work, because I know at least two customers that run LizardFS clusters on ARM. We also fixed some problems we had with LibJudy, a library we use that seems to be unsupported everywhere; we had to implement workarounds for its bugs. I think the Ubuntu build of it is totally broken. I'm not sure we found all the problems with it, but it seems most of them are gone. Finding defective files was a bit complicated until last year; that's fixed as well. And request sizes in the read cache have been fixed too; they were often reported as zero, which was just wrong. Last year we introduced ACL support. The way the ACLs are presented depends on which platform your client is running on, because the client has to be able to present them. The backend uses rich ACLs, which are a superset of the NFS ACLs, and in every client we try to translate them into whatever that client supports. The only supported platform where we cannot do any translation or ACL support at all is FreeBSD, because the FUSE library we have to use there doesn't support ACLs. So for now, no ACLs for FreeBSD clients. The documentation is being updated; I already have the tables prepared showing how each platform gets translated, and that should show up in the documentation in the next three or four weeks. You wanted to ask something about ACLs? The question is how we map users in Windows. We have a Windows client that interfaces with the ID functions in Windows itself.
It does not interface with the Unix backend; it uses Windows' own user and group interface. We have built a new task engine into LizardFS. We used to get a lot of complaints about big jobs nearly killing the metadata server. We have now introduced a task system that, for now, takes snapshot and recursive remove operations, cuts them into small tasks, and in that way frees up the master server and prevents all those locking scenarios. The task engine first tries to create as large a job as possible, to get rid of as much of the work as it can in one go, and then splits the rest into smaller tasks. We have tested it, and we basically got rid of this problem completely; we haven't seen it happen anymore, and it used to be a really big issue. What I see for the next year is that we will try to extend this to other features as well. That's not easy, because the code base is quite big and you have to rewrite every single client feature to make it use the task engine. But for the two most painful parts, snapshots, where we had people waiting days for their snapshot removals to finish, and large recursive remove operations, the problem is completely gone. We also added functions to list running tasks, and you can stop them: if you think a job will impact your metadata server, you can now kill any task running in the task engine. The two commands that use the task engine are the snapshot commands (make and remove snapshot) and the recursive remove commands. They also got a new option that lets you say how large the first job should be and whether they should ignore changes that happen while the operation is running. These are both long-running operations, and if your snapshot doesn't ignore changes that happen in the meantime, it probably doesn't make much sense.
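The splitting strategy described above (one large first job, then small bounded tasks that give the master server room to breathe) can be sketched roughly like this. Purely illustrative; batch sizes and structure are invented for the example:

```python
# Toy sketch of the task-engine idea described above - not LizardFS
# code. A big recursive job is cut into bounded tasks so no single
# step can monopolize the master server.

from collections import deque

def run_recursive_remove(entries, first_batch=1000, batch=100):
    """Process `entries` as a queue of bounded tasks. The first task is
    as large as possible (first_batch); the remainder is handled in
    small batches, yielding control to the server between tasks."""
    queue = deque(entries)
    removed = 0
    size = first_batch
    while queue:
        for _ in range(min(size, len(queue))):
            queue.popleft()       # "remove" one entry
            removed += 1
        size = batch              # later tasks stay small
        # <- here the real engine would let other operations run
    return removed

print(run_recursive_remove(range(1250)))  # -> 1250
```

Because each task is bounded, a "list tasks / kill task" facility like the one described in the talk only ever has to interrupt between batches, never in the middle of one.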
So you can now choose whether you want the snapshot operation to care about changes that happen while snapshotting or not. Another feature we added was read-ahead caching. We had some very primitive read caching until last year, and we noticed it wasn't good enough to handle large sequential reads. The reason we never saw that before was that we didn't have many customers using LizardFS for massively large files. LizardFS is now used especially in genome setups, where file sizes get incredibly large, and there we have concrete numbers for how it behaves on sequential reads. Changing the read path to implement a more dynamic kind of read caching brought us, when we tested it with Roger, I think an eight to ten times improvement in performance. You can set per mount how large your read cache can grow; it will dynamically grow up to that window size. You can adjust it per mount, so if you don't have any sequential reads, you can basically switch it off completely. Okay, I think in January last year we started a new documentation project. All of the installation, all of the self-compiling, and all administration commands are documented; even some of the more complicated features are documented, like geo-distribution and setting up which chunk servers should be preferred for which clients. You can find all of that there. What is still missing, and where we would be happy for any input if anybody wants to help, are the pieces about daily work with LizardFS. There's only so much we, as the developers and the company that does support, can write about your daily job; it would probably be far more interesting for users to also get cookbook-like or FAQ-like entries from normal users.
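The dynamic read-ahead window described earlier, growing per mount up to a configured cap and effectively off when the cap is zero, might be sketched like this. The growth policy and numbers here are invented for illustration, not LizardFS's actual algorithm:

```python
# Toy sketch of a dynamic read-ahead window like the one described
# above - illustrative only, not LizardFS's algorithm or parameters.

class ReadAhead:
    def __init__(self, max_window_kb):
        self.max_window = max_window_kb   # per-mount cap; 0 disables it
        self.window = min(64, max_window_kb)
        self.next_expected = None

    def on_read(self, offset_kb, length_kb):
        """Grow the window while reads are sequential, shrink it back
        on a random access. Returns how much to prefetch."""
        sequential = self.next_expected == offset_kb
        self.next_expected = offset_kb + length_kb
        if sequential:
            self.window = min(self.window * 2, self.max_window)
        else:
            self.window = min(64, self.max_window)
        return self.window

ra = ReadAhead(max_window_kb=4096)
ra.on_read(0, 64)
ra.on_read(64, 64)         # sequential -> window doubles
w = ra.on_read(128, 64)    # sequential again
print(w)  # -> 256
```

A mount with `max_window_kb=0` never prefetches, which matches the "switch it off completely" option mentioned in the talk.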
We are starting to document some of the trickier management parts, and one of the things I'm currently concentrating on is creating more documentation for developers who want to join the project. So last year, after a long time of testing, about two years now, we implemented NFS-Ganesha as, let's call it, a translator from LizardFS to NFS. It's implemented directly on the servers: the Ganesha metadata server is attached directly to our metadata server, and every chunk server you want to take part in your NFS infrastructure is linked to its own Ganesha data server as well, which enables us to fully support NFS 3, 4, 4.1 and pNFS. We tried a lot of approaches, starting with direct translation, which is always horrible, and for two years we tried to implement our own NFS and to use various platform NFS systems, and all of them more or less failed: you couldn't do HA, you couldn't support any parallelization. With Ganesha we finally managed to get a fully integrated setup that translates directly, without all the mount operations or FUSE translations we needed before, when we had to translate mounts. The implementation is relatively new, I think the release was in December, and as with the documentation, we are quite urgently looking for feedback and for testers, to see how it behaves in different scenarios. One limit of this way of implementing NFS is that we only support it on Linux chunk servers; Ganesha seems to use a lot of Linux-specific functions, so for people using BSD, Mac, or Solaris backends with LizardFS, we cannot support Ganesha there, because there is no Ganesha on those platforms.
The Ganesha project was also the main driving force behind finishing our new client library, which allows you to talk directly to your LizardFS backend. I will talk about that a bit more later, but basically you now have a client library with which you can write, probably in an hour, a client that replaces the mount command for LizardFS, so you can do all your file system transactions from any application you want. No requirement for FUSE, no requirement for any mounts; it is a direct client library for the LizardFS protocol. Another feature we were asked for a lot: when people read that you can support massive data structures, the first thing they start talking about is big data, and whether we could offer HDFS support. We started working on that about one and a half years ago, without really understanding how Hadoop works. We are file system people; we are not really into data languages and data analysis. So the first incarnation, as I call it, of our HDFS plugin was quite weird: it sat on top of the mount system, translated every command back and forth, and was totally unusable. Then we decided to try the C API that comes with Hadoop. After managing to implement that, we found out that most of the features we wanted for the Java people who work with Hadoop are not implemented in the C API, and that most of our work made no sense whatsoever: we had an interface that was not usable, seemed very complex to use, and created totally unnecessary overhead, because it went from Java to C, then through the C API, then again to us.
While working on that, Piotrek wrote the library I already mentioned in the Ganesha context, which allows you to talk directly to the LizardFS protocol. Once we created a direct connector that uses that API to talk straight to the backend, we were able to build something with enough performance that we can now call it a Hadoop plugin. It's still in the works and we haven't published it yet, because it wasn't really at an alpha stage until last month, but you should find it on GitHub in the next couple of weeks and can take a look. Again, we are looking for feedback there, because we don't really have a large Hadoop cluster (it's all done on medieval stuff in our labs) and we don't really have any practical Hadoop experience. We think it's feature complete, but we would very much like people who have Hadoop to tell us whether we are right, because it's mostly guesswork right now. What was interesting was that there were a lot of problems with Hadoop itself, because different things are implemented differently in different parts of Hadoop. For example, we had the interesting experience that a FileNotFoundException "is not an instance of" FileNotFoundException, because it's implemented differently in MapReduce than in Hadoop itself, and a developer spent weeks looking for the bug on our side, until we found out it had nothing to do with us; it's one layer up, in Hadoop itself. So this was quite an interesting experience of trying to link with a different project you actually don't know much about. Currently, according to the HDFS tests, we have a feature-complete HDFS on top of LizardFS, speaking directly to the backend. A small installation documentation is already done; what we are adding now is how to use it. It should show up on GitHub in the next couple of weeks. We are looking for help with quality checks for it, because, like I said, we have a very, very small Hadoop cluster and
there's only so much we can try there, since we have no practical experience, so any feedback from the real world would be very much appreciated. While we were working on Ganesha and NFS, we created a library, published in December with the 3.12.0 release, that you can use to write native LizardFS clients. It's available on all platforms we support. Documentation is in the works; what is available now is an example file that shows you how to include it in your own project. It's pretty easy to use. It includes functions for locks, for writing, reading, unlinking, opening and copying objects; it allows you to get information about the state of each chunk server, or all chunk servers, so you could for example make use of the feature that checks the load on different chunk servers; and it allows you to manage ACLs. The example file is about 1 KB, I think, so it's pretty simple, and if you have any questions about it, the author is sitting right here. So much for what we did in the last 12 months. Now I want to give you a little insight into what we have planned for the next year, or probably even longer. We are releasing today the source code of our HA engine for the metadata servers, which used to be a commercial add-on from Skytechnology. We are releasing the Hadoop plugin, as I said, to the public, because we think it's showable now. And we are starting to work on something we call Agama, the next generation of LizardFS, which has a lot of internal changes; I will say more about that in about two minutes. We have extended our testing to the Windows Subsystem for Linux; we found out that we can translate all of LizardFS to it. The only thing you can't do on the Windows subsystem is work with signals, which is a known limitation. It's pretty interesting as the next testing platform. So, the new LizardFS has a whole different
internal architecture that is mostly focused on performance. We think everything in the normal feature set is stable, and we are now moving to get LizardFS to a different performance level. We want to start with a completely new client and then move the changes we put into the client into the metadata and chunk servers as well. The client will be fully backwards compatible, so you can start working with the new client and then upgrade the rest later. The changes we have planned: we are moving from the single monolithic setup we have now to an event-driven architecture. We are switching to asynchronous I/O, using the ASIO library, because we think it's the easiest way to stay current with the C++ standards. We are trying to get rid of a lot of pieces that use in-kernel features; especially looking at the bugs that show up all the time in kernel space, we would like to get out of there as much as possible. All our network functionality, for example, is now 100% in user space. We are adding a new tracing subsystem to all the modules, because for now you just get massive log output from the chunk servers, the metadata servers, the loggers and the clients, and it's relatively hard to correlate which log entry relates to which function on which component. Now there is a 64-bit identifier on every transaction that is being logged, so you can trace in your distributed system where things go, and why, and when. There is also a new monitoring feature being introduced that will allow us to automatically adjust timeouts and settings, because we had a lot of situations where customers don't use the simple setup like we tell them, but go into relatively complicated settings, fiddle around, and then come back and ask for support with the fiddled setup. [Audience question:] What is the source for the transaction
identifiers? I don't know offhand, but the developer who wrote it is here, so we can answer that question afterwards; I'll get him to answer it. Okay, so we are adding a new monitoring system whose goal is, like I said, to automate all the settings that people told us are scriptable but hard to understand, and that create a lot of support overhead, because people just set them and then call or write and say "it's not working, we haven't changed anything, besides these 70 settings". So, things that are relatively fixed now, where work has started, concern the client. We have eliminated most of the kernel caches, and we have started implementing the client so that it uses a user-space network subsystem. There is now full write versioning in the works. Until now the system did partial write versioning, and we saw that this leads to relatively slow writes, because only one chunk server at a time can write to the backend. With full write versioning, every chunk server that wants to write to that place can do so, and the newest version wins. That allows the I/O to improve by a factor of up to a hundred. It also safeguards you against some edge cases in erasure coding, where parity could be written twice and destroy your stripe set, and it helps against some very rare edge cases where microsecond differences occur when writing to different metadata servers. With full write versioning, the synchronization is much safer and basically easier to handle. And as I said, writes become much faster: in tests we were able to saturate a 40G interface with one client. The changes for the chunk servers and the metadata servers are planned; currently we are working on the client, and the idea is to move everything we establish in the client to the backend servers as well. And the plans
for the chunk servers are relatively simple: we will take the same work on event-driven architecture and asynchronous I/O, take the user-space network functions, and move them to the chunk servers as well. And there is a lot of work planned to get rid of old, complicated structures and simplify the whole thing. The reason is that one of the main drivers of the project is simplicity. I don't know how many of you have used LizardFS, but I can set up a very basic LizardFS in about 20 minutes, complete and even prepared for simple erasure coding. We want to make that even simpler, automate as much as possible, and get rid of a lot of stuff we still find too complicated. The same is planned for the new metadata server. The metadata server will additionally have the monitoring feature I mentioned before, which will allow a lot of the more complex settings to be automated: timeouts, backend drive handling, interactions with the backend file system, read-ahead cache settings, and so on will be possible to set automatically by the metadata server, on the chunk servers and even on the client. The other feature we want to introduce, which was asked for a lot, is a distributed network of metadata servers. We are currently experimenting with different key-value stores to be able to create a network of active metadata servers, but that's all still more at the planning stage than anything else, because the current concentration is on the client. The write versioning we are currently implementing in the client will make its way to the metadata servers as well. So, we are today open sourcing the HA backend that we have developed for LizardFS. It's called uRaft. We have been using it successfully for four years at all our commercial customers. It's based on the Raft algorithm; it is a quorum-based backend that
switches a floating IP. It requires at least two nodes plus a quorum node, or three nodes. The settings are really, really easy; I will show an example. The switchover times we see are in the sub-second range. You can probably use it for whatever you want, but the scripts we will publish are for our metadata servers, and for the Ganesha metadata servers as well. It uses a floating IP model and has a very fast election process; you can see the theory behind it on the Raft algorithm's page, which has a very nice visualization of how the election process works. The resources it uses are absolutely minimal; very often we put the quorum node on the same box as a chunk server, and you don't even notice a change in resource usage on that box. So here is an example of a uRaft configuration: you start by describing which node has which address, you add the floating IP, the floating network and the interface you want your IPs switched on, and the only difference between your nodes is the uRaft ID. The configuration is the same on each node; you just give each one a different ID, and that's your whole setup. So it's really, really simple. And my time is up. If you have more questions about LizardFS, visit the booth in Building K. Thank you.
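As a recap, the uRaft configuration walked through at the end of the talk might look roughly like this. The key names below are reconstructed from memory and may not match the shipped example config exactly; treat this as a sketch and check the published lizardfs-uraft example configuration:

```ini
# Sketch of a uRaft node configuration - key names are illustrative.

# The same node list appears on every machine:
URAFT_NODE_ADDRESS = node1:9427
URAFT_NODE_ADDRESS = node2:9427
URAFT_NODE_ADDRESS = node3:9427   # e.g. the quorum node on a chunk server box

# The floating IP that follows the elected leader:
URAFT_FLOATING_IP      = 192.168.0.100
URAFT_FLOATING_NETMASK = 255.255.255.0
URAFT_FLOATING_IFACE   = eth0

# The ONLY per-node difference: each node gets its own ID (0, 1, 2, ...).
URAFT_ID = 0
```

As the talk says, everything except `URAFT_ID` is identical across nodes, which is what keeps the setup so simple.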