So, what are the benefits of this, and what are the disadvantages? First let's look at the disadvantage, for specific workloads that involve many writes to one file, many writes to one huge file. Copy-on-write is a disadvantage there, because you very often replace a single block somewhere in the middle of that huge file, and that creates fragmentation on the disk. This is a disadvantage that all copy-on-write file systems share; it's not only Btrfs, all file systems that use copy-on-write have this issue.

On the other hand you have the advantages, which are the reason we are looking at copy-on-write file systems at all. One is efficient storage, because with copy-on-write you can design the write patterns around this copy-on-write functionality. You can do deduplication on the disk, because you have information about the blocks and how they change: you always have multiple versions of these blocks on the disk, and you can run deduplication on top of that. You can share blocks, and if a file deviates, just another new copy of the changed part is created. Based on copy-on-write you can implement snapshots directly in the file system. And, also one of the benefits of Btrfs, you have integrity checks attached directly to the blocks of data and metadata. These integrity checks are also used for building the trees internal to Btrfs, so you get something beyond what is possible with journaling. I will come to that point later when we talk about XFS a bit.

Now let's come to the question I started with: what do this and this have in common, and why is it relevant for our topic here? This is obviously a cow, and copy-on-write is COW; you see the similarity in the naming. This is butter, very obviously, and this is a B-tree. The construction of a B-tree is a very simple one. If you go online you can find where these pictures come from; they are from Wikipedia, or my own. Now we have butter and we have the cow. Cows produce milk, you make butter from milk, so butter comes from the cow. And this is why Btrfs comes from COW and why we say "Butter FS". That's the right pronunciation, and this is how the inventor of Btrfs explained it to me: because butter comes from the cow, it is called Butter FS. Not "better FS", not "B-tree FS": Butter FS. Very simple to remember.

Let's look at the main features and concepts of Btrfs. The first one is extents, which is how blocks are organized on the disk. Extents are nothing special to Btrfs; other file systems use extents as well. We already talked about copy-on-write as a major principle of Btrfs, and based on this copy-on-write we also have snapshots, which I will demonstrate and talk about later. The concept underlying Btrfs is the B-tree; I already talked about this. Every organizational structure in Btrfs uses B-trees, and the benefit of this is that you implement the B-tree infrastructure and the B-tree algorithms only once and reuse them for everything you do: for the free block list, for metadata blocks, for data lists. The whole file system is organized around these B-trees, as we have seen in part of the name.

Another concept which is important to have in mind is that you have something called subvolumes. A simple way to picture a subvolume is as a file system within the file system; however, it looks like a directory, as the sketch below shows.
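As a minimal sketch in commands (assuming the btrfs-progs tools; the device name and paths here are placeholders, not from the talk):

    # create a subvolume; it shows up like a directory in the tree
    btrfs subvolume create /mnt/home
    # list the subvolumes of a mounted Btrfs file system
    btrfs subvolume list /mnt
    # a subvolume can also be mounted on its own, like a separate file system
    mount -o subvol=home /dev/sda2 /home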
So imagine a directory in your file system tree, but it is not really a directory: at the point where you access it like a directory, a new file system begins, a new set of the trees I was just talking about. It is virtually mounted in, so you don't see that it is mounted, but you can also work with it like a mount point.

I already talked about metadata and data. We have a separation of metadata and data in all file systems. The organization here is important, because metadata integrity is important for the survival of the Btrfs file system. Btrfs has one interesting feature in this data structure: from every extent there is also a pointer in the file system back to the inode. Normal file systems have a top-down architecture: you have an inode, then you have a list of blocks or extents, and from the top down you have a pointer to your extent or to your block. In Btrfs you also have back pointers, so that from every piece of data on the disk you can find out where it belongs. That is one of the reliability features of Btrfs, which it makes use of in case the file system crashes. It hopefully never does, though.

Let's go back to this overview. We have, as I said, XFS, ext4 and Btrfs as the primary file systems which are currently promoted by distributions, people, companies and so on. We also have ext2, ext3 and ReiserFS in the market. These are not bad file systems, please don't misunderstand me. They are just not as modern anymore and don't have all the functionality that people need.

In the announcement of this presentation we also said that we would describe how to choose the right file system. A preface to that: I thought about giving you benchmarks, or overviews of benchmarks, and I decided not to do so. The reason is that every benchmark is wrong for you, because a benchmark always has to be done in the context of the storage, the hardware and the workload it should run against. There are artificial benchmarks, yes, but those artificial or synthetic benchmarks are not always the right thing to look at. So my advice: before you choose a file system, do a benchmark with your application in your environment, and then you may decide. From a SUSE perspective, and that is what I'm talking about here, we have a preference, we have experience, and that is what we build the following slide on.

If you choose a file system, let's first look at where you come from: is it a new file system that you want to create, or is it an existing one? If it is a new file system, your next question is what you want to use the file system for; that is what I call the purpose. Is it for the operating system? In this case Btrfs is what we prefer and what we suggest people use today already, and obviously also in the next major release of our enterprise product, and at some point in openSUSE, when the community agrees to that step. If it is for data, then you have the choice: do you want snapshots or not? If you want snapshots, you obviously have to use Btrfs, unless you are using other technologies like storage-based snapshots or device-mapper-based snapshots. If you don't want snapshots, our advice is to use XFS; I will come to the reasoning for this choice in a minute. Expressed as commands, the new-file-system branch of that decision looks roughly like the sketch below.
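A minimal sketch (device names are placeholders; mkfs defaults are assumed, no tuning options):

    # new file system for the operating system: Btrfs
    mkfs.btrfs /dev/sda2
    # new file system for data, snapshots wanted: also Btrfs
    mkfs.btrfs /dev/sdb1
    # new file system for data, no snapshots needed: XFS
    mkfs.xfs /dev/sdb1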
Now, when you don't have a new file system, it is a question of which type of file system you already have deployed in your infrastructure, and there are three options. You have XFS already: you might stay with that, it's a very good choice. You have deployed ext2, 3 or 4: then we are back to the snapshot question. Or you have ReiserFS: in the case of ReiserFS we suggest converting, and I will come to that later. If you have ext2, 3 or 4, you again have the question of whether you want snapshots or not. If not, just stay with what you have. If you want snapshots on an ext2, 3 or 4 file system, you can go the conversion route and use Btrfs going forward. I will demonstrate later how this conversion works.

Now, I promised to talk about XFS, and I will also comment on whether Btrfs is mature and good enough to be deployed as the operating system basis for you and also for our customers. First, why XFS is a good thing. It is very mature. It is a true UNIX file system coming from the SGI IRIX operating system; it was ported to Linux more than 10 years ago, has been supported in SUSE's flavors of Linux, community and enterprise, since then, and we have very good experience with it together with our customers and partners, who deploy it at large scale. It has a track record in performance, scalability and stability, and it has an active development community. It recently got checksums and self-identifying metadata, so that if the file system has issues with itself, with its metadata blocks and data blocks, you are not lost anymore and will be able to recover the file system. These are, I think, enough reasons why XFS is the right file system for data when you are not using snapshots.

Now let's go to Btrfs. Whether Btrfs is mature enough or not, whether you should use it or not, is one of these religious questions, from my perspective. We say that you can use it, and that you should use it if you want to do snapshots of your operating system or of specific data sets. However, Btrfs has many features, as you see here, much functionality, and it is true that a number of those functionalities are not yet ready for enterprise use. We have drawn that line together with our kernel and file system team, and there is a proposal to the upstream Btrfs community to build an infrastructure for marking those features as supported or not. The definition of what is supported or not, or mature or not, is admittedly sometimes a judgment call. We are confident, though, that the things listed on the left side are mature, supportable and ready for everybody to use. This includes copy-on-write, it includes snapshots based on copy-on-write, and it also includes the subvolume infrastructure. The checksums on data and metadata are also an essential part of Btrfs and very mature, as is the online metadata scrubbing. Scrubbing means the file system checks itself: whether everything is correct and whether the metadata are in sync with what is found on the disk; a sketch of how you trigger it follows below.
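A minimal sketch (assuming the btrfs-progs scrub subcommands; the mount point is a placeholder):

    # start an online scrub on a mounted Btrfs file system;
    # it verifies blocks against their stored checksums
    btrfs scrub start /mnt
    # check progress and the result afterwards
    btrfs scrub status /mnt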
Recently added functionality is the manual deduplication. Sorry, I skipped the defrag: that has also been available for some time already and is stable. The manual deduplication has just recently been added; it works primarily in user space. The "manual" in front of deduplication here distinguishes it from automatic deduplication. Let me describe the difference quickly.

Manual deduplication means: you have the Btrfs file system, you write to it, and you will create duplicates on the disk, because multiple people write the same PDF to the disk a thousand times. Now, if the administrator of the system decides that there is a time of low usage, for example a company storage system on the weekend, then he can run such a manual deduplication. It will slow the file system down: the file system stays mounted and accessible, but there will certainly be higher latency while the deduplication is running. Automatic deduplication happens directly when you write a file or a block to the disk, and it obviously slows down the overall process of storing things, and also of reading things back, because everything always has to go through the file system and be checked for whether it is a duplicate or not. This is why we advise not to use automatic deduplication yet.

And last but not least, quota groups are also supported and supportable. This is the way Btrfs implements quotas, per user or per group or so. The challenge, though, and this is why it is listed separately, is that on a copy-on-write file system, implementing quotas is not as easy as you might imagine. On a normal file system you have one file, it belongs to one person, and in that case you can just measure how many blocks or how much space the specific user has used. In a copy-on-write file system, potentially with deduplication already applied, this is not so clear. What happens if two people have very large files which differ by only 3%? Do the other 97% belong to person A, or to person B, or to both? So that's a real challenge here: to measure in the file system the real size a person is using. This is why the Btrfs community implemented quota groups per subvolume: you describe and limit the size of a subvolume in a specific configuration.

On the right side, with the not-yet-mature functionality, there are some things that many people in the communities use very often and then report issues with. The primary ones are RAID and compression. From our perspective these two functionalities are not yet ready to be used in Btrfs, and we advise people not to do it. In our enterprise product we even actively prevent people from doing it; they have to deliberately flip a switch to acknowledge that they are moving onto unexplored land. So RAID is really not what you want to use yet, and compression is also not at a level where we would say this is something we want to support at the moment. These functionalities are already implemented in Btrfs, and over time this list will certainly shorten and the list on the left side will grow, as we declare features supported depending on where we are with testing together with our partners and customers; the customers can then be sure. That said, that's the first part, and next we will look at copy-on-write and what we can do with it in practice.
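To make the quota groups just discussed concrete, a minimal sketch of how they are enabled and limited per subvolume (assuming the btrfs-progs command line; the mount point and subvolume path are placeholders, and the exact syntax may differ by version):

    # enable quota groups on the file system
    btrfs quota enable /mnt
    # limit the subvolume "home" to 1 GiB
    btrfs qgroup limit 1G /mnt/home
    # show current usage per quota group
    btrfs qgroup show /mnt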
Do you have questions? Yes? Yes, you have the same problem, but at least per subvolume you can measure how much is used there. I don't know exactly which algorithm is used to differentiate, but yes, you can have the same problem, and I do not know exactly whether they count it twice, or half, or what they do. If you have interest, you can drop me an email and I can find out how it is calculated. But yes: if you do deduplication in one file system with multiple subvolumes, you definitely can have the same issue again. In this case, though, you only have to solve it once, and it is per subvolume, which is much clearer than per user, who potentially has multiple files in multiple subvolumes. Thank you, good question. Other questions before I go ahead? Okay.

One of the first things, when we talked internally about Btrfs and snapshotting, was what we could do with it to manage an operating system better, and we created an infrastructure called Snapper, which is able to plug into operating system tasks on multiple levels. Snapper is an infrastructure; it is a userland tool to manage Btrfs snapshots. It can create, modify and delete such snapshots; you can roll back snapshots, and you can certainly also clean up, including automated cleanup of those snapshots. It is integrated, or can be integrated depending on the distribution, with the package management stack: there are patches for yum, and it is integrated with Zypper, the openSUSE and SUSE Linux Enterprise tool chain for managing patches and packages. It can also be integrated into the systems management stack; on SUSE that is the YaST infrastructure, and other people have delivered other integrations.

Snapper has several parts. It has a client library. It has a daemon which plugs into the D-Bus system, so it is a D-Bus service that is available to every tool on the Linux system, and everybody can talk to it via this D-Bus protocol. And there is also a command line tool, which I will show you soon. Snapper has its own website, snapper.io, and there you can download Snapper packages for all major distributions, meaning Debian, RHEL, Fedora, Ubuntu, openSUSE and SUSE Linux Enterprise. Did I forget a major one? I hope not; tell me in the Q&A if I did. Okay.

So let's first have a demonstration of Snapper for administrative tasks. As an introduction, I'm using the graphical interface to simply create a user; that's the easiest thing we can do. So I just create a user, let's say Celestine; the password doesn't matter today. So yeah, I used the wrong password. Now the user is created, and once that is done, we can use the graphical interface to Snapper and see what has been done. We see here: Monday, it's 10:38 in the evening, that's correct, that's German local time. And we can see what changes have been made to the system: the group, passwd and shadow files have been touched, and the respective before/after state has been captured. So let's look at the passwd file. I think it is obvious what has been done: the user Celestine has been added to the system, user ID 1002, group 100. So Snapper gives you an overview. I could roll back now; I won't do it now, we will do a rollback test later. I just want to show you the same thing on the command line.

snapper list gives you an overview of the tasks that have been done and that have been covered by Snapper activities. You also see pre and post here: for specific actions, Snapper can pair snapshots in a way that lets you compare them later. You see the user creation here, and Snapper itself, as integrated into YaST, has also gotten a snapshot. If we compare on the command line, the command is snapper status 2..3.
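Roughly, the command-line flow from this demo looks like the following sketch (the snapshot numbers 2 and 3 are specific to this demo and will differ on your system):

    # list all snapshots Snapper knows about, including pre/post pairs
    snapper list
    # show which files changed between snapshots 2 and 3
    snapper status 2..3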
You see the same list that we previously saw in the graphical interface. And you can also ask on the command line what the difference is for the passwd file, with snapper diff. It is obvious: the user Celestine has been added.

Now let's roll back the whole thing, just so that you trust me. So, Celestine is in, okay. snapper undochange, everything from 2 to 3. If we now grep for Celestine, the user is not there anymore. The files have been rolled back, and the user Celestine does not exist anymore. In this case the user has not actually been deleted through the user management; only the files were reverted. That would have been another task to integrate. But I hope you trust me that this is what it really does.

How is it done on the disk? We have a dedicated snapshots subvolume in this case, where the snapshots are simply stored by number, and in there are XML info files, very quick and easy, in which you can see when a snapshot was created, what requested Snapper to be active, and so on.

I can also create snapshots on the command line very easily: snapper create with a description of "test", then snapper list. Or, additionally, I can modify a snapshot, say with the user data city=New Orleans on number six, and then you see here on the right side that city=New Orleans has been added as a key-value pair. You can add arbitrary key-value pairs like this, and you can also modify them back again.

Sorry, again, what? You can roll that in any direction. So I could also roll back what I just did with the creation of the user. Because I have the snapshots there, I can go in any direction; let me just try. You can go in any direction with this: you have the snapshots on the disk, and then you can call up a specific state of the disk and get it back. So that's what you wanted, right? Yeah, okay, good. Other questions? Yes?

Hardlinks, hardlinks, that's a good call-out. If you have hardlinks, then the number of back pointers from the extent multiplies with every hardlink, yes. And this has indeed been a problem in former versions of Btrfs. It is now fixed, in the sense that if the number of needed back pointers exceeds the available space, a new extent is used just for these back pointers. My personal thinking about this: if an application or an infrastructure uses huge amounts of hardlinks, and one really wants to use Btrfs, then it might be appropriate to change the application or the environment to use reflinks instead of hardlinks, because that is the mechanism which matches the functionality of Btrfs better than hardlinks do. Yes, we also have SUSE-internal usage with massive amounts of hardlinks, I can confirm that. This was an issue, but it's fixed. Okay.

So, the next thing: there are obviously users using Linux on the desktop, even if people always claim that nobody does that, but obviously I do as well. And there are some requirements you have to fulfill if you want to do that with Btrfs and Snapper included. The user's home directory has to be in a separate Btrfs subvolume so that you can apply snapshots to it. Then you need Snapper with the D-Bus interface, which was introduced earlier this year, and you need a Snapper configuration per user, where the root user has to define that this user is allowed to do snapshots on that specific subvolume. This is doable, and it is implemented in recent openSUSE and SUSE Linux Enterprise versions.
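A sketch of that per-user setup (the configuration name and user are taken from the demo that follows; ALLOW_USERS is the Snapper configuration variable for this, assuming a current Snapper; the mechanism in older builds may differ):

    # as root: create a Snapper configuration for the user's home subvolume
    snapper -c home_linuxcon create-config /home/linuxcon
    # then, in /etc/snapper/configs/home_linuxcon, permit the user:
    #   ALLOW_USERS="linuxcon"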
An additional benefit, and I have to admit I'm a little bit proud of this, because it's one of the few things I have really programmed in C in the last years, is that you can do automated snapshotting on login and logout via the PAM stack. And because I'm so proud of it, I will show you very quickly. I have created a user called linuxcon here. Let's just look at this configuration, in the Snapper configs: home linuxcon. You see here, you can read that: the user allowed to do snapshots on this specific configuration is the user linuxcon. Now we can look at this user's snapshots this way; the list is relatively empty, except one suspend entry. Okay, and now I log in as the user linuxcon. The password is 1, 2, 3, 4, 5, by the way; just for testing. snapper list for home linuxcon: okay, you see here that the PAM stack has jumped in, and we took a snapshot while we logged in. You see that the remote user was this, the service which issued it was su, and it came from the physical terminal, tty5. Now, the user can obviously also create a snapshot, with the description "test", and we can look at it. And when I log out, and now I'm root again, you see also here that we have paired this again: here is snapshot 2, which is a pre, and here is snapshot 4, which is a post, so these two are paired.

Now, when I did this, the complaint within SUSE was: okay, but nobody logs in and out anymore, everybody just suspends. And this is why you see the suspend entry there. That is just a simple shell script: when the system suspends, a snapshot is taken. So you can also cover the case that whenever you put your system to sleep, you are snapshotting. Okay, good. This is available and can be implemented everywhere Btrfs is available.

So much for that. Now, I'm the product manager for a server product, and obviously the desktop use case is not of such huge interest for me. So it is more important to see how Btrfs and its capabilities, snapshots and beyond, can be used in server infrastructure. One example: our Samba people took Snapper and Btrfs and thought about what could be done better in the Samba infrastructure based on what we already have. They came up with a replacement for the copy path in Samba, to make it look more like a real Windows server in the backend. A traditional file copy in Samba does the following: you tell the system to copy a file; the file is copied from the disk to the Samba server, goes over the wire to the client system, the Windows system in this case, goes back to the Samba server, and back to the disk. You have storage transfer and you have network transfer; not really good. The first improvement is to avoid the network transfer: the Samba server realizes that something is being copied locally, so the data no longer crosses the wire. Okay, that's already a big plus. The real trick is to also avoid all the storage transfers between the storage and the Samba server, by simply doing the copy as a so-called reflink copy in Btrfs. So what you do is a clone copy: if a server-side copy is requested, the file is simply cloned on the disk, so that nothing moves except a few metadata blocks, a few kilobytes or whatever, and you get a really fast copy transaction from your Windows system with the Samba server as the backend. That is enabled in recent Samba 4 releases.
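You can see the same clone semantics locally with a reflink copy; a minimal illustration (this is the underlying Btrfs feature, not the Samba code path itself; file names are placeholders):

    # a reflink copy shares the extents of the source instead of duplicating
    # the data, so it completes almost instantly regardless of file size
    cp --reflink=always big-image.raw big-image-clone.raw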
The second thing goes back to the Snapper infrastructure; the previous one was pure Btrfs. Here you use Snapper as a backend to Samba, to do snapshots and present them in Windows as recovery points. This is similar to what you have just seen with the snapshots in Linux, with the user or with the administrator. That, too, is implemented in recent Samba snapshots and Samba 4 upstream: Samba talks via the D-Bus interface to Snapper, takes a snapshot or retrieves the snapshot data, and can then give the right version back to the end customer on their Windows system. On YouTube there is a video about how this works, to prove that it really does work.

Other features, which might come in the future... other features are conversion to and from Btrfs, mostly to Btrfs, and that's my last demonstration, very quickly: convert a ReiserFS to Btrfs. First we need a file system; this one is 256 megabytes. Look: ReiserFS. You see we now have a loop device, which carries a ReiserFS file system. I just create a small text file here, which says "linuxcon2013 New Orleans", and I hope you trust me that I did the right thing with the mount. Now I have my small file system here. I just run btrfs-convert on it, then I mount it again, we check what file system it is, and now it is Btrfs. And surprisingly, if you look at this file system, you see not only the text file that I created, but also something called reiserfs saved, with an image in it. This image is the really interesting part of the whole operation, because if you mount this image, you see that it is a ReiserFS. What happens during the conversion from ReiserFS to Btrfs is that the metadata of the whole ReiserFS file system is gathered and saved in this one file, this image. So if you mount this image, you see all the metadata information from the ReiserFS file system, and in fact you can also convert back from a Btrfs file system to a ReiserFS file system. I will do that now very quickly: btrfs-convert with the rollback option, we do the mount check again, and you see it's back to ReiserFS. The same also works with ext2, 3 and 4. It works in place, but offline, so the file system cannot be mounted during that operation. And if you roll back what I just did, you will lose all data that you put on the file system while it was a Btrfs file system, obviously, because that data was not available to the ReiserFS, or to ext2, 3 and 4, before.
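The demo above as a self-contained sketch (paths, sizes and the loop device are from the demo; note that ReiserFS conversion requires a btrfs-convert build that supports it, which was not in all upstream versions):

    # build a small ReiserFS file system on a loop device
    dd if=/dev/zero of=fs.img bs=1M count=256
    losetup /dev/loop0 fs.img
    mkfs.reiserfs /dev/loop0
    mount /dev/loop0 /mnt
    echo "linuxcon2013 New Orleans" > /mnt/test.txt
    umount /mnt
    # convert in place; the file system must be unmounted for this
    btrfs-convert /dev/loop0
    mount /dev/loop0 /mnt    # now Btrfs, with the saved ReiserFS image inside
    # roll back to the original ReiserFS, discarding changes made while Btrfs
    umount /mnt
    btrfs-convert -r /dev/loop0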
Snapshot rollback for the full system will come in the near future. We are working on patches to GRUB 2 which help GRUB 2 boot directly from a Btrfs snapshot. The use case for that: when people install a new kernel, and that's what most people I know fear and are anxious about, myself included, you can directly say, okay, I want to keep the old configuration, including the kernel, the initrd, the whole configuration and everything, and you can go directly back to that state in case the new kernel you installed fails. That's the real thing we are targeting for next year. And certainly we are working on things like how to delete snapshots again and so on; that's the real trick then, because otherwise you grow a rather big tree of snapshots. Data deduplication I already talked about: manual deduplication went upstream this week, just three days ago. This came in when the presentation was already done, but yeah, I added it quickly.

The summary is that the file system recommendation is now, I hope, a little more obvious: why we suggest XFS if you don't want to do snapshotting, and why we suggest Btrfs specifically for the operating system, but also for data. And with that, I wish you a lot of fun trying out Btrfs and also Snapper. Thank you.

Yeah. Hi. My question is around recommendations for file systems. So you have your impressive chart that recommends which file system to use, and you recommend Btrfs for the OS and for data. But on one of your first slides you mentioned virtualization. Yes. And Btrfs is not necessarily a good file system for serving images for virtual machines. Correct. Do you have a recommendation for that purpose? Yes. There are two options you can use, and as I said here, it depends on whether you want snapshots or not. If you want snapshots of your virtual machines, you have to use Btrfs, at least from our perspective; then you have to eat the frog that it may run into fragmentation and thus get slower. The other option is to use XFS directly. That would be the recommendation. There is a middle way: you can turn off copy-on-write for the specific virtual machine image, losing the copy-on-write functionality for that file but keeping your Btrfs; a sketch of how follows below. This is also something you could do. My recommendation for storing virtual machine images, so if it's really image-based, is still XFS. But if you want snapshots, then Btrfs is the one. But the question is valid. Yes.
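One way to turn off copy-on-write per directory, as a minimal sketch (the path is a placeholder; the NOCOW attribute only takes effect on files that are still empty, so set it on the directory before creating images in it):

    # create the images directory and mark it NOCOW; new files inherit the flag
    mkdir -p /var/lib/images
    chattr +C /var/lib/images
    # verify the attribute is set
    lsattr -d /var/lib/images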
Other question? So, if you have a large, say several-terabyte file system, has the conversion been tested on that? Because it only works with metadata, does it take a very short amount of time, or does it take a few hours for, say, an ext4 file system of two or three terabytes? So, again, for the conversion? Yes: if you are converting from ext4 to Btrfs on a multi-terabyte file system, is it a very slow operation or relatively quick? I can't give you any numbers. The real question is not the amount of data on the file system but the amount of metadata; that's the real trick. Only the metadata is converted, the data is not touched: the data remains in place, and the metadata is converted into something suitable for Btrfs. So if you have 50 million files on it, it might take some hours. Yes?

Hi. A question about snapshotting. Is there any sort of limit on how many snapshots I can have? And if I have many, many snapshots, like hundreds of them, will there be any sort of performance hit? And also, is it more expensive to snapshot large file systems, like multi-terabyte file systems? So, the expense of a snapshot does not depend on the file system size; it really depends on the amount of changes that you make while the snapshot exists. That's the same for small and for big file systems. Rick, you have a comment? Yeah... we think that it is fine to do. But yeah, okay. So the other part of the question: there is no limit. The only thing to watch with respect to the number of snapshots, and this is what you can also control in Snapper, is the size of the snapshots and the additional space they take, so that you don't run out of space in the file system. Because with hundreds or thousands of snapshots and a lot of changes to your files, you can obviously run out of space quickly, even if your files are small. Sorry? [inaudible] Thank you.

Thanks, Christoph. So how guaranteed is the instantaneousness of snapshots? Because I'm looking particularly at the ability to use a snapshot to make a copy of a Postgres database. I'm used to ZFS, where the instantaneousness is guaranteed by the intent log, but you are using a different method in Btrfs. Yeah. So, the instantaneousness with respect to the database requires the database to be frozen, at least as far as I know, because the file system has no direct ability to reach into the rest of the system and freeze something. So if you want a snapshot of the database, the advice would be to put the database into a hold state, and most databases allow this, then snapshot, and then release the freeze again. The snapshot itself is an atomic operation, but from the perspective of the application that is not atomic enough, because you never know in which state your application is at that moment. Yeah. No, that was misunderstood, sorry: the snapshot per se is atomic, but the amount of space it needs is not relative to the size of the file system; it is relative to the size of the changes you make on it. Yeah.

Okay, I think we have to stop after this question. Okay, one question. Okay, thanks. As far as I know, some developers who worked on ReiserFS earlier are currently involved in Btrfs development, and additionally, as far as I know, ReiserFS features and Btrfs features have a lot in common. Does that mean the Btrfs source is based on ReiserFS, or is it written completely from scratch? I didn't understand the question. Sorry, again? What was the question? As far as I know, some developers who worked on ReiserFS are currently involved in Btrfs development; does that mean the Btrfs source is based on ReiserFS, or is it completely written from scratch? It's completely written from scratch. But it's the same people. Okay, thank you. You're welcome.
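For the database question above, the freeze, snapshot, release pattern would look roughly like this; a hypothetical sketch assuming PostgreSQL's backup mode as the quiesce mechanism (pg_start_backup was the API of that era) and /var/lib/pgsql being its own Btrfs subvolume:

    # put the database into a consistent, quiesced state
    psql -U postgres -c "SELECT pg_start_backup('pre-snapshot');"
    # take the atomic snapshot of the database subvolume
    btrfs subvolume snapshot /var/lib/pgsql /var/lib/pgsql-snap
    # let the database resume normal operation
    psql -U postgres -c "SELECT pg_stop_backup();"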