Okay. This is going to be quite a long talk, and I don't have much time, so I'm going to start right now. First, let me apologize: as you can probably tell, I have a bit of a cold, so I'm going to make quick stops to cough, and I promise to turn the microphone off when I can. Anyway, I'm here to talk about ext4 and the case-insensitive feature that we implemented on it. My name is Gabriel. I work for Collabora. Well, let's go.

First, what is a case-insensitive file system, and why do we want that? Basically, case-insensitivity is the ability to look up files while ignoring case. All of the files you are seeing here, /tmp/hello in lower case, /tmp/HELLO in upper case, or a mixed form of that, should all resolve to the same file. And then you bring this question to me: why? Why do we want this? Well, Linux has worked very well for the past twenty-something years without case-insensitivity. Unix doesn't have case-insensitivity. All the other operating systems that tried to do it ran into trouble. The reason we want to do this is that there are real-world use cases, two in particular that really matter to me.

The first one is when we want to bring applications over from the Windows world. These are huge applications carrying tens of thousands of files, all written with disregard for case, and building these applications on Linux is a pain. I also really care about the gaming industry. Games load a lot of textures, they need to be really quick about it, and all of that code was written assuming a case-insensitive file system. You can solve that in user space, but then you don't have the performance necessary to run a triple-A game on Linux. And finally, there is a very important use case, which is Android.
The Android developers decided to expose a case-insensitive API to application developers, and now they need to provide backward compatibility for that API. Doing that outside of the file system itself is either racy, as they discovered with sdcardfs, or has very low performance.

But there are other reasons we want case-insensitivity, because that's how real languages work. When I'm talking to you, it doesn't really matter whether I'm thinking in lower case or upper case; there is no such thing. Well, we could say that upper case is actually screaming, but that's an internet thing, not real languages. So when I'm using my system, I don't want to have to remember whether my file was created in lower case or upper case; I just want to find my file. And there is another very important reason: there is much more to the world than just the English language, and we are not good at handling special characters like the cedilla in Portuguese, or the eszett, that little beta-like letter that comes from German. That letter in particular is quite funny, because when you write a word containing it in upper case, it becomes SS, while the lower case is that beta-like letter they call eszett. So if I'm searching for a file and I don't know German, and the file is written one way, I'm going to have a hard time looking for it. And people complain a lot because, well, the Unix philosophy is about simplicity; we want the least surprise. I would argue that the least surprise is actually not having to know all the idiosyncrasies of a language when you are searching for a file. The Unix philosophy doesn't really go against the idea of case-insensitivity.

So, basically, how do we implement that? That's easy peasy, right? For those who program, you have your libc strcasecmp and that solves everything. Well, it's not exactly like that. In Linux, file systems see file names as opaque byte sequences. So it's just a null-terminated stream of bytes.
A name can contain any byte except the forward slash and the null byte. So what is the upper case of a byte? It doesn't make sense. Upper and lower case are linguistic terms; they only make sense when you define a language, when you define an encoding for your language. So now we have to assign meaning to those bytes. Well, we usually assign ASCII, so A equals 0x41 and so on. But that's not enough for the world. It's really good for English speakers, but there is more to the world than just that. We need a better encoding. We actually need Unicode support.

And doing this in file systems brings a lot of other issues. What happens when I have two files whose names differ only by case? Say I have two files, BLIND in upper case and blind in lower case. On a plain ext4 file system, those are two different files. But now we want to see them as the same. So if I take a file system that is case-sensitive and make it case-insensitive, I might get into a situation where files collide and then I cannot access them anymore.

The next issue is performance. Comparing two byte sequences to verify they are the same is very easy: you just do a for loop and compare byte by byte, or word by word if you want to be fancy. When you are dealing with Unicode, it's not that easy. You have multiple ways to write the same string, and you need to handle all of that.

And the last question is: what are the right semantics for this? What is the right granularity I want my file system to work on? Is the entire file system going to be case-insensitive, or just a directory? Can I even make an entire file system case-insensitive? What about preserving data: what do I write on the disk? Is it going to be the lower-case version, to make lookups faster, or exactly what the application asked to write? These are the kinds of semantics we needed to figure out.
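To make the comparison problem concrete, here is a small Python sketch (a userspace illustration, not kernel code) of why treating names as opaque bytes breaks down once you leave ASCII: the same word can be two different byte sequences, and only Unicode normalization makes them match.

```python
import unicodedata

# "maçã" (Portuguese for apple) spelled two ways: precomposed code
# points vs. base letters plus combining marks. Both are valid UTF-8.
composed = "ma\u00e7\u00e3"      # ç and ã as single code points
decomposed = "mac\u0327a\u0303"  # c + combining cedilla, a + combining tilde

# As opaque byte sequences, these are simply two different file names:
print(composed.encode("utf-8"))    # b'ma\xc3\xa7\xc3\xa3'
print(decomposed.encode("utf-8"))  # b'mac\xcc\xa7a\xcc\x83'
print(composed == decomposed)      # False

# Only after Unicode normalization do they compare equal:
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A byte-wise for loop, however fast, can never discover that these two sequences name the same word; that is the gap the kernel's Unicode support has to fill.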
And there is also the question: what if we get it wrong, like Apple did in 2017, when they released their fancy new file system, APFS, for the MacBooks? They made the decision that normalization, which is an operation you do on file names, wouldn't be done in the file system anymore; it would be done in user space, differently from what HFS+ did beforehand. Then they shipped iOS 10.3 with that implementation, and iOS applications started breaking, because some applications assumed that what they had written to the disk is what would be fetched back. Some applications didn't know normalization was now happening in user space. Things started to crash. And, well, I work on the kernel; I don't want LWN to write an article about how I broke everything. So I really wanted to get this right the first time.

Some people may ask: is this a kernel problem? Why not do it in user space? It's possible, if you do it right; Apple tried. But performance is very important, and there is no way you can do this in user space with the current Linux infrastructure and get the performance at the same time. I also care a lot about non-English speakers because, well, English is not my native language, so ASCII is not enough for me. I want something that can be used by everyone, and I need to consider those strange cases, like the eszett in German. So I want real Unicode. I don't want just ASCII, and I don't want ASCII plus some code pages. I want UTF-8. And I need to make it future proof. The Unicode people release a new version every six months, and I want to be able to apply that to the kernel quickly, so everyone can have their latest emojis.

So I started by teaching the kernel Unicode. How do you teach Linux Unicode?
Well, basically, just a quick overview of Unicode. It's composed of code points, which are our unit of operation. So we are not talking about characters; we are talking about code points. They are very similar to characters, but with some caveats. They can be multibyte: lower-case a is U+0061, which encodes as the single byte 0x61, so you can see that it's compatible with ASCII, but you can have characters that are four bytes long, for instance. There are over 100,000 code points already assigned in Unicode, and the code space goes up past a million, so lookups in this table need to be really fast.

We also need to be able to do some operations. For instance, in Unicode you have the character ã, a with a tilde, which can be written either as that single character or as a composition of two characters, the letter a combined with a tilde character. You also have a very specific linguistic structure called a ligature, where a few letters of a language combine into a single letter. So in Unicode you have a single letter that is the ffi ligature, another that is the ff ligature, and of course the plain letter f. If you want to write the word office in English, you have three ways to do it: you can spell out o-f-f-i-c-e, you can use the ff ligature followed by i-c-e, or you can use the ffi ligature followed by c-e. And all of those need to match the same file.

And then there is the question: how do we actually encode these things? Unicode refers to the characters, not to the encoding itself. We could choose between UTF-8, UTF-16, and UTF-32, and that is not as obvious a choice as you may think. Well, the internet runs on UTF-8, and okay, we are English speakers. But actually UTF-8 is not great for every script, for every set of languages in the world. For many non-Latin scripts, representing a character in UTF-8 costs three or four bytes, while for the English alphabet
you only use one byte, which means that a text file in a language like Chinese ends up being several times larger than a text file written in English in UTF-8. And this is not quite fair. Still, to leave the door open to fix that, what I did was add support for implementing other UTF encodings, and other encodings used in other parts of the world. But my customer is American, so I implemented UTF-8.

I need to define a few operations for us to go on here. The first operation is called normalization, and it lets you take any way of writing a specific sequence, ã as one code point, or a plus combining tilde, and match them as the same character. So you basically take a string that is unnormalized, normalize it, and then they all match if they are the same character. The second operation is case folding, which is basically normalization but for case. With this operation, a lower-case a matches an upper-case A. And what I really want to do is normalization plus case folding, so ã written as the precomposed character or as a plus combining tilde, in its lower-case or upper-case version, would all match.

Now, there are two ways to normalize Unicode. The first is canonical normalization, and the second is called compatibility normalization. Which one to pick? Well, let's see. The K normalization, the compatibility normalization, gives us some very strange results. It says that 2 to the power of 5 is the same thing as 25, because it doesn't care about the semantic meaning; it only cares about the characters themselves. So if I created a file called 2⁵ on my ext4 file system and then looked up the file 25, I would get it, which is obviously crazy. My second alternative would be canonical normalization, which has more linguistic meaning, but it doesn't consider things like ligatures. So remember the office ligatures I mentioned?
ffi is a ligature, ff is another ligature, and then you have the plain letter f. Well, with canonical normalization, office written one way and office written another way don't match. So neither of these is ideal. So I went to the other file systems that already implement this, NTFS, for instance, APFS, the Apple file system, HFS+, to see what on earth they are doing. And it turns out everyone does it differently. NTFS and NFS both use the C form. HFS uses the K form. APFS does something amazing: they use the K form in some cases, and in other cases a version of the C form, and this is not documented anywhere. So we discussed a lot, and we went with the C form for now. But as I mentioned, we want this to be future proof, so all the information about what we are doing is stored in the superblock of the file system, in case we want to modify it later. We believe we got it right, or as right as we could.

And then we implemented a library in the kernel that provides an API abstracting all of this from file system developers. Basically, it's a high-level API hiding the encoding details and the versioning details of the charset, allowing the file system to just say: I want this encoding, this charset, with these functionalities, for instance the K normalization or the C normalization. For now, we only support C. Then they provide us with strings, and we spit back the case-folded version, or perform comparisons, as they need. This high-level API implements functionality like a Unicode equivalent of strncmp and a Unicode equivalent of strncasecmp, with normalization and case folding applied, to be used directly by the file systems.

So how do we store all this information in the kernel? Remember that we need to keep this huge table in the kernel, and it cannot be stored as a simple table, because we need to do very fast lookups on it.
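The normalization tradeoffs above are easy to reproduce with Python's unicodedata module, which implements the same NFC and NFKC forms the talk discusses (again, a userspace illustration, not the kernel implementation):

```python
import unicodedata

# Compatibility normalization (the "K" form, NFKC) erases formatting
# distinctions: 2 to the power of 5 becomes plain 25.
print(unicodedata.normalize("NFKC", "2\u2075"))  # '25'

# Canonical normalization (the "C" form, NFC) only unifies different
# spellings of the same character: a + combining tilde becomes the
# precomposed ã, while 2 to the power of 5 is left alone.
print(unicodedata.normalize("NFC", "a\u0303") == "\u00e3")   # True
print(unicodedata.normalize("NFC", "2\u2075") == "2\u2075")  # True

# But NFC does not touch ligatures, so the alternate spellings of
# "office" do not match; only NFKC folds them to plain letters.
office_ffi = "o\ufb03ce"  # ffi ligature, then c, e
office_ff = "o\ufb00ice"  # ff ligature, then i, c, e
print(unicodedata.normalize("NFC", office_ffi) == "office")   # False
print(unicodedata.normalize("NFKC", office_ffi) == "office")  # True
print(unicodedata.normalize("NFKC", office_ff) == "office")   # True
```

This is exactly the bind described above: the K form merges things that should stay distinct, and the C form keeps apart things a user would expect to match.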
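The shape of that high-level API might be sketched like this in Python. The class and method names here are made up for illustration; the kernel's real interface is C (roughly, a unicode map object loaded per file system, with comparison helpers such as utf8_strncasecmp operating on it), and Python's stdlib stands in for the normalization plus case-fold pipeline.

```python
import unicodedata

class Utf8Map:
    """Toy stand-in for the kernel's per-filesystem unicode map."""

    def __init__(self, version: str):
        # The Unicode version is pinned per file system (superblock).
        self.version = version

    def casefold(self, name: str) -> str:
        # Normalization (C form) plus case folding, as described above.
        return unicodedata.normalize("NFC", name).casefold()

    def strncasecmp(self, a: str, b: str) -> int:
        # 0 means "the names match", mirroring strcmp conventions.
        return 0 if self.casefold(a) == self.casefold(b) else 1

um = Utf8Map("12.1.0")
print(um.strncasecmp("README", "readme"))        # 0: match
print(um.strncasecmp("Stra\u00dfe", "STRASSE"))  # 0: ß folds to ss
print(um.strncasecmp("foo", "bar"))              # 1: no match
```

The point of the abstraction is the same as in the kernel: the file system never touches code points or normalization forms directly, it only asks for a map and hands over strings.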
So basically, we auto-generate this data from the UCD files, the Unicode Character Database files published by the Unicode Consortium. At build time we generate a trie, which is turned into a big binary blob that gets linked into a kernel module. We then perform lookups in this trie, which is actually a forest: we have several trees, one for each version of Unicode, for looking up UTF-8 data. The key for a lookup in the tree is the UTF-8 encoding itself, so it's very cheap for us to figure out whether a character exists, or whether it's an invalid sequence. And on the leaves of the tree, we have the information already precomputed: what the case-folded version is, and what the normalized version of the character is. So converting a string becomes as easy as performing one lookup per code point in this table.

The entry for each tree in the forest also gives us the Unicode version where each character was introduced. Why is this important? Because Unicode does not assign all available code points. Some code points are still empty, and they can be assigned in the future, and they can be given a new normalization in the future. That means if you go to your ext4 file system and create a file that uses one of these unassigned code points, and in the future Unicode decides to assign that code point to a character that decomposes to another character, and by accident you have another file in your file system with that other character, then those files would start to collide. So we always need to store the version the file system was created with and stick with it, or perform some operation to update all the file names in the file system, which is not at all trivial to do online.

As for how we assemble it, I'm not going to go through it in detail, but it's basically done at build time.
It takes a few seconds, so I didn't want to add it to the normal build flow, because I didn't want to annoy all the other kernel developers. We only regenerate this if you set a specific flag during Kbuild, and as the Unicode maintainer, I do it every six months when the new Unicode version is released. So basically, this is done by walking through the entire database, calculating the normalizations beforehand, and then generating the trie with all the information already there. This means we can perform a character lookup just by traversing the tree from the root to a leaf, which means the cost of a lookup depends on the encoding length. So, once again, performing English lookups is a bit cheaper, because the English alphabet can be described entirely with single-byte code points, while it's more expensive for other scripts. Still, the way the Unicode subsystem is implemented in the kernel allows you to add plugins, to add another encoding if you care about it. I'm not going to do it myself.

And then, as I mentioned, we provide these higher-level operations, so a file system can just ask: give me a Unicode map for UTF-8 12.1. Then it can do strncmp-style comparisons by passing the map, string one, and string two, and it knows whether the strings match or not. So implementing support for this in new file systems becomes a bit easier.

The basic idea for ext4 is that we store the encoding information in the superblock, and we let each directory decide whether it wants to be case-insensitive or not. You can do it for the entire file system by setting the flag on the root inode, but if you do that on your root fs, you're going to have trouble booting your Linux, because we have libraries in /usr/lib whose names differ only by case. So we really don't want to do that on your root file system.
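Putting the lookup scheme together, the per-code-point table described above can be mimicked with a toy structure. Here a plain dict stands in for the kernel's trie, and the entries are a hypothetical hand-picked subset; the real blob is generated from the UCD, keyed by UTF-8 byte sequences, and carries one tree per Unicode version.

```python
# Toy stand-in for the kernel's per-code-point table: each entry maps a
# code point to its precomputed normalized + case-folded replacement,
# plus the Unicode version that introduced the character (the detail
# that makes pinning a version in the superblock necessary).
TABLE = {
    ord("A"): ("a", "1.1"),
    ord("\u00c3"): ("\u00e3", "1.1"),  # Ã -> ã
    ord("\u1e9e"): ("ss", "5.1"),      # ẞ, capital sharp s -> ss
}

def casefold(name: str) -> str:
    # One table lookup per code point; code points not in this toy
    # subset pass through unchanged.
    return "".join(TABLE.get(ord(ch), (ch, None))[0] for ch in name)

print(casefold("A\u00c3\u1e9e"))  # 'aãss'
print(casefold("x"))              # 'x'
```

The capital sharp s is a good example of the versioning problem: it only appeared in Unicode 5.1, so a file system built against an older table would treat that code point as unassigned.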
So the plan is that whoever wants to use this feature creates their file system with the case-insensitive feature flag, which is written in the superblock. The file system is still case-sensitive at that point; then you go per inode and set an attribute that tells whether that directory will be case-insensitive or not. And then, on the implementation side, we could just say: strcmp now becomes the UTF-8 strcasecmp for those cases.

We still needed to handle a few details. For instance, for large directories, ext4 does an optimization: it doesn't perform linear searches, it goes through a multi-level hash tree, and the hash is based on the name of the file as it was created. So now we need to perform the hash over the normalized form, which is pretty straightforward to do: our new hash is based on the normalization plus case-fold version of the name. But that also means that, whereas until now we didn't have to make any changes to the on-disk layout, except for writing the encoding in the superblock, which is trivial, now we actually do change the layout of the disk. So if you are enabling case folding on an existing directory, you need to do an update, and you need to do it offline.

With regard to performance, we also had to fix the dcache. We implemented two hooks: one that hashes using the normalized version, and one that compares using the Unicode functionality. This is something that could be improved, that could become standard in the VFS, so the code is shared with every file system that wants to implement this feature in the future.
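The directory-hashing change above can be sketched in a few lines. This is an illustration only: the hash function here is djb2, not ext4's real htree hash, and the point is simply that hashing the normalized plus case-folded name makes every spelling of a name land in the same bucket.

```python
import unicodedata

def djb2(data: bytes) -> int:
    # Small deterministic string hash, used here purely for
    # illustration; ext4's hash tree uses its own hash functions.
    h = 5381
    for b in data:
        h = (h * 33 + b) & 0xFFFFFFFF
    return h

def dirent_hash(name: str) -> int:
    # Hash over the normalized + case-folded form of the name, so that
    # lookups for any spelling probe the same hash-tree bucket.
    folded = unicodedata.normalize("NFC", name).casefold()
    return djb2(folded.encode("utf-8"))

# Different cases of the same name hash identically:
print(dirent_hash("HeLLo") == dirent_hash("hello"))  # True
# So do composed and decomposed spellings of the same word:
print(dirent_hash("ma\u00e7\u00e3") == dirent_hash("mac\u0327a\u0303"))  # True
```

It also shows why the change is an on-disk format change: hashes computed over the raw created name no longer match, so existing hash trees have to be rebuilt offline.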
I want to implement it for btrfs, and I have already had some discussions with the f2fs people about having it there, so before that I'll probably be migrating some of this dcache code to the VFS. But there is some trouble with what I'm doing right now. Since I'm using the compare and hash hooks in the VFS, and these can only be assigned at dentry creation time, we are breaking overlayfs with this feature. This is something every feature using these hooks goes through; I know the fscrypt per-directory encryption folks faced it, and I think it's something we need to fix in the VFS itself.

Also, with this functionality, I cannot trust negative dentries anymore in all cases, which kind of sucks, because, according to this guy, negative dentries are quite important for performance, and he made a very good point about that. But they don't really work all the time anymore. For instance, if you create a file in a directory with an upper-case name and then remove that file, you have a negative dentry for the upper-case version. Then, if you make the directory case-insensitive and create a lower-case version, we might reuse that negative dentry, which carries the upper-case name, and this means the new file would be created with the upper-case name, which is not case-preserving anymore. It's a bit of trouble to fix this in the VFS layer. What the XFS guys did, when they supported something similar back in the day, was to just disallow negative dentries for the entire file system when using that feature. I did the same thing in ext4, but in truth it's a bad idea, more so for ext4 than for XFS, because the XFS guys do other tricks on their side that ext4 can't, so they don't rely as much on negative dentries as we do. So this is something I'm working on; the proper fix would be to invalidate when changing from a negative
dentry to a positive one. It turns out that's a bit tricky to do, but I have a patch ready for it already; it's just not in 5.2, which is when this code went upstream.

There is also some trouble with fscrypt lookups, because fscrypt performs an optimization in which they don't decrypt all the file names: they encrypt the file name you are looking for and use that for the hash to find the file you want. Well, we cannot do this anymore, because then we would need to know beforehand the exact case version of the file name that was encrypted. And the obvious solution, let's just normalize and use the normalization to find the file, is not useful here, because the name that is stored is actually an encrypted version of the file name, and if we change what we hash, we are also changing the stored name, which again makes it not case-preserving. So I don't have a good solution to make this perform well on encrypted directories. There is one solution, which would be: okay, let's just decrypt everything and walk through it, but that has terrible performance. There was another solution I was working on, but it's broken, so I'm not going to go through it, in the interest of time, because I'd like to get to questions.

Well, the current status is that we have the Unicode subsystem, the UTF-8 database, and the ext4 support merged in 5.2. In 5.3, we started caching the normalized and case-folded version inside ext4, which allowed us to get some very good speedups on lookups. There is a work in progress to make fscrypt support work, which has been taken over by the Google folks; they are really interested in having this for Android. I have a work in progress to split the UTF-8 database out as a module, so you can avoid loading that huge blob unless you really want a case-insensitive file system. And I'm also working on fixing overlayfs, trying to avoid the use of d_compare, d_hash, and d_revalidate for fscrypt, so all of us can benefit
from overlayfs. And some of the things I really want to do: I want to put some of this into the VFS, so other file systems can benefit from it; I want more file systems to support the case-insensitive feature, because my customer needs that; and I also want to allow dynamic changes of the dentry ops after they are initialized, even when the dentry is positive, which would let us solve once and for all the cases where overlayfs wants to register hooks and some other feature has already registered them. Well, that's what I had for you. I know it went really quick, but I wanted to leave a few minutes for questions.

I don't know if you can get the microphone, but this is something that is still open and under design. We discussed this during LSFMM. What we want to do is have the file system provide an attribute indicating that it supports case-insensitivity, and support it on the server side; this is for Samba. I don't know what the NFS folks are planning to do, but this is still under design; nothing is supported, and we don't support exportfs. Anyone else? Okay, so thank you.