...or for a scheduled talk, to read about the conference, why the world was crowded, because there is a competition running, and you will receive some nice rewards; we will reach you then. I tried to find the feedback, sorry. I could not find the feedback on the website. Oh, it's the. The.com.cms slash feedback slash. Yeah, I'm looking at it, sorry.

Welcome, everyone. So we are going to talk about Nautilus, the file manager. If this changes... no, it doesn't.

So what is it? Nautilus is the file manager used in the most prominent distros, like Ubuntu, Fedora, Fensus, and most importantly, in their enterprise counterparts. It was created by Eazel, a company that was founded in 1999. They had 15 developers, and they worked on it for two years. Then in 2001, after two years, they released Nautilus, and they closed. So it was not a very successful company. But in two years with 15 developers, they could make a nice product. As a matter of curiosity, the NUS Foundation has to contribute to Nautilus at the start.

So Nautilus is 16-year-old C code. As you can guess, it predates a lot of the libraries that we currently use, and we'll see the issues that causes. It also manages the desktop. That means when you have your background on the desktop, the icons you have on top of it are just another Nautilus window.

So OK, we are going to talk about threads: how we work with them and the libraries we use for them. File management, which is the most important part, and how we work with it. Operations. The network, because nowadays it's very important to have network access, and how we access the network. And how we find files, so the search in Nautilus.

Threads. I guess everyone is used to the term thread. But I want to make sure you know what a thread pool is, because it's important for understanding file management in Nautilus. A thread pool is a way to limit the number of threads that are running at the same time on the computer. Usually, new CPUs have a maximum of maybe eight threads.
So you want to keep a balance between your application being fast at each operation and not hanging the computer for the user. I would say a nice formula for that is half of the threads of the CPU. It's nice. Like four threads for Nautilus, for example, and four threads for everything else that the computer wants to do.

What kinds of threads do we have? There is pthread. It's just the raw thread from the kernel. It was created in 1995. It's nice. There is actually nothing bad about it. But it has some things that we actually don't care about, like setting the stack size of the thread. Then GLib, which is a library that is really well known in the middleware world of Linux, makes some abstractions for threads and provides a better API for us. Introduced in 2006, GIOScheduler is a way to manage threads and operations, tasks. And it introduces a cancellable, which is a way to cancel the operation or the thread. So for example, you have a thread that is running something. You cancel the operation. Then inside the operation, you can check whether the user canceled it. And when it returns to the callback of the thread, you can check whether the operation finished successfully or was canceled, so you don't access any data that maybe was freed at some point.

Then in 2011, GLib introduced GTask. GTask is similar to GIOScheduler, but it's a little better because it introduced the concept of task and ownership. So an operation in a thread is owned by a task. In this way, we have better memory management and we can do chained operations better. So remember: pthread 1995, GIOScheduler 2006, GTask 2011.

So I have a question for you, because I want to know if you think the same way I thought when I started with Nautilus. Guess which of these Nautilus uses. You have three choices. It's multiple choice, so any combination is valid. So for the first choice, pthread, please, right hand, and like this. Wait, wait, wait, with one finger.
For GIOScheduler, right hand, this finger. So you can do like this, or like this. And GTask, like this. Okay, let's vote. Okay, who is rocking like me? Those ones are right. Nautilus uses all three of them. So we will see what the problem with that is, because we don't have a nice way to manage a shared thread pool between them for the threads inside Nautilus. So you may guess what happens.

So, file management. GFile. As I said, GLib is really well known in the middleware world of Linux. And GFile is just an abstraction layer over the file descriptors of the kernel. It provides us with, like, the path for the file. We can load contents. But the GFile is not the actual real file. It's just a path, okay? It's just an abstraction. It also provides us with abstractions for operations, like copy or mount. So you can call g_file_copy and just copy a file, or move a file. It's really easy. You have to put in the path, and that's it. So that's really convenient on our side, for Nautilus.

It also has attribute abstractions. That means the information about the file, about the real file. So you call, like, g_file_query_info, and you get the information about the file: what type of file it is, maybe it's a picture, maybe it's text; the size of the file; the permissions, which are very important. And a lot of those operations are asynchronous. That means when you start an operation, your program still goes on, goes on, goes on. And when that operation is finished, a callback is called, and then you can react to it. In this way, we don't block the application because of the hard disk. This is, of course, a need in a file manager, because you do a lot of input/output operations.

But GLib is just an abstraction layer. So who is the one that actually implements the operations? It's a library called GVfs in Linux. It implements all the operations of the abstraction layer for the file. But what is actually useful about GVfs is that it has some tools, like, for example, the recent files.
So you can imagine, you start Nautilus and you want to access your recent files. What should Nautilus do? It would have to go through the whole file system, take a look at the access times, and then order them. Well, it's impossible, as you can guess. You cannot do that. So GVfs is a daemon. It can track those files that were accessed previously. So for us, from the Nautilus side, it's really easy. You only have to access a nice path. It's called recent://. And you have your recent files there. Nothing else. It's really convenient for us.

GVfs also provides a FUSE mount, FUSE being the virtual file system for Linux, and translates these paths, like recent or the trash. And it also provides us with cloud providers, like ownCloud and Google Drive. I will talk about that later. You can also use it as a command line tool. So you can get any information or monitor files from the command line. For me, it's much better than anything else, like ls or whatever you want to use. GVfs is really nice to use on the command line.

Okay, one important thing: the cache. So what is a cache? And why do we need it? For a file manager, well, the most important things are on the hard disk, right? But accessing the hard disk is really expensive. So a cache is a way to keep in RAM, in fast-access memory, everything you need about those files. For example, the first time you access a directory, it's going to be slow. Yes, it's going to be slow, because you cannot do anything about it. But what about the next time? If you can keep the information about the directory and its children in RAM, then the next time you access it, it will be fast. So we really need a cache for a file manager.

It's implemented as a global hash table. So you have a global hash table. You try to access a file there, and if the file is not there, you go to the hard disk and put it in the hash table when you have everything ready. But what's the problem with a cache?
In one word: invalidation. So what happens if a file changes on the hard disk? You need to synchronize the application, because you are working with a file in RAM that is no longer valid. The information is no longer valid. So you need a way to synchronize.

What's the solution for it? We need a set of solutions. First, a notification system. You need a way to connect objects to files if they are interested in those files. So for example, when a file changes, you notify those objects. This is a notification system. And we use GLib signals. It's really convenient. You use g_signal_connect on the changed signal of the file, and whenever it changes, you notify the whole application.

But since this is threaded, and you have a lot of things going on, a lot of files requesting information and all of this, you need a queue. So you have the change queue. The change queue is for the cache. That means when Nautilus does some operation, it puts the result of that change in the change queue. And we also have priorities in the queue, because it's more important for us to know, for example, the permissions of a file than to know the type. Like, I don't know, it's text. Well, I care more about the permissions, about whether I can access it. So we have the high priority queue and the low priority queue, and in this way, we can manage them across all the threads at the same time.

This is for the cache. But what happens with the hard disk? Because Nautilus wants to access a lot of files on the hard disk. So we have the work queue, which also has its priorities. That means when a file requests some information about the real file on the hard disk, the request is put in the work queue, and when a thread can, it will try to get it from the hard disk. So here, we already have a few threads going on. But well, this is how we implement the cache. And I think it's a nice way to do it.
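The cache, notification, and priority-queue ideas above can be sketched in a few lines. This is a minimal illustration in Python, not the C and GLib code that Nautilus uses. The class `FileCache`, the `HIGH`/`LOW` constants, and the `load_attrs` callback are hypothetical names made up for this sketch, and it is single-threaded, whereas the talk describes the work queue being served by threads.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # smaller number is served first


class FileCache:
    """Toy cache: a hash table of file attributes, change callbacks, and a work queue."""

    def __init__(self, load_attrs):
        self._load_attrs = load_attrs  # stands in for the slow query to the disk
        self._entries = {}             # the "global hash table": path -> {attr: value}
        self._listeners = {}           # path -> callbacks interested in that file
        self._work = []                # heap of (priority, sequence, path, attr)
        self._seq = itertools.count()  # keeps first-in, first-out order per priority

    def connect(self, path, callback):
        """Register an object that wants to hear about changes to path."""
        self._listeners.setdefault(path, []).append(callback)

    def request(self, path, attr, priority=LOW):
        """Queue a disk lookup instead of blocking the caller."""
        heapq.heappush(self._work, (priority, next(self._seq), path, attr))

    def process_work(self):
        """Drain the work queue, high priority first, and notify listeners."""
        while self._work:
            _, _, path, attr = heapq.heappop(self._work)
            value = self._load_attrs(path, attr)
            self._entries.setdefault(path, {})[attr] = value
            for callback in self._listeners.get(path, ()):
                callback(path, attr, value)

    def get(self, path, attr):
        """Return the cached value, or None if it is missing or was invalidated."""
        return self._entries.get(path, {}).get(attr)

    def invalidate(self, path):
        """Drop stale data and re-request it, permissions before everything else."""
        stale = self._entries.pop(path, {})
        for attr in stale:
            self.request(path, attr, HIGH if attr == "permissions" else LOW)


# A dictionary plays the role of the hard disk in this demo.
path = "/tmp/a.txt"
disk = {path: {"type": "text/plain", "permissions": "rw-"}}
order = []

cache = FileCache(lambda p, attr: disk[p][attr])
cache.connect(path, lambda p, attr, value: order.append(attr))
cache.request(path, "type", LOW)
cache.request(path, "permissions", HIGH)
cache.process_work()  # permissions is loaded before type

disk[path]["permissions"] = "r--"  # the file changes on disk
cache.invalidate(path)
stale_value = cache.get(path, "permissions")  # nothing valid in the cache now
cache.process_work()
fresh_value = cache.get(path, "permissions")
```

The point of the priority field is only the ordering: a permissions request queued after a type request is still served first, which matches the preference described in the talk.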
So now that you know how the cache works, what is a Nautilus file? Because we have to do something, right, on the Nautilus side. A Nautilus file is just a GFile, as we saw before, plus the cache of its information. Then we have a Nautilus directory, which is a GFile as well, plus the cached information, plus a set of Nautilus files. So why do we make a distinction between the file and the directory? Well, it's really convenient for cache invalidation, because when you invalidate a directory, you actually want to invalidate each of its children as well. But that doesn't happen with a normal file. If it's a text file, you invalidate the file itself, but nothing below it, because there is nothing under it, right? So in this way, we have the distinction. It's nicer this way. And it's also for monitoring: when you are in a directory and you are watching it for changes, you want to get notified about all the children. But for a normal file, you don't want that.

So, monitoring. We need a way to watch the file on the hard disk, right, for any change. This is called monitoring, and we actually use GLib for that as well. It's called GFileMonitor, and that's it. You put in the path, and whenever it changes, it just notifies you. It uses the kernel's inotify, which was introduced in 2000, I think. Yeah, 2000. But it has an issue. You can only have 1024 watches for the whole process.

What is the issue with that in Nautilus? Imagine that you have a directory, and you go to the command line, to the terminal, for example, and you move a file from that directory to another one. Now Nautilus will see that a file got removed there. And it will say, OK, some file was deleted here. But is it a delete or is it a move? inotify and GLib need to monitor the destination as well to know whether a file got deleted here and appeared there, so the operation is a move, or whether it was just a deletion.
But since we only have 1024 watches, we cannot watch the whole file system. We could do something smarter for this issue if we could watch the whole file system. For us, it's not such a big issue, but for Tracker, which is a library we will talk about later, it's a really big issue.

Operations. How do we do operations? Well, as you can guess, GLib provides us with that. A Nautilus operation is actually a batch of GLib operations, because it would be overkill to create a thread for every GLib operation. The user just selects a bunch of files and moves them. For the user, that's one operation, so we treat it the same way in Nautilus. So what do we do? We just create a thread. We put the different GLib operations there. And here comes the complex thing. We look for conflicts. What happens if the destination has a file with the same name? We have to ask the user what to do. Should we replace it? Should we skip it? But we are in another thread. So we have to actually ask the user in the main thread, with this thread blocked. This makes it complex, because you have to block all the operations until the user answers, and handle this communication between the threads. Then once it's done, we put the result in the change queue. And that's it.

What's the challenge with the management of files in Nautilus? For me, after one and a half years working on it, memory management really is the worst thing in Nautilus. The problem is that if a file doesn't get freed, meaning we leak that file, Nautilus can crash at any later time. But who is the owner of a file? Well, there are like 300 references to the same file at the same point. And the plugin system, a Nautilus plugin, can hold a reference to a Nautilus file. What happens then? Imagine that you have a file in the cache, and then you change the directory. That file no longer has to be in the cache, because you changed the directory and you don't care about that directory anymore.
So what the cache system does is say, OK, we mark this file as gone. This file is gone. But imagine that a Nautilus plugin has a reference to that file. So that file doesn't get freed. OK. Then you try to access that file later, and Nautilus has an assert, like, if this file is gone, you crash, because you cannot do anything with that file. And that file is marked as gone because it didn't get freed before. So really, memory management in a file manager is, for me, the worst part to deal with.

OK, network. Network in a file manager? Well, yes. I mean, probably when you are on the network, you go to the web browser. But there are some network locations that are just a bunch of files, for example FTP servers, or Samba shares, the Windows network directories. So it's really useful if you can have them integrated in the file manager, right? They are integrated as GVfs mounts. It's really nice to access them this way. But even more, what about cloud providers, like Google Drive and ownCloud? Well, it would be really cool to have them in the file manager, right? We have them, because they are just a bunch of files, right?

So what do we do to mount this kind of network location? We create a new asynchronous operation to mount a directory, and GVfs provides us with the path, an easy path, like google-drive://. We ask for the password, if it's needed. Then we download the file metadata, not the files themselves. And then we provide it as a tree, translated with this easy path like google-drive://. So from the file manager's point of view, GVfs helps us a lot here.

But what happens with cloud providers when we treat them as just a GVfs mount? Are they just a bunch of files? Well, yes and no. Because you want to know whether a file is synchronizing, or, I don't know, a bunch of other things that the applications of Google Drive and ownCloud are doing. We don't have integration for that. And they are going to make their own private way to synchronize either way.
So providing the backend in GVfs, yeah, it's good, because we have some integration, but it's not the best solution. So what would be a solution for that? Well, we can use a D-Bus service. D-Bus is an IPC system. That means it's a way to communicate between different processes. So we can make, for example, Dropbox connect to the file manager through a basic API and then say which files are synchronizing. We are working on it. It's something to work on in the future. But I think it will be a nice point versus Windows, iPhone, or whatever other tools, because they don't do it.

Okay, let's move on. Search. What do we want when we perform a search? Well, one of the most important things, I would say, is that when you type some text because you are searching for it, you want the files that contain this text in their name, right? This is like the most important thing, or the most basic thing. But what about a user who, for example, has a presentation file, I don't know, for university, and is working on a psychologist called, I don't know, Florian, to say something. The user wants to search for Florian and get this file. That means searching the content inside that file. So we have to do that.

Also, we want the search to be recursive. That means if you are in a location, it searches every level below it. Why? Well, searching the same directory you are in is more or less easy. But what if the file is three levels deeper? It's really hard to find a file that way. So we need to provide this tool. But we want to give preference to the close hierarchy. That means if a file is found in the current directory, we really want to give it a better rank so it goes up in the results, in the final view. So we want a ranking system. And last but not least, we want the search threaded, but without hanging the user's computer.

So, okay, what's needed for the search in Nautilus? First, matching the name. It's really easy.
Just check whether the name of the file contains the text you are searching for. Then content matching. How do we search inside the content of the files? This is where Tracker comes in. What it does is track parts of your file system. Of course, it cannot track the whole file system, because of the issue I mentioned before with inotify. So it only tracks things like Documents, Downloads, Pictures, these most-used folders. It also tracks the contents inside the files. And it puts everything in a database, so it's really fast to search. Just to make a point: a search on Tracker takes more or less one second for me. A normal search, iterating over the directories, takes two minutes. So it's really a big difference, and it's really useful to have Tracker here. Of course, if we didn't have Tracker tracking the content of the files, we would have to go to every file, parse all the content, and search for the word. Of course, that's completely impossible. We need a way to track it.

As I said, we want to give some rank to the hierarchy. That means, as I said before, closer in the hierarchy means higher rank. So what algorithm is there for this? Well, there is an algorithm called breadth-first search. It's really common. What it does is this: you are in a location, and you visit the first level of this location. Once you are done there, you go to the subdirectories, and only for that level. And when everything is done there, you go to the next level. You don't go deep like this. You go level by level. In this way, you find the files that are close in the hierarchy earlier.

We also, of course, need a way to do the ranking. So we make a rank based on the word proximity between the text and the file name. And then for the full-text search that Tracker provides for the content of the files, Tracker provides us with a rank as well, which is easy. We do some kind of sum of them, and then we put it in the final result.
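The level-by-level traversal described above can be sketched as follows. This is a minimal Python illustration of breadth-first search with name matching only; the function name `breadth_first_search` and the demo file names are made up for this sketch, and it reports the depth of each hit rather than reproducing the ranking that Nautilus or Tracker compute.

```python
import os
import tempfile
from collections import deque


def breadth_first_search(root, needle):
    """Yield (depth, path) for entries whose name contains needle, nearest levels first."""
    needle = needle.lower()
    queue = deque([(root, 0)])  # directories waiting to be visited, with their depth
    while queue:
        directory, depth = queue.popleft()
        try:
            with os.scandir(directory) as it:
                entries = sorted(it, key=lambda entry: entry.name)
        except OSError:
            continue  # unreadable directory: skip it and keep searching
        for entry in entries:
            if needle in entry.name.lower():
                yield depth, entry.path
            if entry.is_dir(follow_symlinks=False):
                # Children are queued behind everything already on this level.
                queue.append((entry.path, depth + 1))


# Demo on a throwaway tree: one match at the top level, one two levels down.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("a/b/report-old.txt", "report.txt", "a/notes.txt"):
        open(os.path.join(root, *rel.split("/")), "w").close()
    hits = [
        (depth, os.path.relpath(found, root).replace(os.sep, "/"))
        for depth, found in breadth_first_search(root, "report")
    ]
```

Because the queue is first-in, first-out, every directory at one depth is visited before any directory at the next depth, so matches close to the starting location come out first without any sorting step.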
And as a nice thing, the search is implemented as a Nautilus directory. In this way, we can reload it. We can show the files. We can invalidate the cache. We can do everything. So it's strange, but it works well for us. Like, we create a virtual directory that is nothing but the results of the search.

So OK. We have everything for making a search. Let's do it. We have three engines. The model engine: this is the engine that searches the cache, the current cache. As you know, the cache is very fast, so we want an engine for it. We have the simple engine, which is the one that goes recursively through the whole file system below you. And then you have Tracker. OK. We create a thread for each one. We prepare to catch the hits for every file that is found. We connect to the signals, everything. We start the engines. And of course, normally it works, but let's do a little evil example. Let's search for the character A. What happens? Well, A is the shortest one, and A is the most used character in every language, in European languages. So Nautilus is going to go crazy. So let's go.

What happened? Well, I don't know if you know this scene from the Simpsons, but this is what happened with the threads. It doesn't go. It cannot work. So the computer hangs. You see that the computer is hanging. Not the application, the computer. Well, the application too, but the first thing you see is that the computer is completely stuck. Why? We have threads for the search engines. We have a thread for thumbnailing. Thumbnailing is the preview of the files. We have the threads for the work queue, the one accessing the disk. We have threads for the change queue, the one that is for the file cache. Imagine that you are also doing some operations, like moving files. You have more threads, but they are not limited by a shared thread pool. So what happened? We have more threads than the CPU can handle, because we don't limit them. So we hang the computer. This is a real problem in Nautilus.
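For contrast, here is a minimal sketch of what one shared, bounded pool looks like, using Python's standard `concurrent.futures` rather than GLib. The subsystem names and the `job` function are placeholders invented for this sketch, and the cap of half the CPU threads follows the rule of thumb given earlier in the talk; it is not how Nautilus is implemented.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Rule of thumb from the talk: half of the CPU threads, and at least one.
MAX_WORKERS = max(1, (os.cpu_count() or 2) // 2)
shared_pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)

_lock = threading.Lock()
_running = 0
peak = 0  # highest number of jobs seen running at the same moment


def job(name):
    """Placeholder for real work; records how many jobs run concurrently."""
    global _running, peak
    with _lock:
        _running += 1
        peak = max(peak, _running)
    time.sleep(0.01)  # stands in for disk access or searching
    with _lock:
        _running -= 1
    return name


# Every subsystem submits to the same pool instead of starting its own threads.
subsystems = [
    "search-model", "search-simple", "search-tracker",
    "thumbnails", "work-queue", "change-queue",
]
futures = [
    shared_pool.submit(job, f"{subsystem}-{i}")
    for subsystem in subsystems
    for i in range(5)
]
results = [future.result() for future in futures]
shared_pool.shutdown()
```

Thirty jobs are queued, but `peak` never exceeds `MAX_WORKERS`: the extra jobs wait in the pool's queue instead of competing for the CPU, which is the behavior the unbounded per-subsystem threads lack.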
I would say it's one of the biggest problems under the hood. But, yeah, I know. I don't want to leave you with a bad taste from this talk. So we are working on it, and some of it is already there. So what's already there? Now the search is stopped when the user says so, and engines are reused. Because before, when you performed a search, the engines were searching, and then when you stopped, what Nautilus was doing was just forgetting about that search engine and creating a new one. That means if you had 10 threads, you now have 10 threads more, 20. That's not going to work. So it was making things slow, even with a normal search, like, I don't know, "hello", which shouldn't make Nautilus slow. And now the engines are reused. That means you don't go through creating and allocating the search engines again and stopping the threads. No, you only change the query.

Also, we improved the cache invalidation. Because as I said, the search is implemented as a Nautilus directory. So what can happen is that if you are searching for a file, and that file changes, then you invalidate the directory, and then the search goes again and goes again. And Nautilus was doing a lot of invalidations, because you have to make sure that your information is valid. But, well, you can make the performance a little better. So we improved the cache invalidation for that. And we are working on porting to GTask now, which is the first step toward being able to share the same thread pool and limit the threads, which is really important for us.

What's the time? Yeah? Oh, cool. So after this talk, here is what I want you to keep in mind: what we learned about how to implement a file manager and what the challenges are, just from my experience working on it. So, yeah, use threads. They are really good. You need them. But use a thread pool, please. Use the same thread pool, because it doesn't make any sense to use threads if you are not going to limit them. It doesn't make sense. Of course, you have to use a cache.
If your application does a lot of input/output and you don't use a cache, you aren't going to get anywhere. You need to use a cache. And well, for Nautilus, I understand, because it's sixteen years old, so it predates a lot of libraries. But please, use already well-tested libraries. For example, GLib. And try to use them coherently. Because most of the problems with Nautilus are just the custom code that it is using. It's really hard to maintain this kind of code. And last but not least, in my humble opinion, think twice about whether C is the language you need. Because I think we are at a time when we shouldn't allow the tools we are using to leave the memory management to us, because it's one of the biggest pitfalls in applications. And it's really hard. I mean, half of my time is just fixing these things. So think twice about it.

Okay, we are finishing. If you want these slides, they are in my GitHub account. If you want to use them, read them. If you want to know more about Nautilus, go to the wiki page. And if you want to talk with me, discuss whatever with me, go to IRC at gimp.org, on the Nautilus channel. My mail is csoriano@gnome.org. And I really like to play ping-pong, at amateur level. I'm not a professional. So if you want to play ping-pong during the conference, just catch me. If you are a Red Hatter, we can go to the office. There is a table we can play on there. And now if you have any questions, please ask me. And thank you for your attention.

Sorry, I'm not hearing well. I'm not hearing well. Can you raise your voice? Yes. Yes. Yeah, OK. So the question is whether type-ahead is coming back. Well, I think it's a different pattern. When you are doing type-ahead, it's only searching the start of the file name. You have to know a lot of things. And it's only for the current directory. It's really a small use case. I mean, probably it's very important, but it's really a small one.
So what I would like to do is just improve the search for the current directory a lot. So you can have the results in almost the same time you would have with type-ahead, but better, because it will search any part of the file name. It will search the content of the file. So I think that's the way forward. Yes, it was instant, because it was doing nothing more than working on the widgets that were already there. Yeah, yeah. So I think the path forward is just to improve what we can with the current directory. I think we can do it. I think we can improve it much more. Did I answer your question? Yes. OK. Cool.

More questions? Asos and you. Let's get a question. You have a scarf. More questions? Yes, tell me. Yes, GVfs. Command line. Use GVfs. Mount the network or whatever you want. And GVfs has a lot of debugging, so you can actually look at the process. Ah, home directory. OK. Still, the answer is GVfs. GVfs is for everything, for every file. So use gvfs-info, gvfs-ls, everything like that. You can search with it. You can debug a lot with it. Yeah, so GVfs is the answer. It's really convenient. Yeah.

More questions? So the question is whether we are going to support more kinds of implications. No idea. Sorry. It's not in my roadmap, because I actually don't have any idea about it. Wait a moment. I have to give out more scarves.

More questions? Sharing code with Tracker. Yeah, but for me, Tracker is another beast. And Tracker uses what we use, like GLib and GVfs, if I remember correctly. So I think what we can share is already shared. I don't know the code of Tracker, so I don't know more. But no, it's really a different beast. Tracker is a database. Nautilus is not. Tracker uses a database of metadata. It has some nice features, like connecting objects. Like, you access a user, sorry, a contact. That contact is in some place, so you can access the map of where he is. That's not Nautilus. So it's really a different beast. So what can we share, just the use of files?
Tracker is much more than just files. What can be shared is already shared. So nothing else there.

More questions? Now without the scarf. I don't know if you want to ask more questions. More questions? That's kind of Tracker, right? Yes, but you need to track the changes as well. And you cannot track the whole file system. It's a problem from the kernel. You don't know. You don't know. That's the problem. So you search, and you have to be sure the files have current information. Maybe that file is already gone. Maybe the file was deleted. So you have to check that database. How do you do it? You have to monitor every file. But we have the issue with inotify. That's the problem with Tracker. So really, the solution would be to have more support for something different from inotify in Tracker. Then we could use Tracker just as a way of getting files from there. But the problem is inotify. It's still inotify. So we cannot do anything about it yet. And the problem is not only being fast or not. Sometimes it's more Nautilus itself, with the threads and everything, than actually going through the directories.

More questions? Another question? Can you repeat? We don't. We don't. Yeah, we don't. So what happens is that Nautilus uses the cache and watches the directory it is looking at. That's it. We cannot watch the whole file system. So there is no solution for it. More questions? No? OK, so thank you for your attention.

I have an answer. I don't have an answer. Because I don't. It's really hard. It's really hard. The right language for the job. Yeah. I'm a bit delighted with my suggestion. Because I remember Inkscape. You know Inkscape? Yeah. They have a garbage collector wired into C++, and they manage the tricky stuff with the garbage collector and also do manual memory management. And I wonder if you can do the same. Like Terak G5. I guess. But for me, the code of the project is already complex, to just add another layer of complexity. What I would like is the programming tool.
Like, the programming language I'm using should just handle this for me. Like, I don't have to do anything about it. It could be a garbage collector, but then we have performance issues. So there is, for example, this new language Rust that is using a mix of both. So we can have something like this. But it's really a hard problem. It's really a hard problem. This has spent half of time for a minute. Yeah, yeah. That's true. That's true. But it would be too much work to add a new thing, I think. But yeah, thank you for the idea. Oh, yes. You should try it. Say something. Test.