use results in a reduction of SSD lifetime and performance degradation. So the main question is: why yet another file system? If we consider what is currently available, NILFS2 is focused mostly on reliability, F2FS is focused mostly on performance, and bcachefs is focused on reliability and performance. But an important goal, probably the most important one, is to prolong SSD lifetime. So the first goal of SSDFS is to prolong SSD lifetime, the second goal is to guarantee strong reliability, and the third is to guarantee stable performance. These are the three important goals of SSDFS.

How can SSDFS achieve this? First of all, SSDFS is a flash-friendly, open-source, kernel-space file system. The goal of prolonging SSD lifetime can be achieved by decreasing write amplification, by excluding garbage collector overhead, and by decreasing the retention issue. Write amplification can be decreased by using compression, a compaction scheme, and delta encoding. Garbage collector overhead can be excluded by excluding the FTL garbage collector's responsibility and by minimizing garbage collector activity on the file system side. Strong reliability can be guaranteed by checksum support, metadata replication, and snapshot support. Stable file system performance can be achieved by exploiting the parallelism of multiple NAND dies, by excluding garbage collector overhead, and by minimizing write amplification.

If we take a look at the SSDFS architecture, first of all, every SSDFS volume contains segments; a segment is a logical concept. Every segment is always located in the same position of the volume, and this is guaranteed; this is the way the logical extent concept is implemented. It means that if we store some logical block into some segment, then this logical block will always be located in the same segment and will never be moved into another segment. This provides a way to avoid updating, for example, block mapping information, and it's one of the important ways to decrease write amplification. There are several types of segments in SSDFS, and the goal is to aggregate the same type of metadata or data in the same segment and to manage this data in a more efficient way.

Every segment contains one or several logical erase blocks, and the logical erase block is the basis for the migration scheme and migration stimulation; this is how it's possible to eliminate garbage collector overhead. Any logical erase block can be mapped to any physical erase block, and this is one of the important points that provides the way to implement the migration scheme and migration stimulation. Finally, every erase block contains some number of logs, and every log contains block mapping metadata, the offset translation table, and a payload; this provides a way to use compression, the compaction scheme, and delta encoding, and this is the way of decreasing write amplification.
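As a rough illustration of the logical extent idea above, here is a minimal sketch in plain Python (not the actual SSDFS kernel code; the structures, names, and the one-LEB-per-segment simplification are assumptions): the extent recorded in metadata stays fixed, and only the mapping table's LEB-to-PEB association changes when data physically moves.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Illustrative model only: the names, the one-LEB-per-segment simplification,
# and the toy geometry are assumptions, not the SSDFS on-disk format.

BLOCKS_PER_LEB = 4

@dataclass(frozen=True)
class Extent:
    """Logical extent stored in metadata: it never changes for a block."""
    segment_id: int
    logical_block: int

@dataclass
class MappingTable:
    """LEB -> PEB association; only this indirection changes during migration."""
    leb_to_peb: Dict[int, int] = field(default_factory=dict)

    def migrate(self, leb_id: int, new_peb_id: int) -> None:
        # Data moves to another physical erase block, but the extents that
        # reference it stay untouched, so no block-mapping metadata rewrite.
        self.leb_to_peb[leb_id] = new_peb_id

def resolve(extent: Extent, table: MappingTable) -> Tuple[int, int]:
    """Resolve a stable logical extent to its current physical location."""
    leb_id = extent.segment_id            # simplification: one LEB per segment
    return table.leb_to_peb[leb_id], extent.logical_block % BLOCKS_PER_LEB

# Usage: after migration the extent is unchanged but resolves to a new PEB.
table = MappingTable({0: 10})
ext = Extent(segment_id=0, logical_block=2)
print(resolve(ext, table))   # (10, 2)
table.migrate(0, 11)
print(resolve(ext, table))   # (11, 2), with no extent update needed
```

The point of the indirection is that a migration only rewrites the mapping table entry, not every piece of metadata that references the extent.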
The metadata structures of SSDFS start with superblocks. There is a special type of segment that contains a sequence of superblock states, and for every mount and unmount operation the actual state of the superblock is stored. Additionally, every log contains a copy of the superblock information, so the superblock is a metadata structure that is heavily duplicated among erase blocks.

The second important metadata structure is the mapping table. The responsibility of the mapping table is to associate a logical erase block with a physical erase block, and its second important goal is to implement the migration scheme by associating a pair of physical erase blocks with a logical erase block during migration. The third important metadata structure is the segment bitmap. The segment bitmap keeps information about the state of every segment, so it's easy to find a clean segment, or a dirty segment that can be erased or trimmed; it's also possible to find segments in the using or used states, so segments in different states can be found, and this can be used by the garbage collector.

The rest of the metadata structures are represented in the form of B-trees. The first one is the inodes B-tree, which contains inodes, and every inode can have a dentries B-tree or an extents B-tree, and also, for example, an extended attributes B-tree. Initially, SSDFS tries to store user data or metadata inline, without using a dentries B-tree or allocating logical blocks; only if there is not enough space in the inode does it create a B-tree or allocate logical blocks for user data. There are three types of segments for B-tree nodes: leaf node segments, hybrid node segments, and index node segments, and the same leaf node segment can contain nodes from different B-trees. It means that if we do something like a metadata flush, the updates from different metadata structures will be aggregated into one segment, or maybe several, and this aggregation provides a way to manage data and metadata updates in an efficient way.

A special testing scheme was used. There were metadata use cases and user data use cases: for metadata it was create/update/delete of empty files, and for user data it was create/update/delete of files. Several file sizes were used (64, 16K, 100K). For SSDFS, three erase block sizes were also used: 128 kilobytes, 512 kilobytes, and 8 megabytes. Finally, a multiple mount/unmount scheme was used. It means that if we need to create, for example, 100, 1,000, or 10,000 files, the whole operation was split into a number of iterations, and during one iteration we create 10, 100, or 1,000 files. The multiple mount/unmount approach was used to guarantee that the file system cannot hide anything: all dirty metadata and dirty user data will be flushed, and you can see the total number of read I/O requests and write I/O requests without anything hidden. All of these use cases were tested, but currently I'm ready to share only the create empty file use case.

Now about the methodology. The first goal of SSDFS is to prolong SSD lifetime, so we need some scheme for estimating the lifetime of an SSD device, because if we can estimate, for example, how long the lifetime can be for one file system, we can compare it with another one and then conclude whether some file system prolongs the lifetime or not. First of all, we need something like an erase limit: an SSD contains some number of erase blocks (its capacity), and every erase block has a limit on erase operations. So finally, multiplying the number of erase blocks by this per-block limit provides an erase limit for the whole SSD.
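As a back-of-the-envelope sketch of that erase limit (the function and the example device numbers are illustrative assumptions, not figures from the talk):

```python
# Back-of-the-envelope sketch of the erase limit described above. The device
# numbers in the example are made-up placeholders, not figures from the talk.

def ssd_erase_limit(capacity_bytes: int, erase_block_bytes: int,
                    pe_cycles_per_block: int) -> int:
    """Total number of erase operations the whole device can sustain."""
    erase_blocks = capacity_bytes // erase_block_bytes
    return erase_blocks * pe_cycles_per_block

# Example: 256 GiB device, 8 MiB erase blocks, 3000 P/E cycles per block.
print(ssd_erase_limit(256 * 2**30, 8 * 2**20, 3000))   # 98,304,000 erases
```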
Next, we need to estimate how many erase operations a use case generates, some total number of erase operations. If we divide the limit by the total number of erase operations for some use case, we can calculate how many times the use case can be repeated on this file system, and if some file system can repeat the same use case more times, it means that this file system prolongs SSD lifetime more.

If we are talking about the total number of erase operations, we need to take into account how many erase operations the flash translation layer can generate (for example, the FTL's garbage collector), how many TRIM operations the file system can initiate, how many operations a garbage collector on the file system side can generate, the read disturbance, which can also result in some erase operations, and the retention issue, which can also result in some erase operations.

If we are talking about the FTL and its GC, it means we need to estimate the amount of in-place updates, because handling in-place updates is the responsibility of the flash translation layer. We can estimate this as the difference between the total number of write requests and the payload. What is the payload? It's roughly the number of unique blocks written during the whole use case, excluding repeated updates of the same blocks; so in-place updates are the total number of write requests minus the payload.

We can estimate the file-system-side garbage collector activity as the difference between the payload size and the valid data size. Of course, it's not an exactly accurate estimation; it's more of an upper-bound estimation, because it's not so easy to estimate the amount of valid data, since file systems use different techniques, but it's possible to make some assumptions, and this difference can show the potential activity of the garbage collector. It's possible to imagine that, in the worst case, every block that was invalidated during update operations is located in an erase block that also contains valid data, and this valid data needs to be moved, so it can potentially generate a significant amount of activity. Read disturbance can be estimated by dividing the total number of read operations by some threshold. The retention issue can be estimated by dividing the duration of the use case by a three-month period and multiplying this fraction of time by the payload size. So we have a pretty reasonable methodology to estimate SSD lifetime.
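The estimations above can be collected into a small sketch that mirrors the described methodology (all names and thresholds are placeholders, and the inputs are meant to be the measured per-use-case numbers):

```python
# A sketch of the lifetime estimation methodology described above. The
# functions mirror the verbal definitions; all inputs and thresholds are
# placeholders to be filled with the measured numbers, not benchmark data.

THREE_MONTHS_SEC = 90 * 24 * 3600

def in_place_updates(write_requests: int, payload_blocks: int) -> int:
    # Repeated updates of already-written blocks become FTL GC responsibility.
    return write_requests - payload_blocks

def fs_gc_upper_bound(payload_blocks: int, valid_blocks: int) -> int:
    # Worst case: every invalidated block shares an erase block with valid
    # data that a file-system-side GC would have to move.
    return max(payload_blocks - valid_blocks, 0)

def read_disturb_erases(read_requests: int, reads_per_erase_threshold: int) -> int:
    return read_requests // reads_per_erase_threshold

def retention_erases(use_case_seconds: float, payload_blocks: int) -> float:
    # Fraction of a three-month retention window times the stored payload.
    return (use_case_seconds / THREE_MONTHS_SEC) * payload_blocks

def total_erase_ops(ftl_gc: float, trim_ops: float, fs_gc: float,
                    read_disturb: float, retention: float) -> float:
    return ftl_gc + trim_ops + fs_gc + read_disturb + retention

def use_case_repetitions(device_erase_limit: int, erase_ops_per_run: float) -> float:
    # The more repetitions a file system allows, the longer the SSD lives.
    return device_erase_limit / erase_ops_per_run
```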
First of all, take a look at the write requests case. SSDFS uses a compaction scheme, compression, and delta encoding, although for this testing delta encoding hasn't been used; mostly compression and the compaction scheme were used, and this is how SSDFS can decrease write amplification and decrease the number of write requests. It's possible to see that SSDFS behaves in a pretty efficient way, and mostly only NILFS2 can compete with SSDFS, mainly for something like the 10,000 or 100,000 file use cases, because in this case, especially for big erase blocks, SSDFS can generate a significant number of partial logs, which means a significant amount of additional metadata can be created. But, first of all, delta encoding hasn't been used, and another point is that a case like 10,000 files means we are using many mount/unmount operations, which is not a real-life use case; in real life we never do such a significant number of mount operations. So for real-life use cases SSDFS looks better even compared with NILFS2, and since we are currently talking about metadata, for user data I expect SSDFS to look much better than NILFS2.

SSDFS uses the migration scheme and the migration stimulation technique. This is the main way to exclude garbage collector overhead, and one piece of evidence for this is the amount of TRIM operations, because here the pressure of regular write requests, together with the migration scheme and migration stimulation, does the main work of moving valid data from an exhausted erase block into a clean one. It's possible to see that this technique is pretty efficient, because it provides the opportunity to TRIM or erase invalidated erase blocks: first, an old erase block can be invalidated efficiently, and second, once it is completely invalidated and migration is finished, this erase block can be erased or trimmed. It's possible to see that this technique works pretty efficiently.

Why does this technique not work for some use cases? Because, for example, for the 8 MB erase block the amount of operations is not enough to finish or even start migration, so in this case TRIM doesn't happen. The same holds for some use cases with smaller erase blocks: there were not enough operations to finish migration. This is also why I'm using the multiple mount/unmount approach, because namely this approach provides a way to see how the TRIM policy works without waiting a long time, and it's possible to see that even multiple mount/unmount operations cannot hurt the TRIM policy; the policy still works efficiently. I believe a background garbage collector would not be fast enough here, because it can be affected by such short mount sessions. So I think this is a pretty good point.

Now consider the payload. First of all, again, SSDFS uses compression and the compaction scheme, and this is the first reason it issues a smaller number of write requests, meaning that we store a smaller amount of metadata or user data. The second player in the payload is the TRIM policy, and when the compaction scheme and compression work together with the TRIM policy, we can see that SSDFS is capable of creating a smaller payload. We can also see that SSDFS generates a bigger payload for some use cases, because the TRIM policy doesn't work there: there were not enough write operations to start or finish migration of the erase blocks. But on average SSDFS mostly looks much better as a file system, and in most use cases it creates a smaller payload. This is especially true for real-life use cases, because the benchmark used multiple mount/unmount operations; for that case SSDFS can be not so efficient, but it's not a real-life use case. For smaller erase blocks, for example, SSDFS can work much better. But again, it depends on the use case, it depends on the environment, and so on.
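The migration scheme and TRIM behaviour described above can be sketched roughly as follows (an illustration under simplifying assumptions, not SSDFS code):

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Rough sketch of the migration plus TRIM idea (not SSDFS code; structure and
# names are assumptions): regular updates land in the destination erase block
# and invalidate the stale copies in the exhausted one; once nothing valid is
# left there, the old erase block is erased/TRIMmed without a dedicated GC pass.

@dataclass
class Peb:
    peb_id: int
    valid_blocks: Set[int] = field(default_factory=set)

@dataclass
class MigratingLeb:
    src: Peb    # exhausted erase block under migration
    dst: Peb    # destination erase block absorbing new writes

    def write(self, logical_block: int, trim: Callable[[int], None]) -> None:
        # A regular update stores new data in dst and invalidates the stale
        # copy in src; "migration stimulation" piggybacks extra valid blocks
        # onto such writes to finish the migration sooner.
        self.dst.valid_blocks.add(logical_block)
        self.src.valid_blocks.discard(logical_block)
        if not self.src.valid_blocks:
            trim(self.src.peb_id)   # migration finished: reclaim the old PEB

# Usage: three ordinary updates fully invalidate the old erase block.
leb = MigratingLeb(Peb(1, {0, 1, 2}), Peb(2))
for blk in (0, 1, 2):
    leb.write(blk, trim=lambda peb: print(f"TRIM PEB {peb}"))
```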
To estimate SSD lifetime, we also need to estimate FTL garbage collector activity. First of all, SSDFS doesn't create this responsibility for the FTL garbage collector, because it's a pure LFS file system. But F2FS or NILFS2, for example, and other file systems do create this responsibility, and it's possible to see that F2FS creates a pretty significant amount of such responsibility, or necessity for moving data, on the FTL side in many situations. NILFS2 also has in-place updated data: it has two superblocks, at the beginning and at the end of the volume, and these superblocks are updated in place, so it also creates some responsibility.

F2FS and NILFS2 have a garbage collector on the file system side, and I tried to estimate the upper bound of the possible overhead, the possible operations, that the file-system-side garbage collector can generate. It's not completely accurate data; the real value lies somewhere between zero and this upper bound, so the real overhead could be anywhere between the lower bound and the upper bound, but this is the potential upper bound of the possible garbage collector operations on the file system side. It's possible to see that these file systems can generate a significant amount of garbage collector activity, but again, F2FS and NILFS2 use policies where the garbage collector doesn't start working until some threshold is reached, so finally this creates some retention issue.

If we consider write amplification, how can we estimate the decrease in write amplification? We have the numbers for write requests. We can add to this number the estimation of garbage collector overhead on the file system side, mostly for NILFS2 and F2FS. For the SSDFS case we can do the same, but SSDFS has no garbage collector write requests. So we can, for example, take the total number of write requests plus garbage collector writes and divide it by the number of write requests for SSDFS. In this case we get a write amplification ratio, and we can estimate how much SSDFS decreases write amplification. It's possible to see that SSDFS can decrease write amplification significantly; even the lower bound looks pretty good. But this lower bound is usually for the multiple mount/unmount cases; for real-life use cases this number should be bigger, and it's possible to see that even now it's better than the other file systems.
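That write amplification comparison can be sketched like this (the counts in the example are hypothetical; the measured per-use-case numbers would be plugged in):

```python
# Sketch of the write amplification comparison described above: another file
# system's write requests plus its estimated GC writes, divided by the SSDFS
# write requests for the same use case. The example numbers are made up.

def write_amplification_ratio(fs_write_requests: int,
                              fs_gc_writes_upper_bound: int,
                              ssdfs_write_requests: int) -> float:
    """How many times more flash writes another file system induces vs. SSDFS.

    For SSDFS itself the GC term is zero, so its own baseline is 1.0; using
    the GC upper bound gives the upper estimate, zero gives the lower one.
    """
    return (fs_write_requests + fs_gc_writes_upper_bound) / ssdfs_write_requests

# Hypothetical counts for one use case (illustration only):
print(write_amplification_ratio(12000, 3000, 5000))  # 3.0
print(write_amplification_ratio(12000, 0, 5000))     # 2.4 (lower bound)
```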
If we take a look at read requests, then because SSDFS is a log-structured file system built on logs, there is some price for this: it needs to read more than other file systems do. SSDFS looks better than NILFS2 and comparable with XFS, but currently SSDFS generates a bigger number of read requests, and sometimes, for big erase blocks, this can affect performance. The main contributor to this is the offset translation table. Currently the offset translation table has an issue, because the table is distributed among multiple logs, and if you would like to build the offset translation table for some erase block or for some segment, you need to read many logs, checking all of them for the actual state of portions of the offset translation table. As a result, a lot of read requests are created for some use cases. But there is a solution: it's possible to store the full offset translation table in every log, using compression to keep it compact. This can give a big improvement and decrease the number of read requests, and it's also possible to use a binary search to find the latest log. So there is a way to decrease the number of read requests, and it's a pretty interesting way in which SSDFS can be improved.

The retention estimation shows that SSDFS looks better for mostly all use cases, but for some use cases it doesn't look so good. There is an interesting point here: when I estimate the retention issue, I use time, the duration of the use case. But some use cases, for example the 10,000-file one with multiple mount/unmount operations, create a lot of partial logs and a lot of read operations, and these read operations make the duration of the use case longer and produce a bigger retention issue estimation. So if the offset translation table issue is solved, it will decrease the number of read requests, improve the duration of some use cases, and the retention issue will be smaller; in that case SSDFS will be better compared with all file systems.

Finally, we can combine all the numbers, all the estimations, into a final number that shows how much SSDFS can prolong SSD lifetime. Some use cases don't provide good numbers for this estimation, because for them the retention issue and read disturbance estimations define the number for SSD lifetime, and those numbers make SSDFS look better by a very large margin. So I decided not to use those numbers and to use the most reliable numbers instead. These numbers show that, as a minimum, SSDFS can provide two times the SSD lifetime for real-life use cases; even the lower bound shows that, compared with ext4 and F2FS, it could be a 1x to 1.4x longer SSD lifetime, but in real life it will be significantly bigger. So it's possible to see that SSDFS can prolong the lifetime of an SSD.

It's also possible to see that for use cases with a smaller number of mount/unmount operations, SSDFS has pretty much the same, comparable, duration as the other file systems. But as I mentioned already, for the case of many mount/unmount operations, something like the 10,000-file case, SSDFS needs to read a lot, and in this case the read path affects SSDFS performance. Because it's a log-structured file system, SSDFS uses logs, and the price is that it needs to read a lot of information from the logs, so SSDFS looks read-dominated. However, it needs to be mentioned that, first of all, SSDFS has been tested in debug mode, which means debug messages were enabled, and this also increases the duration of some use cases; and SSDFS still has not fully optimized code. Even now, SSDFS performance is not on par with the other file systems, but if the problem with the offset translation table is solved, the situation can be significantly improved, because if we can generate a smaller number of read requests, it means that we can be faster. The problem currently seems to be that SSDFS generates a lot of read requests, and it's possible to solve this issue by solving the problem with the offset translation table; then the numbers will be significantly better.
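The proposed fix can be sketched as follows (an assumption-level illustration, not the SSDFS implementation): keep a complete, compressed copy of the offset translation table in every log, and binary-search the erase block's log area for the newest written log instead of scanning every partial log to reassemble table fragments.

```python
import json
import zlib
from typing import List, Optional

def pack_table(table: dict) -> bytes:
    """Keep the full offset translation table per log, in compressed form."""
    return zlib.compress(json.dumps(table).encode())

def find_latest_log(slots: List[Optional[bytes]]) -> int:
    """Binary search for the boundary between written logs and the erased tail."""
    lo, hi = 0, len(slots)
    while lo < hi:
        mid = (lo + hi) // 2
        if slots[mid] is not None:   # this slot holds a written log
            lo = mid + 1
        else:                        # erased area starts at or before mid
            hi = mid
    return lo - 1                    # newest written log, or -1 if block is clean

def latest_offsets(slots: List[Optional[bytes]]) -> dict:
    """One binary search plus one read replaces reading every log in the block."""
    idx = find_latest_log(slots)
    return json.loads(zlib.decompress(slots[idx])) if idx >= 0 else {}

# Toy erase block with three written logs and an erased tail:
slots = [pack_table({"blk0": 0}), pack_table({"blk0": 7}),
         pack_table({"blk0": 7, "blk1": 3}), None, None]
print(latest_offsets(slots))   # {'blk0': 7, 'blk1': 3}
```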
So another question: is SSDFS ready for zoned namespace (ZNS) SSDs? From the architectural point of view, yes, it's completely ready, because first of all it's a pure log-structured file system, and it's possible to use a logical erase block equal in size to a zone. So from the architectural point of view there are no troubles. But from the implementation point of view, it needs some implementation effort to fully support ZNS SSDs, especially because it's possible to have something like an unaligned zone size, and it's possible to have a limited number of open or active zones (write pointers). In this case SSDFS needs to use a slightly smarter technique. It's not a big trouble, but it requires some implementation effort. The rest presents no trouble at all; it needs to solve the problem with the offset translation table, and then everything looks pretty good from this point of view.

What about future work? First of all, because I shared only the create empty file case, the rest of the benchmarking results needs to be processed, to share the results for the update and delete cases, for metadata and for user data, and to prove that SSDFS works better for the other cases too. The issue with the offset translation table needs to be fixed. Of course, there are bugs in the current implementation; these bugs need to be fixed. It also needs some implementation work to support ZNS SSDs. Snapshot support needs to be finished, and deduplication support is still at an initial stage of implementation. The recovery functionality and the file system checking (fsck) utility implementation also need to be finished. So this is the future work.

As a conclusion, it's possible to see that SSDFS can generate a smaller number of write requests, and this is the way to decrease the write amplification issue. SSDFS introduces highly efficient migration and TRIM policies, and this is the way to decrease the retention issue. SSDFS is capable of creating a smaller payload; again, it depends on the erase block size, the use case, the amount of write requests and so on, but it can be smaller because of compaction, compression, delta encoding, and the efficient TRIM policy. SSDFS doesn't create FTL garbage collection responsibility, because it's a pure log-structured file system without any in-place updates, and this also makes it ZNS SSD compatible. SSDFS mostly doesn't create garbage collector write requests, because of the migration scheme and the efficient TRIM policy. There is only one case where the garbage collector needs to participate: if, for example, an erase block is stuck in migration, because some data could be cold, or maybe warm, and migration could start but there are not enough write requests to finish it; in this case the garbage collector can help. So we can see that SSDFS decreases write amplification, SSDFS introduces a smaller retention issue, and finally SSDFS can prolong SSD lifetime at minimum two times. I think it's realistic to say that SSDFS can prolong the lifetime more, for example 10 times, but again, it depends on the use case, it depends on which file system we compare with, and so on. But I believe that SSDFS even now looks pretty good and can show pretty good numbers. That's everything I have for now. Thank you all for attending this talk.