So in this talk, I will start with boot time reduction and cover hibernation, and then I will introduce a proposal for a new hibernation design. In this proposal, I'll talk about how to improve hibernation speed and how to extend the lifetime of flash memory. I used to use an older navigation system in my car, and when I start the car, I have to wait a minute and thirty seconds before I can use it. I think that's not a good user experience. Additionally, there are critical safety reasons in the automotive industry. For example, when you put your car into reverse right after starting it, your rear-view camera system must show you what's behind you right away. Fast boot can also be used for marketing. Have you seen the one-second Android boot? No? I have seen a one-second Android boot on YouTube; you can Google it right now. It's quite impressive, and it was a huge motivation for me. I'd like to mention the traditional techniques and tools we have been using to reduce boot time, because some of them we cannot use for hibernation. In terms of measurement, bootchart is a handy tool for profiling the Linux boot sequence, but we cannot use this tool for hibernation. With hibernation, we also don't need to put a lot of effort into optimizing user space, which used to be a maintenance burden. So before we discuss further how to improve hibernation boots, here is a little background for those who are not familiar with hibernation. In hibernation, the first step is to suspend all the devices so that they cannot change the system state. Then the memory is copied into a snapshot image, the devices are resumed, and the image is written to swap space. On the next boot, the image is reloaded and the system is exactly as it was before entering hibernation. That is basically how hibernation works. One reason I build on this existing mechanism is that both suspend and resume already work fine on my board.
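The hibernation cycle just described (suspend devices, snapshot memory, write the image out, then reload it on the next boot) can be modeled as a toy sketch. Everything here is a simplified illustration I wrote for this transcript, not the actual kernel code:

```python
# Toy model of the hibernation cycle described above. Purely illustrative:
# real hibernation snapshots physical pages inside the kernel and writes
# them to a swap device, not to a Python dict.

def hibernate(memory, swap):
    """Suspend, snapshot memory, and write the snapshot to 'swap'."""
    # 1. Devices are suspended so nothing can mutate system state.
    # 2. Memory is copied into a snapshot image.
    snapshot = dict(memory)
    # 3. Devices are briefly resumed and the image is written to swap.
    swap['image'] = snapshot

def resume(swap):
    """Reload the image: the system state is exactly as before."""
    return dict(swap['image'])

memory = {'pfn_0': b'kernel', 'pfn_1': b'apps', 'pfn_2': b'data'}
swap = {}
hibernate(memory, swap)
restored = resume(swap)
assert restored == memory  # state is identical after resume
```

The key property the sketch shows is the round trip: whatever was in memory before hibernation is byte-for-byte identical after resume.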
But if they don't work, you might need to put a lot of effort and time into fixing them. My work is based on top of the upstream Linux kernel hibernation. It is generally known that hibernation is faster than cold boot, where cold boot means starting a system that is turned off. So my question was: how much faster is it really compared to cold boot? Here is the result: hibernation is about two seconds faster than cold boot. But this is the upstream version, not optimized at all. So hibernation is faster than cold boot, but not nearly as fast as we expected, like the one-second Android boot, which you probably have not seen, but I have. There must be room for improvement in the existing upstream hibernation. Here are the things I found through analysis and testing. First, the upstream hibernation does not scale well on a multi-core system; it's limited. Second, loading the image takes most of the time during the boot, and reducing the snapshot image size leads to faster image loading. So I really tried to focus on reducing the image size. The snapshot image does not include pages that were swapped out before entering hibernation, so I try to swap out as many pages as possible beforehand. I also drop the clean page cache, because that is really helpful for reducing the image size. Lastly, I deduplicate the pages in memory, and that also helps reduce the snapshot image size. So what does deduplicating pages in memory look like? On the slide, the first line represents the physical page frame numbers, and the second to fourth lines show the process of deduplication in memory. At the bottom of the slide there is a table whose entries point to physical page frame numbers; during the resume process, the page data is copied from those frames to restore the deduplicated pages. And here is the result.
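The in-memory deduplication just described can be sketched as follows. The hash-style content matching and the exact table layout are my assumptions for illustration, not the talk's actual implementation:

```python
# Sketch of deduplicating identical pages before they enter the snapshot.
# Duplicate pages are replaced by a table entry pointing at one canonical
# physical frame; on restore, the data is copied back from that frame.
# The content-keyed lookup is an assumption for illustration.

def dedup_pages(pages):
    """pages: dict pfn -> bytes. Returns (unique_pages, dedup_table)."""
    seen = {}          # page content -> canonical pfn
    unique = {}        # pfn -> bytes, only one copy per distinct content
    table = {}         # duplicate pfn -> canonical pfn
    for pfn, data in pages.items():
        if data in seen:
            table[pfn] = seen[data]   # point at the canonical frame
        else:
            seen[data] = pfn
            unique[pfn] = data
    return unique, table

def restore_pages(unique, table):
    """Copy page data back for every deduplicated frame on resume."""
    restored = dict(unique)
    for pfn, canonical in table.items():
        restored[pfn] = unique[canonical]
    return restored

pages = {0: b'zero' * 1024, 1: b'app', 2: b'zero' * 1024, 3: b'app'}
unique, table = dedup_pages(pages)
assert len(unique) == 2               # only two distinct page contents
assert restore_pages(unique, table) == pages
```

Only the unique pages go into the snapshot image; the table is what makes the restore copy possible, which is why the image shrinks without losing any data.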
So the left chart shows the boot time for cold boot, upstream hibernation, and the optimized version, where the optimization is just reducing the snapshot image size. The optimized version is two seconds faster than the upstream version, and I think six seconds faster than cold boot. The right chart shows the image size; as you can see, it is reduced from around 900 megabytes to 200. So we can tell the existing hibernation can be optimized like this. Next, this part is about extending the lifetime of flash memory. Flash memory has become very popular in embedded systems, but it has a limited lifetime, with a limited number of program/erase cycles. So we can't just enable hibernation and let it write around 900 megabytes on every hibernation. In order to extend the lifetime of flash memory, I focus on two things. The first is a sort of log-structured block management to reduce write amplification. The second is storage-based data deduplication, also to reduce the amount of image data to be written. I'm going to give details in the next slides. This is about how we reduce write amplification with the block management. Basically, this approach ensures sequential writes. The chosen partition is divided up into clusters, and each cluster is aligned with the erase block size. A cluster is composed of blocks, and a block is four kilobytes in size. Blocks within a cluster are written sequentially in a log-like structure, and clusters are not overwritten until discarded, except for the header. Unlike the upstream hibernation, the new version has more cluster types. You might be familiar with the map and metadata clusters, because they are already used in upstream hibernation, but data, usage counter, and garbage collection clusters are added in the new design.
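The log-structured layout just described (a header that is the only block overwritten in place, fixed-size clusters aligned to the erase block, 4 KiB blocks appended strictly in order, clusters untouched until discarded) might look roughly like this sketch; the sizes and the in-memory structures are illustrative assumptions on my part:

```python
# Sketch of the log-structured partition layout described above: a header
# block that is the only thing overwritten in place, followed by clusters
# of fixed-size 4 KiB blocks that are written strictly sequentially and
# never overwritten until discarded. All sizes are illustrative.

BLOCK_SIZE = 4096          # a block is 4 KiB
BLOCKS_PER_CLUSTER = 64    # cluster aligned to an assumed erase-block size

class LogPartition:
    def __init__(self, num_clusters):
        self.header = {}                           # the only in-place update
        self.clusters = [[] for _ in range(num_clusters)]
        self.current = 0                           # next cluster to fill

    def append_block(self, data):
        """Append one block sequentially; move on when a cluster is full."""
        assert len(data) <= BLOCK_SIZE
        if len(self.clusters[self.current]) == BLOCKS_PER_CLUSTER:
            self.current += 1              # never rewrite a full cluster
        self.clusters[self.current].append(data)
        return (self.current, len(self.clusters[self.current]) - 1)

    def discard(self, cluster_idx):
        """Only a discarded cluster may be reused for new writes."""
        self.clusters[cluster_idx] = []

part = LogPartition(num_clusters=4)
locations = [part.append_block(b'x' * 4096) for _ in range(70)]
assert locations[0] == (0, 0)
assert locations[64] == (1, 0)   # writes spill into the next cluster in order
```

Because writes only ever append within the current cluster, the flash device sees large sequential runs instead of scattered overwrites, which is what keeps write amplification down.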
For example, the data cluster is used for deduplication, the usage counter cluster is used to keep track of each block's usage count, and the garbage collection cluster is used to track which clusters are reclaimed and discarded; the clusters are useful in those ways. It may sound like a file system, but we're still talking about hibernation; the new proposal just borrows these ideas. This is how the partition looks when it is used with the block management. As you can see, the header is the only block overwritten every time. Take a look at the data cluster: a data cluster is composed of a chunk table and chunks. Chunks can be either compressed or uncompressed, and they are packed into the data cluster. And this is about how to reduce the amount of image data to be written. Basically this process is similar to the way we reduced the snapshot image size, but it deduplicates pages in storage, not in memory. After applying both of the things I mentioned earlier, here is a result you might be interested in, in the blue and red lines: the blue line represents deduplication in memory, and the red line represents deduplication in storage. On the very first hibernation the red line is almost the same as the blue one, but from the second time on the write size decreases dramatically, as you can see for the second and third times. The green line represents the compressed case, so it is lower than the red one. However, as we already expected, the snapshot image gets fragmented by the deduplication process, so the image loading speed gets slower because of the more randomized access pattern. As you can see, this graph shows the image loading speed getting slower each time. So how do we fix this problem? The snapshot image is getting fragmented, but we also need to keep reducing the snapshot image size at the same time. First of all, I needed to examine how the snapshot image becomes fragmented each time.
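The storage-based deduplication just described can be sketched like this: a chunk that is already present in a data cluster on flash is referenced instead of rewritten, so the second and later hibernation cycles write far less. The chunk hashing and the bookkeeping here are illustrative assumptions, not the talk's actual on-disk format:

```python
# Sketch of storage-based deduplication across hibernation cycles: a chunk
# already present in a data cluster on flash is referenced instead of being
# rewritten, so the second and later cycles write far less. The chunk
# hashing and bookkeeping are illustrative assumptions.
import hashlib

class ChunkStore:
    def __init__(self):
        self.stored = {}        # chunk digest -> location on "flash"
        self.next_loc = 0

    def write_image(self, chunks):
        """Write one snapshot image; return bytes actually written."""
        written = 0
        for chunk in chunks:
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.stored:     # only new chunks hit storage
                self.stored[digest] = self.next_loc
                self.next_loc += 1
                written += len(chunk)
        return written

store = ChunkStore()
image = [bytes([i % 7]) * 4096 for i in range(100)]   # repetitive pages
first = store.write_image(image)
second = store.write_image(image)   # same content on the next hibernation
assert second == 0                  # everything deduplicated in storage
assert first < 100 * 4096           # duplicates within one image also skipped
```

This mirrors the chart described above: the first write is comparable to in-memory dedup alone, and from the second hibernation on the written amount drops sharply because most chunks already live on flash.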
So I use the usage count on each block and use a heatmap to visualize the whole partition; I will show you on the next slide how it looks. What I found is that there are some data clusters which have only a few blocks used for the snapshot image. With every write, the number of those clusters increases, and each time the snapshot image gets more fragmented. There's a simple way to fix this, though it can increase the snapshot image size: just exclude the fragmented clusters from the data cluster pool. But it turns out there are too many fragmented clusters, so I cannot get rid of all of them. So I came up with rating clusters by the usage count of each block. Each time, cold clusters are selected and excluded from the deduplication process. This is the heatmap, based on the usage count of each block: each tiny square represents the usage count of one block, and each line represents one cluster. A red block is a hot block and a blue one is a cold block; red means used more frequently for deduplication, and blue means less frequently. With this method, the image loading speed no longer just keeps decreasing; as you can see, it goes up and down instead of degrading every time. Next, this is about the amount of data written. The amount of writes increases whenever a hot cluster becomes cold, because the deduplication rate drops. I think there's a trade-off between the amount of writes and the image loading speed, but as you can see, this approach improves the loading speed a lot. This is the other part of the block management I mentioned earlier. In this scheme, used clusters are reclaimed first before being discarded by the garbage collector. Non-data clusters are reclaimed during the resume process because they are no longer used, but data clusters are reclaimed at runtime, when the number of free clusters falls below a threshold as hot clusters become cold.
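The cluster-rating scheme just described can be sketched as follows. Rating each cluster by the average usage count of its blocks, and the fraction treated as cold, are my assumptions for illustration:

```python
# Sketch of rating clusters by per-block usage counts and excluding the
# coldest ones from the deduplication pool, as described above. The
# average-based hotness metric and the cold fraction are assumptions.

def rate_clusters(usage_counts, cold_fraction=0.25):
    """usage_counts: list of per-block usage-count lists, one per cluster.
    Returns (hot_cluster_ids, cold_cluster_ids)."""
    hotness = [(sum(counts) / len(counts), idx)
               for idx, counts in enumerate(usage_counts)]
    hotness.sort()                         # coldest clusters first
    n_cold = int(len(hotness) * cold_fraction)
    cold = {idx for _, idx in hotness[:n_cold]}
    hot = {idx for _, idx in hotness[n_cold:]}
    return hot, cold

usage = [
    [9, 8, 9, 7],   # cluster 0: hot (blocks reused often for dedup)
    [0, 1, 0, 0],   # cluster 1: cold and fragmented
    [5, 6, 4, 5],   # cluster 2: warm
    [1, 0, 2, 1],   # cluster 3: cool
]
hot, cold = rate_clusters(usage)
assert 1 in cold                 # the coldest cluster leaves the dedup pool
assert 0 in hot
```

Excluding only the coldest clusters, rather than every fragmented one, is what bounds the growth in image size while still containing the fragmentation.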
There is also an order in which data clusters are reclaimed: cold clusters first, then less hot clusters next. And this is about how the garbage collection works. The garbage collector is a background thread in the system. It keeps checking the number of reclaimed clusters, and if the number is above a threshold, it starts discarding the reclaimed clusters by issuing a discard command to the underlying flash storage device. The discard command gives a hint to the underlying storage device that the specified address range no longer contains valid data; this command is sometimes called trim or unmap. So yes, that's the end of the talk. If you have any questions, please. [Audience question] Yes, I see the point; that's a very good question. We're still working on it. Actually, we focus on reaching the first screen, because of course, if we swap out all the pages to the device, it takes time; especially when launching an application on Android, it makes things slower than without the swapping. That's true, but we are still working on that problem. Thank you. [Audience question] Yes, basically you're right. My work is based on the upstream kernel hibernation, exactly the same version, but I hacked it and added more features like this one. We are also trying to use this scheme in, maybe, the boot loader. This feature can work like a library, so we can use the API. Currently we implemented this feature in the Linux kernel, and for the next version we will try to implement it in U-Boot or another boot loader, possibly. Yes, that's our final goal.
[Audience question] So far we have only used eMMC, but we can also use UFS too. The test results you see came from the implementation we have already done in the kernel; we are still working on implementing this version in the boot loader, so we don't have final results for that yet. [Audience question] That's right, it takes more than the normal path. We actually worried about that time, and it's probably hard to reduce; it depends on the hardware specification. The board we use is not a very high-end specification, but I think we can almost ignore the time delay, and with higher-specification hardware the difference would probably be negligible. Thank you. [Audience question] Could you repeat the last part? Yes. Based on the board we use, and we can still reduce more time, the final version takes around five seconds, from power-off to the launch screen, just five seconds. The suspend and resume time is small compared to loading the snapshot, which takes most of the time; suspend and resume together take maybe less than one second. But I think we could reduce the time further if we focused on suspend or resume. [Audience question] Yes, that's a great question, and I'd like to pass that question to my team leader. Actually, we discussed this, but the company says it's not available right now, not in public. Probably after we launch our product, it will be made available as open source. Yes, that's a problem.
[Audience question] Yes, for example, we use compression, and if we had a hardware compression engine, we could accelerate the speed of loading the image; I'm not sure, but it's not done yet. There is also the case that if we use more cores, it accelerates the boot time. I think the average compression ratio is about 30% overall, but when the system boots there are many zero pages, so the compression rate is very high; it depends. All right, if there are no more questions, then thank you so much for listening.