 and there was an introduction, so I'll have to cut this a little short. I just want to present the current state of things, explain why Android cares about this, and try to understand how we can move forward. How it got into Android: a number of OEMs and SoC vendors noticed improvements in application launch times on their workloads, and application launch time is a very important metric for Android, probably much more important than for server or other workloads. Multiple vendors asked us to include it in the Android common kernel, and we also ran our own tests and found that it benefits some applications, especially multi-threaded ones. So I looked closer into why Android benefits from this, and the reason is that when Android spawns a new thread, that thread maps a number of VMAs. Not a big number, but if you have a big number of threads, even three or four VMAs per thread, and some of the threads start page faulting right away, you get mmap_lock contention: creating those VMAs takes the lock for writing, the page faults take it for reading, so they contend and step on each other's feet. As you know, Michel posted the latest version of SPF several months ago, and Mel Gorman ran his tests and benchmarks and posted the results. It looks like the only benchmark that benefits significantly is the page fault test (PFT) in the multi-threaded configuration; there is also, I think, a small regression on hackbench, up to 4%. So I think the main pushback was basically: outside of Android we don't see other beneficiaries of this, and it's quite complicated, so there's a maintenance cost.
I also tried to run some additional benchmarks to show the performance improvements, without much luck, because most of those benchmarks don't exercise this particular case. When I created a specific benchmark to test it, it showed an 84% improvement, but one can argue that this is an artificial test of exactly the use case SPF was created for. So why don't we see more reports from the field, outside of Android, of somebody testing it and seeing benefits? I think there could be several reasons. One is that SPF has not been merged upstream, which limits its visibility; many people simply don't know it exists. Another is that some people might not care about application start time, the setup phase where you allocate a number of VMAs and page fault on them. For servers, that is probably a very small amount of time compared to the whole execution time of the workload, so they might not bother optimizing such a minuscule part of it. And the third reason might be that the people who really cared about it have probably worked around it. A couple of possible workarounds: instead of mmapping multiple VMAs, one can mmap one big VMA and hand out pieces of it to multiple threads. Another is to spawn processes instead of threads. Web browsers are an example: today, if you open a web browser and ask it to restore your tabs, it will create however many tabs you had before and start filling them with page content, which is basically page faults. Technically that would be a perfect showcase for SPF, but because each tab is spawned as a separate process with its own mm and its own mmap_lock, we don't see those benefits. Of course, I'm not saying that design was chosen specifically for this problem; it was probably done for security reasons, but this is a secondary benefit you get.
You don't get mmap semaphore contention. So basically, right now in Android we are carrying this out-of-tree patch set, which is quite sizable; it's actually the biggest out-of-tree patch set in MM in Android today, and we are very interested in eliminating that debt. We treat any out-of-tree patch as technical debt that we want to eliminate. So basically we want to understand how we move forward. Does SPF have a future? Do we go with a different approach? Do we just patch around it in user space? As I said, we can figure out some workarounds, but this is a kernel issue that should be fixed in the kernel rather than pushed out to user space. So that's, I guess, where we are right now. I wanted to say that in several applications I've seen a lot of people hitting mmap_lock issues, and they always find a workaround; they always find a way to deal with it in user space. That doesn't mean the issues aren't there. It just means people develop their stuff, and then at the end, when they think it should work, they spend an extra month chasing some weird performance issue and doing weird things in the deployment to make it work. In one place they do an mlockall and it makes it work; somewhere else an mlockall makes it not work, so they have a different workaround. They always find a way. So it's not that the situation is impossible for people to deal with; it's just extra frustration that people have to deal with, and the kernel should not really impose it on them. So, yeah, I feel we do have solutions that work better than what's there. I just wish we could move forward with pushing it, because it's there, and people already use it in Android, right?
I think a lot of the concern people had was that it was too much of an edge case that wouldn't have enough support. I don't think that's really true. We know Android cares; they already use it. In the Maple Tree, the fact that you can have a lockless lookup is a fundamental part of the design, and I think you have that set of developers who also appear to care. We may disagree on the details, but the idea that we want to do lockless faults is fairly common ground right now, and I just want to find a way to move forward with it. And hopefully it will not take us another 10 years to get to an implementation. Yeah, it will probably take some time, but the primary question is: are there any workloads that benefit from SPF but wouldn't benefit from the Maple Tree and whatever solution is built on top of it? Because I don't know whether we want to have both; both are really large and, from the maintenance point of view, a huge burden. Do we want to merge SPF just to rip it out as soon as the Maple Tree is merged? I don't see a conflict in that way. Yes, you can have the Maple Tree with the current mmap_lock locking; you're just not getting the lockless-lookup benefits that the Maple Tree developers want if you do it that way. Looked at that way, the Maple Tree is really just a more efficient rbtree replacement. So I don't see it as a conflict. If you look at what Matthew is doing and what I'm doing, they are very similar steps that have to be done in the page fault path. I don't think we should look at them as one or the other. One point: we were talking earlier about leaving some of the lockless pieces out of the Maple Tree, possibly for some time. Having SPF in addition would let you get at least some of those benefits earlier rather than later. It's not a free ticket, sure it's not, but it is a ticket.
I think there are also things you only discover while actually trying to do the patch. For example, one we found was that we had to deal with MMU notifiers: even if there is no notifier, you have to take some sort of lock to know that one won't be installed while you do your lockless page fault. There are a lot of details like that which have to be dealt with, and they will really be the same whether we do SPF or do it the Maple Tree way. The set of issues that has to be looked at is pretty much the same; we can argue about implementation details, but it's really the same basic issues that have to be solved either way. Just to note, SPF has been backported for a number of years. Starting when Peter Zijlstra came up with his original implementation, Android vendors independently started backporting it and seeing the benefits. It's pretty widely tested on a lot of devices; even the latest version has already started being tested by a number of vendors. From that point of view it's at least not a very fresh thing: we have tested this for a number of years on a very large number of devices, mostly on ARM but also on x86 emulators. Another point I want to make is that with the previous iteration of SPF, if you ran the PFT microbenchmark on a multi-socket platform, you would hit performance issues with SRCU and see a regression. The way I do it now, you actually see a pretty significant improvement on multi-socket platforms because you don't have any shared cache line contention. Honestly, I think microbenchmark optimization like PFT is not that interesting a workload to me, but that was one thing people were concerned about previously, and it's really the opposite now. So with the Android builds, though, you do use Clang, so the VMAs are a bit different.
You have a guard VMA or something, so it's a bit of a different workload than what we would experience in Linux generally, right? Yeah, true, there could be some differences. If I'm not mistaken, the PowerPC testing is GCC, right? Yeah, and that's another point: we do test a number of configurations on a number of architectures, x86 and ARM, with different compilers. I'm not saying it's as well tested as our mainline configuration, but there are tests run in those configurations. So do you see a difference between using the different compilers with your guard VMAs at all? We haven't tried running performance tests there; those additional tests are mostly focused on making sure it's stable, and running performance tests on an emulator is an iffy proposition to start with. As far as I know, we don't run performance tests on other architectures or with other compilers, but we do run the stability tests for those. I was just wondering if the guard VMAs affect it at all, that's all. Well, I can check for sure and come back with the answer. So the testing in Android, I presume, follows pretty much the same pattern: you start up Dalvik and start running Android applications. How does it compare to, say, a server workload from the stability perspective? There are changes in the virtual memory layout here and there across devices, so how representative is Dalvik of the more or less generic case? If I understand correctly, you want to know how Dalvik compares with a standard Java environment? There has been some testing by Laurent Dufour, whom you may know, using a certain large commercial database on a very large PowerPC platform. Okay, so that would be an example of the other case, right? My question was more about how representative Dalvik-on-Android stability is of the general case.
So you said it's pretty well tested, but you tested more or less a single use case, right? Yeah, of course. I'm not saying we have a bulletproof solution here; I'm just saying this is a solution that has been tested on a number of devices in a number of configurations. And sure, it's not upstream, and until it's upstream you can't really call it tested, right? Once it's upstream, hopefully we will find all the issues that come out, but that's a chicken-and-egg problem: you won't get the testing until you get at least into some stable trees, and you won't get into stable trees until you get reviewers and acks. So, yeah, any more questions? Thank you very much.