I'm Lleveen Bartier from Samsung Research UK. I'm going to present our project, which is running dual Android on a Nexus 10 device. I have a few colleagues here too, Niverson and Craig, who can help me answer difficult questions. What I will do is introduce the project, because I know this is quite different from the usual Xen presentation. Then I will move on quickly to a video to show it running for real. [inaudible] For the project, we are running Android on the Nexus 10 on Xen on ARM, but Xen on ARM with a really good user experience. Xen is for servers, and we wanted to see if it can run on mobile. Android uses the GPU heavily for all of its graphics, scrolling and menus, so the big challenge is to virtualize the GPU between both domains.
The Nexus 10 device we used is commodity, commercially available hardware. In this case it is unmodified hardware; you can just buy it from Google Play. It has two Cortex-A15s, so it's running Xen on ARM, as Stefano just said. The GPU is a Mali-T604, so it's an ARM GPU. The ARM ecosystem has many different GPU vendors, so it's important that we support all of the GPU vendors for us to be successful. Memory is a constraint: we have two gigabytes of RAM on the Nexus 10. I say plenty of flash; that means it's plenty to run two Androids, not plenty in terms of the server world of terabytes. The screen resolution I mention here is very high on the Nexus 10. This is like four times the resolution of a smartphone. So when we virtualize the GPU and get the second Android running, we are stressing our system. It's kind of the worst case. So how did we do this in software?
So we have two VMs running in Xen. Dom0 is amalgamated with the first Android, and DomU is our second Android. That's it, no more VMs. We're running the standard Android Linux 3.4 kernel and Android Jelly Bean. The Xen that we're using is a parallel development done in Samsung. It's based on the Xen on ARM A9 code that we released previously, modified to work with the ARM A15 chipset. So it's a parallel development, not just for us in the UK but also at headquarters in Korea.
I've identified a few key parameters here which are important. We split the two gigabytes of RAM equally, one gigabyte for each Android, and that's fixed. There's no ballooning and no sharing, just fixed memory. SMP is enabled: Xen runs dual core and Dom0 runs dual core, but for now, for this demo at least at this stage of the project, DomU is single core only. And the I/O is the big challenge here. Dom0 has passthrough, so we expect good performance from Dom0 for all of the I/O, graphics and display, but DomU uses PV drivers, and that's where the challenge comes.
And here is the architecture. I think it's the PVH architecture, but maybe George can tell me if I'm wrong. We use the conventional Xen architecture here. We have Xen on ARM running in hyp mode at PL2, and actually this Xen on ARM is not that different. I know it's a parallel development, but we haven't had to make many changes to core Xen. It's just the architecture-specific things that we did differently. We have a Linux 3.4 Dom0, which has all of the native drivers, such as the display and the touch. There are two instances of Android Jelly Bean, mostly unmodified; the kernels are unmodified, so we use the hardware support in the Cortex-A15. And the big challenge here is the front-end and back-end drivers, because that's where we put all of our effort.
So to get the virtualized graphics from the DomU Android, we have to take all of the GPU packets that it wants to send to the GPU, send them across the rings into the Dom0 back end and then into the graphics driver. That's the challenge that we faced. I'll explain why, and you can see the results of this soon.
Okay, so just a quick overview of mobile virtualization specifically. This is slightly different from Xen on ARM for servers. This is Xen on ARM for mobile. There are about 50 device drivers in there. I've listed some of the important ones, everything from gyros to compasses, from GPU to USB. Of those, the hardest is the GPU. The reason is that on ARM we have to be portable. We have to support multiple GPU vendors, which means we are para-virtualizing above the GPU driver layer. We can't be specific to any one GPU vendor's driver. That makes the project complicated and large, and it's performance critical. If performance suffers, the user will see it straight away. Those are the reasons the GPU is a particular challenge. This project is really focused on the GPU. We haven't done all of the other drivers, and I won't be demonstrating any of them.
On a mobile device, we also need to talk about user switching. I want to do this before the video so you understand what's going on. The Xen scheduler runs as normal, switching between the two domains. For this video, we chose a 3 ms scheduler tick. That's to achieve 60 frames per second. Actually, I think we can get away with the regular 30 ms, but that's future work. The GPU is shared and time-sliced as normal. It just receives the commands from both domains and processes and renders them. So both Androids are rendering their output at the same time, each to its own buffer. But the user cannot see both Androids and has to interact with only one.
So we have what we call switching, where we display only the output of whichever Android the user has chosen, and we send the touch events to that Android. At the bottom, you can see two Androids. One has produced its blue wallpaper, one has produced its orange wallpaper. They both render those off-screen to their own buffers. Depending on the user's choice, the one on the right is the one that you will see. That's what I will show you in the video coming up.
In the video, I've chosen Dom0 to have a blue wallpaper. Just to remind you, this has passthrough drivers. The performance will be good, and we expect it to be good; if you like, it's the reference. DomU has an orange wallpaper and uses para-virtualized drivers for, well, all of our drivers, but display, GPU and block are the main ones that we did here. I can switch between the two in two ways. There's an icon, as you can see in red, and if I touch that icon, it will switch to the other domain. Or I can use the volume key: we just trap the volume key press and initiate a domain swap. For commercial use, we can decide how we actually want to do it, but for this demo those are the two ways I swap.
So I'm going to show switching between Androids using both the icon and the volume key. I'm going to do a real-world use case, which is playing Angry Birds, and a little 3D benchmark clip. This is just to show that when you run a benchmark in DomU, there is a lot of GPU data that we need to get across the rings into Dom0, into the GPU, processed and onto the display at a very high frame rate. That's the big challenge.
OK. Just before we start, I will set the scene. This is important here. Both Androids have booted. Dom0 is just idle, and I will show you the performance of Dom0 so you have it as a reference. Then I will switch to DomU, and then we'll play Angry Birds.
Angry Birds has already started in DomU, just to save time, because I know we're time limited here. OK. There is annotation, so I don't need to talk. So this is Dom0. Android is using the GPU for even this kind of work. And here we are. This is the DomU Android. You can see the performance is actually very good. Remember, the video is shot at 30 frames a second. In reality, it will be smoother, and you can see it for yourself. So this is DomU running. We're playing Angry Birds. All of the graphics commands are being para-virtualized across the rings into Dom0. Thank you. Thank you. But I didn't kill the little pig.
OK. So I used the volume key to switch to Dom0 while Angry Birds was running in DomU. This is to show we are running two independent Androids concurrently. So now I'm in Dom0, and I've started Angry Birds in Dom0. And this is part of the Jedi. So you can see a different colour on the display. There we go. So it's clearly a different level. I'm not sure what my shot was like.
OK. Now I'm going to switch to DomU using the volume key, just to show you they are both still running. So these are two Angry Birds. The GPU is processing both, and I'm merely switching the display. So in Dom0 I've stopped Angry Birds, and DomU is still running as normal. So, two Androids running concurrently.
Now, to show the system under real stress, I'm going to show a clip of a 3D benchmark. This is GLBenchmark 2.1 Egypt. To give you an idea, typically between 2,000 and 4,000 GPU packets are being sent across the rings every frame. That's within about 16 to 30 milliseconds, in order to achieve the 3D benchmark that you see here. Everything is being rendered by the GPU in Dom0 on behalf of DomU. And just to prove that Dom0 is still running, I switch back to Dom0 and show that the home screen, and Android, is still running in Dom0. And so we have two independent Androids running concurrently.
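The switching behaviour shown in the demo (both Androids keep rendering off-screen, only the chosen one is displayed, and touch events go only to that one) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Domain` and `Switcher` classes and all of their fields are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """One Android instance. Names and fields are illustrative only."""
    name: str
    wallpaper: str
    touch_events: list = field(default_factory=list)

    def render(self) -> str:
        # Every domain renders off-screen into its own buffer on each frame.
        return f"{self.name}:{self.wallpaper}"


class Switcher:
    """Chooses which domain's buffer is displayed and receives touch input."""

    def __init__(self, domains):
        self.domains = list(domains)
        self.foreground = 0  # index of the domain the user currently sees

    def switch(self) -> None:
        # Triggered by the on-screen icon or by a trapped volume-key press.
        self.foreground = (self.foreground + 1) % len(self.domains)

    def display_frame(self) -> str:
        # All domains keep rendering; only the foreground buffer is shown.
        frames = [d.render() for d in self.domains]
        return frames[self.foreground]

    def touch(self, event) -> None:
        # Touch events are delivered only to the foreground domain.
        self.domains[self.foreground].touch_events.append(event)


dom0 = Domain("Dom0", "blue")
domu = Domain("DomU", "orange")
switcher = Switcher([dom0, domu])
switcher.touch("tap")      # goes to Dom0
switcher.switch()          # e.g. volume key pressed
switcher.touch("swipe")    # goes to DomU
```

Note that `display_frame` renders every domain before picking one, which mirrors the point made in the talk: switching changes only what is displayed, not what is running.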
So what were the challenges in achieving that? It looks good, and that's the good thing when it works well: it just looks normal. 60 frames a second means we have 16 milliseconds to render each frame. The DomU Android will render each surface on the GPU, sending a load of commands to the GPU. Then the DomU Android will compose all of those surfaces together. For this demonstration, we're also asking the GPU to do that work. So the GPU composes them all into one final frame, and that is then displayed. That work has to complete within 16 milliseconds to achieve 60 frames a second, which is what we expect to see.
As for the virtualized graphics options, it is very important for us that we do API remoting above the GPU driver, for portability. As I said before, with many different GPU vendors on ARM, we have to do that. Android Jelly Bean has what they call butter-smooth technology, which is some things they put in so that when you touch the screen, everything runs really smoothly. For that, they use triple buffering. While one buffer is being displayed, other buffers are being rendered in the background, ready for display, so that we're never waiting, because waiting shows up as jerkiness. We need a reliable vsync interrupt into DomU. I'll talk about this a little more in a minute. SurfaceFlinger in Android, which does the composition, takes the vsync 60 times a second and starts rendering its next frame when it gets that vsync.
Android also has what's called deferred waiting. That means that when you give a job to the GPU, you don't need to wait for it to finish. The GPU will return what it calls a fence. You can see that at the bottom. The start of each black line is where the GPU has received the commands, and the end of the black line is when the GPU has signaled that it has finished rendering.
And as you can see, there's quite a lot of parallel activity going on. The GPU can schedule all of its jobs as normal. Android doesn't need to wait for them. Only when someone actually needs the GPU to have finished, which is typically the display driver, does it need to check whether the GPU has finished. So it's a way of deferring the wait to the last possible moment. We had to virtualize all of this and make sure it all worked across both domains, because the GPU is in Dom0 and Android is in DomU. So these were the real challenges. In what you see below, the top three frame buffers, 0, 1 and 2, are Android composing the final output. The bottom three are the application. In this case, this is actually just the home screen launcher scrolling. The vertical lines are milliseconds, and you can see we're meeting the 15 milliseconds frame. So this is just scrolling Android, the menus.
And our results. I showed you some of the results, but here's the results table. On the left, you'll see real-world use cases, with Dom0 in blue and DomU in red. We expect Dom0 to be good because it's passthrough. DomU is all para-virtualized drivers. For real-world use cases, we've got 60 frames a second. The user won't see any difference. But as we get into the benchmarks, we don't quite hit the same frame rate as Dom0. We're still better than 70% on all of these, which is very good. The green line shows the number of GPU packets going across the rings. For the real-world use cases on the left, it's typically less than 20,000 per second, which is about 500 per frame. But in the GLBenchmark run that I showed you earlier, where we could only hit 34 frames a second, that's 140,000 packets per second, or 2,000 to 4,000 packets every frame, that we have to transfer between DomU and Dom0, get to the GPU and get rendered on the screen. That's the challenge that we faced.
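The benchmark figures above can be turned into a rough per-frame and per-packet budget. The sketch below is plain arithmetic on numbers quoted in the talk (60 frames per second as the target, 140,000 packets per second at 34 frames per second for the benchmark); the function names are illustrative and not part of the project.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def packets_per_frame(packets_per_second: float, fps: float) -> float:
    """Average number of GPU packets crossing the rings for each frame."""
    return packets_per_second / fps


def per_packet_budget_us(packets_per_second: float) -> float:
    """Average time available per packet, in microseconds."""
    return 1_000_000.0 / packets_per_second


# Target frame time at 60 frames per second.
print(f"frame budget at 60 fps: {frame_budget_ms(60):.2f} ms")

# Benchmark case from the talk: 140,000 packets/s at 34 frames/s.
print(f"packets per frame: {packets_per_frame(140_000, 34):.0f}")
print(f"time per packet:   {per_packet_budget_us(140_000):.2f} us")
```

For the benchmark this works out to roughly 4,100 packets per frame, at the upper end of the 2,000 to 4,000 range mentioned, and about 7 microseconds on average to move each packet from DomU to Dom0 and on to the GPU.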
So we still made 70%, which is a success. On the event channel: I talked about having a reliable vsync, so we measured the vsync arrival. The vsync goes into Dom0, and we send it across the event channel into DomU. This is a clip of about 10 seconds of a real benchmark running, which is 600 vsyncs, and we measured how long after the previous vsync it took for the next vsync to arrive. Most of them are within 16.6 milliseconds, as you can see, centered around 16 milliseconds with a small delay, which is fine. We can handle the jitter; jitter isn't the problem. But a couple of them, less than 0.6 of a percent, we do appear to be missing. Now, this could be our code base, or it could be an inherent problem with our system. I don't know. Further investigation is needed. I just thought I'd highlight that when we start stressing the system with so many packets and events going through, we are seeing a very small percentage missed. But then 99.4% are going through fine, so that's pretty good too.
And some of our challenges on Xen specifically. I talked about the number of packets that we have to get through per frame, and because it's kind of like a batch job, we used a multi-page ring. I know there was some discussion yesterday about increasing the packets above 32; we've increased it enormously above 32. But because it's such a big batch that we have to get processed, it makes sense, I think. We found that when granting many pages between DomU and Dom0, if some of those grants are freed, old grant references are reused, but the PV drivers in Xen assume linearity; they don't expect old grant numbers to be reused. If you want to grant three or four pages, they assume the references are going to be allocated linearly. So this is a bug that we identified in the Xen PV drivers, and we hope to be able to put a patch through to fix that one soon. Which were the PV drivers? Do you remember? I'll get back to you on that.
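The grant reference issue described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the Xen grant table API: `ToyGrantTable`, its last-freed-first-reused policy and the `assume_linear` helper are all hypothetical, chosen only to show how reuse of freed references breaks the assumption that a multi-page grant gets consecutive numbers.

```python
class ToyGrantTable:
    """Toy grant-reference allocator (hypothetical, not the Xen API)."""

    def __init__(self):
        self.next_ref = 0
        self.free_refs = []  # freed references are reused before new ones

    def grant(self) -> int:
        # Reuse an old reference if one is available, else mint a new one.
        if self.free_refs:
            return self.free_refs.pop()
        ref = self.next_ref
        self.next_ref += 1
        return ref

    def free(self, ref: int) -> None:
        self.free_refs.append(ref)

    def grant_pages(self, count: int) -> list:
        # A multi-page ring needs one grant reference per page.
        return [self.grant() for _ in range(count)]


def assume_linear(first_ref: int, count: int) -> list:
    """What a driver computes if it assumes references are consecutive."""
    return [first_ref + i for i in range(count)]


table = ToyGrantTable()
table.grant_pages(6)               # references 0..5 handed out
table.free(1)
table.free(4)                      # two old references returned

ring_refs = table.grant_pages(4)   # actual references for a 4-page ring
guessed = assume_linear(ring_refs[0], len(ring_refs))

print("actual :", ring_refs)
print("assumed:", guessed)
```

In this model the four-page ring receives references 4, 1, 6 and 7, while a driver that only keeps the first reference and counts upward would map 4, 5, 6 and 7. Passing the full list of references, rather than a first reference and a count, avoids the assumption.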
So we identified, again, the RAM allocation limit of 512 megabytes today. It was not a problem. We fixed it to one gig and worked around this problem, but again, I hope we can either discuss this further or take this up. And per-VM interrupts are a little unreliable, 0.4%. So I would say, though, Xen is fine. We didn't change a lot. Most of the work was done in a PV driver, conforming to the standard architecture, as we'd expect, and Xen works perfectly well on an embedded system with high performance. It scales down nicely.
Further work: we are keen, or Samsung is keen, to share the work with the community. Certainly the Xen changes. At the moment, our PV driver, which does all the hard work for the GPU, is proprietary. I'm here to show this is possible and Xen will work, and we will work with the community to try to share as much as we can. I've also shown that two Androids can run on a mobile device with near-native graphics performance, for normal use cases for sure. Xen runs well on mobile, on devices and tablets. Please remember the screen resolution as well. It was very high on this device, so it's being stressed. A lower-resolution screen will be easier. With the PV driver that we developed, no major changes to the Xen architecture were required, so we haven't broken anything. We can feed this in and it will just work as part of the normal Xen architecture. As I said, if you want to see more or try it for real, if you want to play Angry Birds at the same time, then please just come and find me anytime today and you can have a play. That's my talk. Questions?
How was the context switching of the GPU managed? I don't know if this GPU has some support for, or awareness of, two concurrent users, let's say, or whether you have to do all the mapping in software and save state and even resources.
Between two games, the GPU will just see two clients using it. It doesn't know that one of those clients actually represents a whole different Android.
So as far as the GPU is concerned, it just gets requests from different clients, in the same way that you can have multiple apps on your phone and they can all use the GPU. The GPU just sees these and schedules them. So when you're running two Androids, performance will drop, because the GPU has to time-slice between them. At the moment, the GPU isn't aware of VM specifics, so we can't raise the priority depending on what you're looking at. I think they're all treated equally.
Thank you. Since Android uses OpenGL exclusively, did you investigate any of the existing OpenGL remoting technologies, Chromium, Xdmx, etc.?
Yes. We chose our own solution for performance reasons, specifically to ensure that we get real near-native performance. So we went our own route and wrote our own.
How do you ensure that remote graphics operations always apply to the correct frame buffer? That is to say, that DomU can't send something over the PV frame buffer that writes to Dom0's screen?
Do you want to answer this one for us?
So there are individual buffers for each of the surfaces. We don't have fbdev, for example, so we don't have a frame buffer that is specific to one domain. There are graphics buffers. We have surfaces for each VM separately, and we use ION from Android, and we pass the ION handles directly to the GPU. So we don't have frame buffers there. It's just ION buffers. It's a little bit different. And the ION buffers of the guest domain are actually the ION buffers that run into that.
That is to say that, because you've got a handle to that buffer and all of the GPU operations are tagged with a handle, you know that they're constrained to the right buffer?
That's kind of how the handles work.
So part of the graphics stack in Android, or most of it, is in user space, I take it. So did you have to modify the GPU driver to be able to cope with this, or is it just an additional back end to the driver?
We're using the GPU drivers as delivered, as Google would deliver them on the tablet. We haven't made any changes to the GPU drivers, and we haven't made any changes to the GPU userland drivers either.
Is the back end in user space or in kernel space?
Carry on, you're doing well.
There is a minor modification to the Android user space in terms of buffer management, but the majority of it is implemented via a server running in the back end.
No, I didn't show that, but yes, there is some user space. But we haven't modified Android. It's kind of on the side. We wanted to keep it decoupled as much as possible.
Okay, two questions. The first question is, did you measure the power overhead caused by this? As you might understand, if you forward all the OpenGL operations from DomU to Dom0, that involves a switch from DomU to Xen and from Xen to Dom0, so several ring switches, which could add overhead. So did you see any power overhead caused by that?
By power, do you mean battery?
Oh yeah, battery.
So no, we haven't focused on battery life for this. At the moment we are trying to get the performance up to show that it's possible. But there are not that many domain swaps, because with the architecture we've chosen and the multi-page ring, we minimise the domain swaps. That's the key to getting the performance, so that shouldn't be a problem. Being asynchronous, we don't need to wait for the response; most of the GL packets you just send and forget, so we can batch up a whole load of those and send them across in one big go. So really very few domain swaps are needed.
Okay, thank you. The second question: you said that Dom0 and DomU run simultaneously, right? That means even when Dom0 is in the background, the applications in Dom0 are still running. Actually, you could suspend the applications in the background so that you save more power.
You could do that. That's possible.
We could then further enhance it, if you wanted, so that when you switch, we lower the priority of the background tasks for battery life reasons.
Power and performance are the key for mobile devices. I'm just curious whether you have done the same with power management in DomU, and thermal management, completely?
The power management was not running in the demo. We just fixed the frequency as high as possible for running these tests.
What about thermal? Did you burn a couple of Nexus 10s?
No.
Just kidding. Do the VCPUs have affinity?
For this? No.
Does that improve performance?
By pinning?
Not pinning, but just giving some preference.
Implicitly we have actually done that, because we've got dual core for Dom0 and single core for DomU. One of the cores is effectively dedicated to Dom0 in the demo that you saw. The other one, which happens to be CPU 0, is time-sliced according to the Xen scheduler. Essentially we have given Dom0 priority, but that's only because we're not supporting SMP in DomU today.
One more question: are the system properties in Android synchronised or separately maintained? For example, Android has its own power mode. The guest domain has its own power mode. One Android can go into sleep mode while another can be awake.
I think what you're getting at is the battery management, which is very important for mobile.
Not only the battery management but also the entire system properties.
Okay. So, Renevasse, do you have an answer?
The system properties of Android should not be a matter of concern, because each one is managing its own properties, even if a domain goes into sleep mode.
Yes, I understand what you're saying. So the common power management is a system-wide property. So that's one thing that we have to work on.
Okay. Okay, one last.
Very good. Have you tried to do video playback?
No. Video playback isn't so focused on the GPU, because that's a sort of separate codec.
Okay, yeah. Yes. All right. Okay. Thank you very much.