So first, thank you everyone for coming here instead of going to lunch. I know the time is not the best, so I really appreciate that everyone is here. I'm going to talk today about what we have learned while developing our Enlightenment Foundation Libraries toolkit for wearable devices, and the experience we have gained optimizing for low battery usage, basically.

I will go quickly over the background because that's not really the interesting part. I am Cedric Bail; I have worked at the Samsung Open Source Group on this toolkit for a few years now. This toolkit was created for Enlightenment 17, which was released a few years ago now, and it has been in development since '97, so it's not really recent technology: it was the first window manager of GNOME, it has been a full rewrite since 2001, and it was built from the ground up on the idea that there would never be a year of the Linux desktop, but that embedded was where Linux was going to shine. Fourteen years later, that's pretty accurate, and it's maybe more than fourteen years now. So we have designed everything there for embedded devices, and when we started there was no toolkit that matched this goal and need; today we still think that the EFL libraries that come with Enlightenment are the best at targeting embedded devices.

Today Enlightenment's version number is 21. It uses EFL completely, meaning the scene graph, the main loop and all the optimization we have been putting into EFL, so Enlightenment itself can be used on wearable devices. At Samsung we use it for our smartwatches; they are actually running X with a window manager and a compositing manager, which is Enlightenment.
We are working quite actively on Wayland support, so today it is also a Wayland client that can connect to it. It has multiple backends: X, framebuffer, DRM. It is highly customizable, of course, because it's used on so many different devices that we really need to be able to have very, very different UIs and interactions with users, so it provides profiles, modules and themes to match Samsung's needs, or the needs of any embedded device. Our baseline platform for targeting performance is now the Raspberry Pi 3, and on the Raspberry Pi we are today doing better on performance than our watches do, so that gives you an idea of what we're targeting, and we are only going to get better from there.

So what are these Enlightenment Foundation Libraries? The community spent a decade writing a modern graphics toolkit. It's a mix and match of LGPL and BSD licenses; there is nobody who owns all of it, and nobody can change the license, so what you get today is not going to change. It's highly focused on embedded, as I've been saying, and that's really where it makes sense.
The first release of EFL itself was in 2011, so 10 years after development started. We are trying to have a stable long-term API and ABI: for every release we run an ABI report checking that we don't break things. Bugs may happen, but we take our API and ABI stability quite seriously. We are now on a three-month release cycle; we're trying to keep to that, which is not so easy, but that's our goal and what we are doing.

So, the state of EFL today: it was written for Enlightenment and the window manager, but it has evolved to write any kind of application. It has its own rendering library, with its own scene graph on top of it, and it's really optimized for reducing CPU, GPU, memory and battery usage; that's a very core focus of EFL. Still, we have full support for international languages, so UTF-8, and left-to-right and right-to-left writing; that's a given, it's necessary in any modern toolkit. We also have the ability to change the scale of the UI depending on the input size, so that your buttons are always something you can press with your finger, and we also scale the font size and every readable element depending on the DPI of the screen; all of that is taken into account when we push something to the screen. We also support accessibility. We have a fully themable toolkit, so you don't need to rewrite your own button; you just provide a theme for your button for whatever device you are making, which is what you want when you do embedded devices: you really want to customize the UI to fit your needs. The goal is that if you are running an EFL application, even for us who make the toolkit, looking at the screen we should not be able to tell whether it's EFL or not, because you should be able to change everything you want in the visuals very easily. The minimal set of dependencies for the full toolkit leads to an 8 megabyte library, and that includes all the widgets and even a small capability to display
videos and things like that. Some people have pushed it further, but that's less functional.

So, the reason why we care about optimization so much. The first thing is that Moore's law doesn't apply to batteries: your battery doesn't get twice the capacity every two years; that has never happened. I mean, it would be amazing if it did, but that's not the case. Memory bandwidth also doesn't increase following Moore's law, and that's another problem, because all rendering operations are constrained by memory bandwidth, so that constrains what we can do. And anyway, Moore's law is slowing down now, so we should not bet too much on it for the long term. Also, as we focus on embedded devices: those CPUs don't get more general-purpose power, they get more IP blocks around them that do specific tasks, like decoding a JPEG or something else very specific, and taking advantage of those usually reduces your energy consumption but doesn't give you much more performance. So there is all of that to take into account, from every point of view, when doing an embedded-device toolkit.

Also, many of the embedded devices we are working with actually have less memory than a low-end phone. You can buy a low-end phone for $70; you can buy an oven or a microwave for $100. If you expect to find the same CPU inside that oven, it's not going to happen. So we have devices like refrigerators, ovens, dishwashers, washing machines, home automation, that actually have less CPU and less memory than a phone, and we have to take that into account. And even on a low-end phone, it's always best if your native applications spare memory for the web runtime, because people love to go on websites, and web browsers are memory hogs: they just consume so much memory. So having the native applications not take that memory away is also something we take care of. In general, if your application is optimized for this kind of environment, it will be better for the user experience, and it also
increases multitasking. On Android you may have noticed, especially on the lower end, that when you switch from one application to another, the application is usually fully restarting every time, because there is not enough memory to actually keep it running; so multitasking is a really poor experience on the lower end. If your applications are optimized for memory, CPU and all that, you end up being able to not kill the applications in the background and just keep them running. OpenGL, for example, is a big user of memory: it's actually very common to have 40 megabytes of memory taken just by your OpenGL context. So when you are making a phone with 512 megabytes, you end up with not so many applications you can run. That's why we have a software engine for most applications: you don't need OpenGL for just a button to click on. So that gives you an idea of why we care about optimization.

The current state of optimization in EFL is mostly driven by the screen size and the DPI: the bigger your screen, the more pixmaps and things you need to put on screen, so the more memory you need. That's the first driving force on your toolkit. The other thing is that we are currently doing better than Android; that's why Samsung is using Tizen on its smartwatches: we get better battery life on them than if we were using Android Wear, and that's actually a side effect of EFL and Enlightenment being more efficient at doing the same thing, basically. Yeah, as I said, we fit in 8 megabytes; you can actually have the scene graph, the main loop and the very minimum set of graphic primitives in one megabyte, but you don't get a toolkit for that memory. There is no requirement for a GPU, and you can actually make a desktop run in 48 megabytes of RAM at 300 megahertz with one core: you can run Enlightenment in that context. You will only be able to run one application, but as a desktop profile it does work, and
there is no trouble doing that in software with no GPU. So that gives you an idea of where we are today with EFL.

So, energy efficiency: why? Well, the first thing is that it gives you the most battery life. I mean, whether your watch has 10 hours of battery life or 13 hours makes a difference: with 13 hours, you will go to bed and your watch will still be usable; with 10 hours, your battery runs out before you go to bed, pretty much. So being more efficient with battery usage gives your device a longer, more usable life. The second thing is that if you are more efficient, you are dissipating less heat, so you don't have all the problems of, like, burning people; sometimes mistakes are made there, but that's something we need to care about. It also allows for thinner devices, because you can keep the same battery life with a smaller battery than a competing device, and you get a thinner device; designers like to go this way too. That's just to say that designers like to have more freedom in what kind of package they can work with. And in general, electronic devices today are one of the growing energy consumers in the house, so keeping that under control is kind of good, I think.

So, how do we do that? It actually happens that when you optimize for classical things like speed, memory and network use, you already start to optimize for battery; I will cover after that what we do specifically to optimize for battery usage. And there are also things you can do in your visual design to optimize battery use; it's kind of interesting to see that what you display on screen has an effect, which is kind of obvious when you say it, but maybe you don't realize it. So the first thing is that optimizing for speed basically means that you are doing things more efficiently, so you're avoiding unnecessary computation, and obviously unnecessary computation consumes energy.
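A classic form of avoiding unnecessary work in a toolkit is partial updates: tracking the rectangles that changed since the last frame and only touching those pixels. Here is a minimal sketch in C; the names and the single bounding-box strategy are invented for illustration, this is not the EFL API:

```c
/* Partial updates: instead of redrawing the whole framebuffer every
 * frame, track the region that changed and only blit that.  Fewer
 * pixels touched means fewer memory accesses, hence less energy. */
#include <string.h>

typedef struct { int x, y, w, h; } rect_t;

/* Grow 'acc' so it also covers 'r' (union of bounding boxes). */
static rect_t rect_union(rect_t acc, rect_t r)
{
    int x1 = acc.x < r.x ? acc.x : r.x;
    int y1 = acc.y < r.y ? acc.y : r.y;
    int x2 = (acc.x + acc.w) > (r.x + r.w) ? acc.x + acc.w : r.x + r.w;
    int y2 = (acc.y + acc.h) > (r.y + r.h) ? acc.y + acc.h : r.y + r.h;
    return (rect_t){ x1, y1, x2 - x1, y2 - y1 };
}

/* Copy only the dirty region from the back buffer to the front buffer
 * (one byte per pixel here, to keep the sketch small). */
static void blit_dirty(unsigned char *front, const unsigned char *back,
                       int stride, rect_t dirty)
{
    for (int row = dirty.y; row < dirty.y + dirty.h; row++)
        memcpy(front + row * stride + dirty.x,
               back  + row * stride + dirty.x,
               (size_t)dirty.w);
}
```

Real toolkits keep a list of dirty rectangles rather than one bounding box, but the energy argument is the same either way: work proportional to what changed, not to the size of the screen.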
So it's better to optimize for speed as a starting point, because that's the easiest thing. It also applies to GPUs: GPUs are really fast at doing things, so you may not notice, because you already have your 60 frames per second, but optimizing the usage of the GPU saves battery. You will still have your 60 frames per second, but if you only update on screen what has changed since the last frame, you are saving energy there. If you only run your animations at the speed of your display, that will obviously also save energy, because you don't waste time doing something that is not necessary. And in general on the GPU, the simpler your shaders, the better it is for your battery usage, because, as I said for the CPU before, not doing something unnecessary is better for energy efficiency.

All of that is really classical optimization work, and what is interesting is that we already have a lot of debugging tools for doing benchmarks and analyzing where we are spending too much time on the performance side: you can use Callgrind, you can use perf, and do all this optimization without needing any special tool to measure energy at the output of your device. If you start by optimizing for speed, you will gain energy efficiency. We actually use the Raspberry Pi as a target for speed optimization, because if we are able to run fine and fast on the Raspberry Pi, it gives us a good hint that we are fine on battery usage; we don't need a battery measurement setup all the time, because that's a cumbersome thing to use.

Then the next step is to optimize for memory usage. This one is not so obvious, but using less memory really helps. The thing is that accessing main memory costs more energy than accessing the CPU cache, so every time you fetch something from main memory, you consume
more energy than if it were in your CPU cache. When you have a smaller application, you have more chance of being in cache, so saving memory is actually something that also saves energy. That's a link people don't see at first, but once you explain it, I think it's pretty obvious: if you don't use unnecessary memory, you are going to be more energy efficient. So here again, it's the classical improvements that you do: you improve cache locality, you do linear access instead of random access. And we have developed for EFL a technique that is a kind of copy-on-write in user space. Most of the objects on screen have a lot of properties that are the same; there is really a lot in common on the screen, so by taking advantage of that we were able to reduce our memory usage by 5 to 10 percent, and this actually led to a 5 percent speed increase, because cache usage was improved; and as a side effect it also improves our energy efficiency overall. So this logic holds up pretty well. Again, it's very nice because for optimizing memory use there are a lot of tools that do great things: you can run massif yourself to know where you are using too much memory and try to cut that. So that's the next tool to use to bring your energy efficiency up.

The next one is network use. Most applications today are connected, so network use is quite important. As you send more data over the network, you are more likely to have lost packets, and a lost packet is an unnecessary loss, because you have to re-emit it: the more data you transmit, the more likely you are to lose data and have to retransmit it. So being efficient and sending as little data as you need over the network is quite a good optimization to do in your application. The other direction, download, is where it's really tricky, because a download is usually something an application does as a prefetch.
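The user-space copy-on-write deduplication described a moment ago can be sketched like this. EFL's real implementation lives in Eina (Eina_Cow); everything below, the names and the tiny linear-scan dedup pool included, is invented to illustrate the idea:

```c
/* User-space copy-on-write with deduplication, simplified sketch. */
#include <stdlib.h>
#include <string.h>

/* The shared, mostly-identical properties of on-screen objects. */
typedef struct {
    int refcount;
    int visible;
    int color;
} state_t;

#define POOL_MAX 32
static state_t *pool[POOL_MAX];  /* all live states, for dedup lookup */
static int pool_n;

static int pool_has(state_t *s)
{
    for (int i = 0; i < pool_n; i++) if (pool[i] == s) return 1;
    return 0;
}

static void pool_remove(state_t *s)
{
    for (int i = 0; i < pool_n; i++)
        if (pool[i] == s) { pool[i] = pool[--pool_n]; return; }
}

/* Begin a write: if the state is shared (or absent), get a private copy.
 * Reads never come through here; they just dereference the const pointer. */
static state_t *cow_write(const state_t **slot)
{
    const state_t *cur = *slot;
    if (cur && cur->refcount == 1) return (state_t *)cur; /* sole owner */
    state_t *copy = malloc(sizeof *copy);
    if (cur) { *copy = *cur; ((state_t *)cur)->refcount--; }
    else memset(copy, 0, sizeof *copy);
    copy->refcount = 1;
    return copy;
}

/* End a write: if another object already holds an identical state,
 * share that one and drop our private copy.  This is the dedup step. */
static void cow_done(const state_t **slot, state_t *w)
{
    for (int i = 0; i < pool_n; i++) {
        state_t *s = pool[i];
        if (s != w && s->visible == w->visible && s->color == w->color) {
            s->refcount++;
            if (--w->refcount == 0) { pool_remove(w); free(w); }
            *slot = s;
            return;
        }
    }
    if (!pool_has(w) && pool_n < POOL_MAX) pool[pool_n++] = w;
    *slot = w;
}
```

Reads stay free because they are plain dereferences of the const pointer; only the rare write path pays for the copy and the dedup scan, which matches the read-mostly access pattern the talk describes.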
And you don't really know whether the user is going to need everything you download, so you may over-download. That's where it's tricky: you really have to find the balance between prefetching too much, being wrong about what the user needs and wasting battery on it, and prefetching too little and having a poor user experience. So for this one it's difficult to give strong rules on what to do; it has to be adjusted by every application, and it's where you will be losing energy efficiency. Still, there are a few things you can do: group all your downloads together, and go back to doing no wireless transmission for as long as you can, because wireless protocols usually have a high-energy mode where they do all the transmission, and a suspend or idle mode which uses really little energy, and you want to get back into that mode as soon as possible. If you ping the network every millisecond, you are going to screw that up, and it's never going to go down in energy usage. The issue is that there is no daemon, nothing anywhere in the system, that synchronizes all applications; so in a multitasking situation there is nothing you can do today to reduce your network usage and the battery waste there. It's sad, but that's how it is.

The next step to optimize battery usage is to rely on the kernel, actually, because the kernel is the one that chooses the clock and the voltage of your CPU, and it does that by trying to figure out what you are going to do. It's kind of a hard task, because the kernel has no idea what you are doing; it's putting its finger in the air and trying to see where the wind is coming from and going, and it's basically a miracle when it works, and for years it failed pretty badly. That's why there is this Energy Aware Scheduler work coming in the kernel, which tries to improve this situation and make the scheduler, the cpufreq driver and
the cpuidle driver all work together to actually get things right. But even then, you need to fix user space, because the kernel is only looking at the history of your task, and if your task did random things in the past (waiting on I/O, then drawing a frame, then waiting on some other I/O, then drawing a small frame, then a big frame), the kernel has no idea what your process is doing, and it has no chance of choosing the right frequency and the right voltage for your CPU. That's where you will be wasting: either you will not get the framerate you want, or your battery usage will go up because the CPU is running too fast with nothing to do. So the scheduler is being fixed. It's going in a direction where you won't give a hint directly to the kernel by saying "now I'm doing rendering, now I'm doing this interactive task"; it will do its bookkeeping per process, and there is most likely going to be new SCHED_DEADLINE infrastructure that gives interactive tasks the possibility of being properly scheduled by the system. So the idea is that user space has to break things apart, and have processes and threads that are each dedicated to one thing. That's where we're going.

So I'm going to quickly go over what a scene graph is, because it's actually where the main optimization is going. The scene graph is basically a map, a graph, of everything you have to draw on screen. It's not immediate rendering; it's not the code that draws things on screen, it's just bookkeeping of everything that needs to be displayed on screen and how to display it. It allows a general view of your application from inside the toolkit, and since it has that global view, it gets the possibility to do global optimization: it can deduplicate data, it can cull things that don't need to be displayed on screen, it can limit texture and shader switches, and it can also properly schedule
all the rendering tasks later on, because it knows exactly what needs to be drawn on screen.

So, EFL's first scene graph: that was 10 years ago, and well, we didn't foresee that embedded devices were going to be multicore; I mean, 10 years ago there wasn't even multicore on your desktop. So we started with one main thread, one main loop, and everything was in that main loop, and you have a bunch of tasks with different properties: you have CPU-intensive tasks, which prepare the data to actually know where to draw things on screen; you have memory-bound operations, which are mostly all the blits; and you have pointer-chasing, walking-all-over-the-place code, which is the layouting and the preparation of the scene graph rendering. All of that was in the main loop. We have since evolved to a setup where we have two threads, basically, and we start drawing in a thread; that's actually quite good for where things are going, but the kernel may still have a hard time figuring out what is going on with the main loop, and we can help it more by grouping things together better.

So we are going in this direction: we group all the span-line computation, which is the super CPU-intensive task of working out, for your shapes, your rectangles and everything being moved, which pixels to cover, into specific threads dedicated to that intensive work; and all the memory-bound drawing operations, the blits that are basically just a memcpy, go in their own thread. That one doesn't need to be duplicated, because it's memory bound: if it's memory bound, having two memory-bound threads makes no sense. That's why there is only one green box on the slide, while the yellow ones should scale quite well with the number of cores you have in the system. That's pretty much where we're trying to go, and that's the main computation cost for a user interface toolkit today, because most applications don't do much themselves.
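That split, CPU-bound span computation fanned out across worker threads feeding a single memory-bound blit stage, can be sketched with pthreads. The shape being rasterized, the names and the sizes below are all invented for illustration; this is not the actual EFL code:

```c
/* Pipeline sketch: the CPU-heavy scanline/span computation scales across
 * worker threads, while the memory-bound blit stays in a single thread,
 * since a second thread fighting for the same memory bus gains nothing. */
#include <pthread.h>
#include <string.h>

#define HEIGHT  64
#define WIDTH   64
#define WORKERS  4

typedef struct {
    int first_row, last_row;
    unsigned char (*spans)[WIDTH];
} job_t;

/* CPU-bound stage: decide, per scanline, which pixels a shape covers.
 * Here the "shape" is trivially the left half of every row. */
static void *compute_spans(void *arg)
{
    job_t *job = arg;
    for (int y = job->first_row; y < job->last_row; y++)
        for (int x = 0; x < WIDTH; x++)
            job->spans[y][x] = (x < WIDTH / 2) ? 0xff : 0x00;
    return NULL;
}

/* Memory-bound stage: one thread copies the coverage to the target. */
static void blit(unsigned char *dst, unsigned char (*spans)[WIDTH])
{
    memcpy(dst, spans, HEIGHT * WIDTH);
}

static void render(unsigned char *dst)
{
    static unsigned char spans[HEIGHT][WIDTH];
    pthread_t tid[WORKERS];
    job_t job[WORKERS];
    int rows = HEIGHT / WORKERS;

    /* Fan out: the yellow boxes, one slice of rows per worker. */
    for (int i = 0; i < WORKERS; i++) {
        job[i] = (job_t){ i * rows, (i + 1) * rows, spans };
        pthread_create(&tid[i], NULL, compute_spans, &job[i]);
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(tid[i], NULL);

    blit(dst, spans); /* the single green box: one memory-bound stage */
}
```

The design choice mirrors the slide: the compute stage scales with the core count, while doubling the blit stage would only make two threads fight over the same memory bandwidth.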
They just fetch data from the network or from a database, do some work, display something, change the UI; and then it's the UI that does all the CPU- and GPU-intensive tasks. So optimizing that part alone actually saves a lot of battery and energy in the system.

The main price we are going to pay with this move is that it increases memory usage, because we need more threads, and every thread has its own stack. There is also an increase in complexity that goes with it: it's more difficult to get right, with race conditions and all the kinds of bugs that show up once you're doing threads. It's not really new stuff, but it does come at a cost to move to this kind of architecture. What we're trying to do is put all these risky things inside the toolkit, and we still push for applications not to be multi-threaded: to have enough helpers in the toolkit that you don't need to write your own database processing in a thread, because we already have a data model that connects to your database correctly from a thread without any trouble.

The reason why we need to use threads, and don't have a hint to give to the kernel, is that the kernel obviously cannot trust user space applications: even the SCHED_DEADLINE API requires root, so it would be a system daemon that says, okay, this thread of this application is actually the one that should be the rendering thread. So there is a big part of the system that is not there yet: that SCHED_DEADLINE work is not upstream, there is no daemon in the system to do it; but at the same time, building all the thread infrastructure in the toolkit is already taking us quite some time anyway, so by the time we are done, the system side may be done too. It's something we have to start on now. What is also interesting is that the same kind of pipeline
actually works quite well with Vulkan. You will not do exactly the same computation in the yellow boxes, but it's something you may actually do in parallel to build all the command queues that you are going to push to Vulkan. So going in that direction should enable us to have the same architecture for Vulkan rendering. It doesn't work well with OpenGL, obviously, because OpenGL and threads... okay.

I have been going really fast through this; I guess I'm kind of hungry and want food or something. If you have any question, just raise your hand.

So, the last optimization, which I find quite funny, is optimizing your visual design. It's something your designers don't want to hear about, but if your screen is black and you have an AMOLED screen, well, it doesn't consume any energy: black designs save a lot. They save so much that on an AMOLED device it's a way bigger win than any of the optimizations I have talked about before. That's something to be said; of course people like to have white things, it's a trend, so let's waste energy there. Anyway. The other thing is, as I've been saying, we can do partial updates, and that helps with energy saving; but if your designers love full-screen animations, things that slide from left to right or up and down, well, a full-screen animation means no partial updates, so full-screen energy use. Some screens also have an integrated framebuffer; that's something we have in many watches, actually: there is a small framebuffer inside the screen of the smartwatch itself, which can usually keep a few gray levels of display in the screen itself, and that lets you completely suspend the system, the SoC, and save quite a lot of battery too. Same story: if you are just displaying a watch face, it works well, but if someone starts to want blinking things
of different colors, it doesn't work well. So that's another nice optimization. Also, and again it's kind of obvious: your designer is having fun in Photoshop and adds this 20-layer user interface, with all these bitmaps and all this transparency and stuff going on, and it's quite complex; and the more complex all your layers are, there's nothing the toolkit can do to reduce the number of layers you want to draw on screen. So a simpler design is actually more efficient. It's kind of an obvious thing to say, but it's something to keep in mind. And well, the simplest design you could think of is a rectangle and a vertical gradient, because a vertical gradient is just a series of memsets, which is really nice to optimize, and a rectangle is just one big memset. That's maybe the easiest design, pushing the idea to the limit; obviously you don't want to do that, but it gives you an idea that sometimes the biggest win is in the design itself, because a design optimization like this can give much more gain than any of the things I spoke about before. That's kind of a sorry state, I guess, but it makes sense: if you need to do something, you are forced to spend the energy to do it; if you don't need to do it, that's a saving. And as I've been going through things quite quickly today, I am actually done, so if you have any question or anything, just raise your hand; otherwise it can be lunch time.

Yeah, sure. Can you repeat? Oh yeah, so the question is whether I can elaborate on the copy-on-write technique that we use in EFL. So we have a two-macro system: in your structure, you just write the sub-fields as const, and we have a macro that duplicates the data when you start writing to it, and a macro to close. So it gives you a field that you can actually write to, and at the end the closing macro gives you a chance to copy that back into memory, or to find another
field that has the same value in another object and use that instead, reusing a pointer that was already allocated for that purpose, deduplicating writes on the fly. So all read accesses are done like normal code, because it's just a const pointer, the normal pointer to your structure, but all the write accesses have to go inside these two macros. It works quite well for a toolkit, because we have a lot of properties that are accessed all over the place in read mode, and only one or two spots where you actually write things back, so it doesn't complexify the code too much and it gives us quite some benefits.

So, the problem with Raspbian, and most stable Debian, is that the packages and dependencies are quite old, so the easiest setup is actually to go with Arch Linux: there is Arch Linux for the Raspberry Pi, and that's the easiest way to go, because all the dependencies are the latest things, and you can even get the git version when you want. Especially when we are talking about Wayland: Wayland requires you to be really on the edge; the releases are not really completely usable, so the more you are on the edge, the better it gets. So yeah, I would recommend going this way.

Yeah, so, yes: EFL is right now being used as the backend toolkit for that. .NET is just using bindings to EFL, and as far as I know they are writing the C# forms, I don't remember what the name is, using EFL below. That's as far as I know; I'm just doing the support for the library there, so I don't know exactly where they are, but that's what they were trying to do.

So, there have been in the past quite a few users of EFL on microcontrollers. The latest example I remember was a kind of GPS device that tells you, in Europe, where the fixed speed radars are, and they were using EFL on it. It was a 16 megabyte RAM device, with no execute-in-place, and some kind of small Cortex with no MMU, and
basically the full system with EFL and the graphics was 8 megabytes, and the application had eight more megabytes, but there was some voice recognition included in that. So yeah, any other questions? Okay, thank you very much, and bon appétit.