Good morning, everyone. My name is Michael, and I work at Pengutronix on the graphics stack. Today I will be talking about the modern Linux graphics stack on embedded systems.

For that, we will first have a look at a typical modern Linux user interface. So, that's a modern user interface. Just kidding. Usually, modern interfaces look like this: KDE with the Plasma window manager, or GNOME, with all different kinds of applications. That is the typical desktop use case: on the desktop, you just choose a desktop environment like KDE or GNOME, install it, and your graphical user interface "just works". I put this in quotes, but it just works. So what is different on embedded systems? That is what I will be talking about today.

The agenda is as follows: I will start with a look at the modern Linux graphics stack, then look especially at graphics on embedded systems and what is special about it, and finally at Weston on embedded systems.

So let's start with the Linux graphics stack. Usually it starts with a windowing system: you have a display server that takes care of the display and manages the windows of multiple applications, and these applications talk to the display server using specific protocols. For 30 years or even more, this has been the X11 protocol. I put Xorg here because it is the most commonly used implementation, but it is the X11 protocol. In a more modern stack, you use Wayland instead, and the remainder of the talk will be about Wayland.

As I said, it starts with your display server, which in Wayland is called a Wayland compositor. The UI of the compositor is usually called the shell. So we have the Wayland compositor here, and on top of that we have multiple applications that use the Wayland protocols. A typical protocol is the xdg-shell protocol, which allows clients to create windows and provides the functionality you would expect from a desktop, like resizing, maximizing, and dragging. You can have different applications and different graphics toolkits on top: for example, Qt and GTK both support the Wayland protocol, especially the xdg-shell protocol, and Chromium does as well, so if you have an HTML application, you can run it with Chromium on Wayland.

Below that, your compositor somehow has to drive the display. For that, it uses libdrm and a DRM driver in the kernel; this is where DRM, and especially KMS, comes into play. The compositor uses OpenGL for compositing the different applications together; for that, it uses Mesa, a GPU driver, and the actual GPU hardware below to make it really fast. It can also add animations and transparency, so your windows can pop in, or burn up when you close them.

So that is more or less the graphics stack. If a client uses OpenGL as well, it bypasses the compositor and uses Mesa directly. That is something that is different in Wayland compared to X11: the output buffer of a client belongs to the client. The client renders into it and passes it to the compositor, which then creates the final frame.

As a final overview: the clients render their surfaces, the compositor composes them together and forwards the result to the display driver, which displays it. Any questions so far? I hope not.
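Before we move on to embedded systems, to make the protocol side a bit more concrete: a minimal xdg-shell client looks roughly like this. This is a sketch only, not from the slides; it assumes the xdg-shell-client-protocol.h header generated by wayland-scanner, and it omits error handling and the buffer a real client would attach after the first configure event.

    /* Minimal xdg-shell client sketch: connect, bind globals, create a
     * toplevel window. Error handling and buffer handling are omitted;
     * a real client must also answer xdg_wm_base ping events. */
    #include <string.h>
    #include <wayland-client.h>
    #include "xdg-shell-client-protocol.h" /* generated by wayland-scanner */

    static struct wl_compositor *compositor;
    static struct xdg_wm_base *wm_base;

    static void handle_global(void *data, struct wl_registry *registry,
                              uint32_t name, const char *interface,
                              uint32_t version)
    {
        /* The compositor advertises its globals; bind the ones we need. */
        if (strcmp(interface, wl_compositor_interface.name) == 0)
            compositor = wl_registry_bind(registry, name,
                                          &wl_compositor_interface, 1);
        else if (strcmp(interface, xdg_wm_base_interface.name) == 0)
            wm_base = wl_registry_bind(registry, name,
                                       &xdg_wm_base_interface, 1);
    }

    static void handle_global_remove(void *data, struct wl_registry *registry,
                                     uint32_t name)
    {
    }

    static const struct wl_registry_listener registry_listener = {
        handle_global,
        handle_global_remove,
    };

    int main(void)
    {
        struct wl_display *display = wl_display_connect(NULL);
        struct wl_registry *registry = wl_display_get_registry(display);

        wl_registry_add_listener(registry, &registry_listener, NULL);
        wl_display_roundtrip(display); /* wait until the globals arrived */

        /* Create a surface and give it the xdg-toplevel (window) role. */
        struct wl_surface *surface = wl_compositor_create_surface(compositor);
        struct xdg_surface *shell_surface =
            xdg_wm_base_get_xdg_surface(wm_base, surface);
        struct xdg_toplevel *toplevel = xdg_surface_get_toplevel(shell_surface);

        xdg_toplevel_set_title(toplevel, "demo");
        wl_surface_commit(surface);

        /* After the configure event, a buffer would be attached here. */
        while (wl_display_dispatch(display) != -1)
            ;
        return 0;
    }

This is exactly the part that toolkits like Qt, GTK, and Chromium handle for you.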
Now, if we go to embedded: what is different about embedded systems? Maxime already mentioned some of these points. We have limited hardware resources, especially limited memory bandwidth, and all the different hardware units share that memory. The more hardware units we use, the less memory bandwidth is left for each of them, so that is really the constraining factor. In many cases we also have limited power, because the system is battery driven.

Because of that, you usually have dedicated hardware accelerators for graphics, like GPUs, video decoders, and display controllers: the GPU for 3D hardware acceleration; the display controllers, which in particular provide hardware overlays, so you don't have to composite in your compositor but can put your output on different overlay planes and let the display controller do the compositing; and hardware video encoders and decoders, so that you don't use your CPU for that and waste energy.

If you buy an SoC, you usually get IP cores from different vendors: your SoC vendor buys IP cores from different IP vendors and puts them together in one SoC. That can lead to incompatible IP cores. One core outputs memory with a specific alignment that is not compatible with another IP core, or it uses a special, maybe proprietary, tiling. Sometimes you even have dedicated hardware units just for translating between the tiling of one IP core and another, or between their pixel formats. So we have all these peculiarities of embedded hardware.

Now take an application developer or designer who is used to desktop development: they just develop on the desktop, see a window with the final application, and are not aware of the hardware limitations. Hopefully you tell them what they should be aware of, but that is not necessarily the case. So can we somehow achieve the convenience of developing an application on a desktop for embedded systems? For that, our graphics stack must make optimal use of the graphics hardware, the drivers must provide proper abstractions of this hardware, and user space must use them, so that the application developer does not have to care about any of this.

Now I will have a look at the abstractions that the drivers provide for user space.

First, there is the Linux dma-buf framework. dma-bufs allow buffers to be shared between different drivers: one driver exports a buffer, another one imports it, and this avoids copies between the client and the compositor. That is really important to save memory bandwidth.

A second abstraction is atomic mode-setting. If you want to use the overlay planes of your hardware and you update them one after the other, you can get intermediate states where one plane already shows the new frame while another still shows the old one. You want to change everything at once, and that is possible with atomic mode-setting: the compositor just sets up the complete new state and sends it to the display hardware, and it is applied in one instant. This is what allows the compositor to really make use of all the different hardware planes.

The third abstraction is video and pixel formats. Video decoders usually produce a YUV format, which is completely different from the RGB formats typically used by display units. So you usually convert these formats into something the display understands, but some display controllers understand YUV formats, or even custom formats, directly. As I said before, IP cores may use strange formats, or your hardware has a converter built in to translate them. This is also provided by the drivers: the drivers tell user space which formats they support.
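As a small illustration of that last point (my own sketch, untested): with libdrm, user space can ask each plane of a display controller which pixel formats it accepts. The tiling modifiers, which come up next, are advertised separately through the IN_FORMATS property blob.

    /* List the pixel formats every plane of a display controller accepts.
     * Modifiers are advertised separately via the IN_FORMATS property blob. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);

        /* Without this cap we would only see the legacy planes. */
        drmSetClientCap(fd, DRM_CLIENT_CAP_UNIVERSAL_PLANES, 1);

        drmModePlaneRes *res = drmModeGetPlaneResources(fd);
        for (uint32_t i = 0; res && i < res->count_planes; i++) {
            drmModePlane *plane = drmModeGetPlane(fd, res->planes[i]);

            printf("plane %u:", plane->plane_id);
            for (uint32_t j = 0; j < plane->count_formats; j++) {
                uint32_t f = plane->formats[j]; /* fourcc, e.g. XR24 */

                printf(" %c%c%c%c", (char)(f & 0xff), (char)(f >> 8 & 0xff),
                       (char)(f >> 16 & 0xff), (char)(f >> 24 & 0xff));
            }
            printf("\n");
            drmModeFreePlane(plane);
        }
        drmModeFreePlaneResources(res);
        return 0;
    }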
Furthermore, there are tiling formats, which save memory bandwidth by reordering the pixels of a frame for better cache access. Tiling is described by format modifiers, and these allow buffers with a specific tiling to be shared between hardware units. So if your video decoder produces a proprietary tiling format, its driver provides a modifier for it, and your display controller's driver advertises that it understands this proprietary tiling, then Weston might be able to pass the buffer straight through to the display controller.

So from the driver side we have all these features. But does user space actually use them? For that, we will have a look at Weston. Weston is the reference implementation of a Wayland compositor, and as we can see in the README, it explicitly states that it is useful for embedded and industrial applications and for kiosks. So it might be something we can use on our embedded system. Furthermore, if we look into Weston, we find the DRM backend, which uses the Linux KMS API, so it uses exactly the DRM drivers for the displays. And if we look at the man page for that backend, it again states that it is able to use hardware overlays, and, very interestingly, that fullscreen surfaces will be scanned out directly without compositing. So we might even be able to avoid using the GPU for compositing, which again saves memory bandwidth.

Now I will change the pace a bit, so don't be scared of the slides: there will be code on them. You don't have to read it, and you don't have to understand it. If you download the slides from the conference site, you can look up the functions and read the code yourself, which is much better than trying to follow me here.

First, the preparation of the planes, when the compositor asks the display controller for the supported planes: we can see that Weston does a drmModeGetPropertyBlob() and reads the formats and modifiers from the blob. So it is aware of what the display controller supports.

The second step is the rendering. The rendering is separated from the flushing to the hardware, and the flush actually does a drmModeAtomicCommit(). So Weston uses the atomic commit API for DRM and can change all planes at once.

Next, you can see that at one step in the compositing, Weston assigns the plane state, and it tries different modes: first planes only, then planes combined with compositing in the renderer, and if all of that fails, it renders everything together. In the propose-state function, you can see that it first tries to put a surface on the scanout plane, then on an overlay, and if that fails, it adds the surface to the renderer region, which will be rendered together.

The final part: can we actually get the buffers from the clients directly into our compositor, and further onto an overlay plane? That is done in the drm_fb_get_from_view() function. You can see that if it gets a dma-buf, it tries to create a DRM framebuffer from it; it is all actually there. All of this can be found in libweston/compositor-drm.c. It is a pretty large file, so look up these functions and they will guide you through it.
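To give an idea of what that flush step boils down to, here is a minimal atomic-commit sketch (mine, untested; it assumes the plane and framebuffer IDs, and the FB_ID property ID, looked up once with drmModeObjectGetProperties(), already exist):

    /* Flip two planes in one atomic commit. The property ID of FB_ID is
     * assumed to have been looked up via drmModeObjectGetProperties(). */
    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    int flip_two_planes(int fd, uint32_t plane1, uint32_t plane2,
                        uint32_t prop_fb_id, uint32_t fb1, uint32_t fb2)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        int ret;

        /* Stage the new framebuffer for both planes... */
        drmModeAtomicAddProperty(req, plane1, prop_fb_id, fb1);
        drmModeAtomicAddProperty(req, plane2, prop_fb_id, fb2);

        /* ...and apply both changes in one go, or not at all. Committing
         * with DRM_MODE_ATOMIC_TEST_ONLY first would check the state. */
        ret = drmModeAtomicCommit(fd, req,
                                  DRM_MODE_ATOMIC_NONBLOCK |
                                  DRM_MODE_PAGE_FLIP_EVENT, NULL);
        drmModeAtomicFree(req);
        return ret;
    }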
As a conclusion of that: dma-buf import is supported by Weston; atomic mode-setting is supported, and because of that we can use overlay planes; and we have format modifiers, so we know that buffers have a specific tiling format. What doesn't work yet: if we get a tiled client buffer and want to put it directly on an overlay plane, Weston currently doesn't understand this, and the GBM API doesn't provide the means to do so. So much for how Weston uses the DRM API.

As a final step, we will now look at how to build a user interface with Weston. As I said before, the user interface is defined by the Weston shell. If you replace the DRM backend with the Wayland backend, or, on an X-based system, with the X11 backend, you can develop this Weston shell on your desktop PC, which is again a plus for user interface designers. To actually test it, you still have to run it on your hardware with the DRM backend, so you won't be able to test the DRM features on your desktop PC.

To write a Weston shell, you go to the weston.ini configuration file. It has a shell property where you put your own shell as a string. The string names a loadable object file, which is loaded by Weston. In this loadable object file, you implement the xdg-shell protocol and use libweston to talk to the compositor. It looks like this: you have some application, the application speaks the xdg-shell protocol with your shell, and your shell then uses functions defined in libweston's compositor.h to talk to libweston, which does all the magic with DRM.

So again, a bit of code. The entry function of your object file needs to be called wet_shell_init(); it is the first function that Weston calls when it loads the shell. And at some point, you call weston_compositor_schedule_repaint(), which repaints the current state of the final frame for an output. You need to do a lot of stuff in between, of course.
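Such a shell module could start out like this (a bare skeleton, assuming the entry point named on the slide; the header location differs between Weston versions):

    /* Bare skeleton of a custom Weston shell module (sketch, untested). */
    #include <compositor.h> /* libweston's compositor.h; path varies by version */

    WL_EXPORT int
    wet_shell_init(struct weston_compositor *ec, int *argc, char *argv[])
    {
        /* Here you would implement the xdg-shell protocol, create
         * weston_layers, and map client surfaces to views on them. */

        /* Once the scene is set up, ask libweston to redraw. */
        weston_compositor_schedule_repaint(ec);

        return 0;
    }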
If you want examples of how to do all of that, you can go to the Weston sources. There you have the desktop-shell implementation, which is for the desktop use case; the ivi-shell implementation, which is for an in-vehicle infotainment use case; and the fullscreen-shell, which is for a single application that runs in full screen. So, wait: in-vehicle infotainment sounds a lot like an embedded use case. We might just use this ivi-shell for our embedded system.

The ivi-shell is indeed meant for the embedded or HMI use case. It gives the compositor more control over the clients' surfaces: it identifies clients by a specific ID and knows which surface belongs to which client. The problem with the ivi-shell is that it supports only the ivi-application protocol, which clients must speak to be able to specify these IDs. The xdg-shell protocol is not supported, so your common applications and all the usual graphics toolkits won't work on the ivi-shell. But there is a patch set on the wayland-devel mailing list that adds the xdg-shell protocol to the ivi-shell. If you use this patch set, you can just run normal applications on the ivi-shell, which makes it interesting for the embedded use case again.

So let's have a look at the ivi-shell to see what we have to do. There is the ivi-shell implementation, which implements the shell protocols, so the ivi-application protocol, or with the patch set the xdg-shell protocol, is implemented here. It uses the ivi-layout library, which takes care of all the surfaces and the different layers that allow you to manage your client applications, and this library talks to your compositor backend. To create a user interface with that, you additionally have the HMI controller, which is responsible for positioning your surfaces and for how they relate to each other. On top of that, you can have a user interface that looks like this, for example; this is the example user interface that comes with Weston's ivi-shell.

So to build our own user interface, we just cut here, replace the HMI controller and this example user interface, and reuse the ivi-layout library. The library exports an API called ivi-layout: in your HMI controller, you call ivi_layout_get_api() and get back an ivi_layout_interface. Through this interface you can create surfaces, position them, add them to different layers, and do your user interface implementation. Once you have configured your interface, you do a commit_changes(), which triggers the repaint and updates the frame. That is basically how you implement an HMI controller for the ivi-shell.
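A controller along these lines might look roughly like the following (a sketch of mine, untested; the entry point wet_module_init() and the interface member names follow Weston's ivi-layout-export.h and may differ between versions; the layer and surface IDs are made up):

    /* Minimal HMI controller on top of the ivi-layout API. */
    #include <stdbool.h>
    #include <compositor.h>        /* libweston */
    #include "ivi-layout-export.h" /* struct ivi_layout_interface */

    static const struct ivi_layout_interface *ivi;

    WL_EXPORT int
    wet_module_init(struct weston_compositor *ec, int *argc, char *argv[])
    {
        ivi = ivi_layout_get_api(ec);
        if (!ivi)
            return -1;

        /* Create a layer (the ID 1000 is made up) and place the surface of
         * the client that registered with ivi-application ID 10 on it. */
        struct ivi_layout_layer *layer =
            ivi->layer_create_with_dimension(1000, 1920, 1080);
        struct ivi_layout_surface *surf = ivi->get_surface_from_id(10);

        if (surf) {
            ivi->layer_add_surface(layer, surf);
            ivi->surface_set_destination_rectangle(surf, 0, 0, 800, 480);
            ivi->surface_set_visibility(surf, true);
        }
        ivi->layer_set_visibility(layer, true);

        /* The layer would still need screen_add_layer() to appear on an
         * output; omitted here. Apply the whole configuration at once: */
        ivi->commit_changes();

        return 0;
    }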
So, are there any alternatives to Weston? Yes, there are: the wlroots project and the Qt Wayland Compositor are the most prominent ones from my point of view.

First, a look at wlroots. It is a modular Wayland compositor library. It also has an interesting protocol, wlr-layer-shell-unstable-v1, which allows clients to specify anchors and positions on the final composited frame, so this might also be interesting for building a user interface for an embedded system. It is currently used by Sway, by Phosh for the Librem 5, by rootston, which is the example compositor of the project, and by several more that I am not aware of. The problem with wlroots is that it does not support overlay planes yet and does not support format modifiers. So it is interesting from the application developer's point of view, but it is not yet well suited for an embedded system.

Then there is the Qt Wayland Compositor. It is a Qt module for developing display servers, so exactly what we need. It provides a QML interface, which makes it really simple to write a compositor; for someone who already writes applications in QML, writing a compositor with QML is really easy, and QML is a pretty well-tested declarative technology. The problem with the Qt Wayland Compositor is that it does not use atomic mode-setting, therefore it cannot use overlay planes, and it cannot use format modifiers, so it is not that performant in an embedded use case.

Now to my open questions, for which I am still not sure there is a proper answer. First, is it really useful to give the UI developers of embedded systems a desktop environment? Or does it maybe hurt more if we make development easy on the desktop, and then we switch to the embedded system and everything falls apart? Related to that: should the compositor really hide the hardware complexity? If everything happens magically and generically in the compositor, the developer does not have that much control, while the developer might know that doing something differently uses less bandwidth. So is it really good to have everything in the compositor, or do we want applications to have more control? Who decides what is rendered and what is put on the overlay planes? Is it more efficient to render everything together, or to have one single application on an overlay? I don't know, and I am not sure it can be decided for all use cases. And should we provide means for client applications to decide their own position, or should the compositor decide for all clients where they are put? If we have a notification, for example, should the application that shows the notification decide where to put it, or is it the compositor that decides? I am also not sure about that.

With that, I come to the summary. First, I looked at the Linux graphics stack, including Wayland on top and Mesa and DRM for 3D acceleration and for interfacing with the display controllers. Then I looked especially at the abstractions that DRM provides for the peculiarities of embedded hardware and how they can be used efficiently. Then, how Weston uses these DRM features and where you can find that in the Weston code. And finally, how to build an embedded UI with Weston, either by building your own shell or by using the ivi-shell.

So with that, if you have any questions, remarks, opinions, or answers, talk to me tomorrow at the Pengutronix booth, write me an email, or just find me at the conference and talk to me. I will be happy to talk about these things. Thank you for your attention. Any questions or answers now? There are microphones on the side.

Hi. Is there any backwards compatibility between different shell versions? You were presenting a slide about creating a new shell for Weston, and it had a variable name, something like "version 6, unstable". I'm not sure.

I don't think so, but I'm not really sure. The xdg-shell is finally stable, so that is a stable protocol. I used the unstable one because it is from the examples that I copied and cut together. I don't know, actually.

Okay. For example, if I have the latest Weston and I use an application built against an older shell protocol version, will it work?

I would have to try. Sorry.

Okay, thank you.

Okay, I don't see any further questions. Then thanks again, and have a nice lunch.