Anyway, so today I'm going to talk about Greenfield, the in-browser Wayland compositor that I've been developing for a bit over a year now, I believe.

So, who am I? My name is Erik De Rijcke. I'm from Belgium, so I'm basically presenting in my hometown right now. I'm self-employed. During the daytime I'm not really involved with anything graphics-related, unfortunately; it's mostly Kotlin and a bit of Java. Pays the bills, right? During the night it's mostly JavaScript, and basically anything else needed to get stuff working.

Now, what is Greenfield? It's a Wayland compositor, plain and simple. It runs in the browser. It's written in JavaScript, but don't worry, I did my best, I did my research. It's written in ES6 JavaScript, which means we have all the interesting stuff: classes (yes, classes), lambdas, constants, all the good stuff. It uses one file per class, so it's not a single bloated file with 10,000 lines; it's nicely and cleanly structured. And it uses JSDoc type comments everywhere, so at least we get some kind of type-safety feeling. It also uses WebAssembly: for all the hard parts that are already solved on the native side, those libraries are compiled to WebAssembly and used inside your browser. Saves me a lot of time.

So what does Greenfield offer? Well, it does not offer you a solution similar to VNC, RDP, or Citrix; we will soon see what it does offer. It's not finished, so it's very much a work in progress: a lot of parts still need to be completed and polished. So what is it? Well, it's definitely something super awesome. I mean, I wrote it after all. It's pure JavaScript, with a wee bit of WebAssembly as I said, and it requires absolutely no plugins. It's all HTML5, all standard browser APIs. A bit of cutting edge maybe here and there, but that should be okay.

So what is it really? The main difference between your RDP and your Greenfield is that Greenfield renders applications on a per-application basis. RDP and similar solutions composite the screen remotely and then send you back the entire screen. Because Greenfield is actually a compositor, it creates the entire output inside your browser itself, and every application separately sends its frames to your browser to be displayed. In essence, that means it's a true cloud desktop environment. Now, I know "cloud" is a word with a lot of definitions, but for all intents and purposes it really is that: you don't need to care where applications come from or where they run. They just pop up on your screen, and they can even run on different hosts if you want. And again, it's definitely still a work in progress.

Now for some basic concepts. I guess most of you are already familiar with what Wayland is; for those who are not: Wayland is basically a display server and a window manager in one. It's the successor of X. It's a classic client-server architecture, and it usually runs, well, it always runs on the same machine, except in this case of course. All communication in the Wayland protocol happens through a binary wire protocol. The protocol itself is defined in XML files, so you have a protocol generator that reads the XML and outputs a stub library, which is then further implemented by libwayland-client and libwayland-server on the native side.
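To make that concrete, here is what such a protocol definition looks like. The fragment below is abridged from the canonical wayland.xml; a protocol generator reads interfaces like this and emits the corresponding stubs:

```xml
<!-- Abridged from wayland.xml: part of the wl_surface interface. -->
<interface name="wl_surface" version="4">
  <!-- Attach a buffer as the new surface contents. -->
  <request name="attach">
    <arg name="buffer" type="object" interface="wl_buffer" allow-null="true"/>
    <arg name="x" type="int"/>
    <arg name="y" type="int"/>
  </request>
  <!-- Ask for a callback when it is a good time to draw the next frame. -->
  <request name="frame">
    <arg name="callback" type="new_id" interface="wl_callback"/>
  </request>
  <!-- Atomically apply all pending surface state. -->
  <request name="commit"/>
</interface>
```

Keep the frame request in mind; it will come back later as the mechanism that throttles how fast an application delivers frames.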
Wayland also lets you send file descriptors over the wire. So instead of sending a huge chunk, a huge blob, over the wire and copying it, it just sends a file descriptor, and that basically gives you zero-copy data transfers; much faster, obviously. It might be interesting to note that Greenfield also supports file descriptors. It does that through a bit of special trickery called WebFD. Unfortunately I don't have the time today to go into detail about that; if you want to know more, you can ask me after the presentation.

A second important building block of Greenfield is WebRTC. Now, what is WebRTC? WebRTC is basically the browser's solution to video conferencing. It allows you to establish direct peer-to-peer communication between endpoints, typically between browsers, but it doesn't have to be a browser. Both peers agree on what would be the ideal form of communication, and then a direct peer-to-peer connection is established. Any further communication between peer A and peer B goes over that direct link; there is no intermediate in between. This allows browsers to communicate directly with each other without a central server interfering, and it's used extensively in Greenfield.

So how does Greenfield work? On the right side we have a general picture of the architecture. It's interesting to note that the signaling server lives inside this little cloud, as does the server that serves your static HTML and JavaScript. Let's go over it together. At the top we have the browser compositor. This is the end user's browser, the machine you can see. The browser compositor is built on Westfield, a separate underlying library, also written in JavaScript, which functions as the protocol generator I explained earlier: it generates all your stubs based on the Wayland XML files. You can use vanilla Wayland XML files, no modification required, and those stubs are then implemented by Greenfield in the browser.

At the bottom we see application endpoints. Application endpoints run on your remote hosts. They're basically daemons, and they function as a proxy compositor that a native application can connect to. When an application endpoint is first started, it announces itself to your browser compositor, or rather to an intermediate and then to the browser compositor once it's connected, and it creates a separate child process for each connected browser tab. That child process functions as a proxy compositor, which means that when a Wayland application connects, it connects to this proxy compositor, and all communication is forwarded over WebRTC, over the direct connection to your browser I just explained. So the intermediate server is not involved in any way once the actual transport happens.

The application endpoint is written in Node.js, mostly because I was quite familiar with it, which made for quick prototyping. It's obviously not the most performant piece of code, so ideally I guess we'd rewrite it in Rust or something, I don't know. It also uses a fork of libwayland-server. The reason is that the endpoint needs direct access to the wire protocol, which is mostly hidden inside libwayland-server, and we need those underlying wire messages so we can forward them to the browser. It also encodes application frames before sending them.
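Before we get to that encoding, let me make the transport concrete. Below is a minimal sketch of how two such peers can establish a data channel, using only the standard RTCPeerConnection API. This is not Greenfield's actual code; `signaling` and `handleWireMessage` are hypothetical stand-ins for the intermediate signaling server and the wire-message handling.

```js
// Minimal sketch of a WebRTC data channel between compositor and endpoint.
async function connectToEndpoint(signaling) {
  const peerConnection = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });

  // A reliable, ordered channel carries the raw Wayland wire messages.
  const channel = peerConnection.createDataChannel('wayland-wire', { ordered: true });
  channel.binaryType = 'arraybuffer';
  channel.onmessage = (event) => handleWireMessage(event.data);

  // Trickle ICE candidates through the signaling server.
  peerConnection.onicecandidate = ({ candidate }) => {
    if (candidate) signaling.send({ candidate });
  };

  // Classic offer/answer exchange. Once ICE completes, all traffic flows
  // peer to peer and the signaling server drops out of the picture.
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  signaling.send({ sdp: peerConnection.localDescription });
  signaling.onAnswer((sdp) => peerConnection.setRemoteDescription(sdp));

  return channel;
}
```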
So once an application wants to transfer frames to your browser, it's a bit silly to transfer the whole raw image over the network; that would be quite slow. We want to encode and compress it before we send it, so it's smaller, and that's also done by the application endpoint. As stated, the application endpoint uses WebRTC. The WebRTC implementation was done because it was relatively easy to do, but you could also use WebSockets, because WebSockets are not tied to the same-origin policy the way ordinary HTTP requests are, or at least that's what I've been told. So you could use that, say, if you want to get around some ugly firewalling; you could implement the whole thing over WebSockets if that turns out to be feasible. The encoding of the application frames is done in H.264, or JPEG if you want. Keep in mind, however, that a JPEG encoding is quite a bit larger than H.264, so ideally you would only use it on a LAN or something like that.

Now, here we have an overview of how an application frame travels from the endpoint, or rather from the application, all the way to the browser. As you can see on the right side, we have the pipeline. There are quite a few steps involved, so I'll go over them quickly; unfortunately there's not enough time to go through everything in detail.

First things first: when I started to implement remote application forwarding, the first thing that comes to mind when you think about showing an application in your browser is a video stream. Browsers support video decoding internally; they do it fast and they do it efficiently. However, in the Wayland protocol you need to know when a frame is decoded and ready to be displayed, and you need access to that frame, because the Wayland protocol ties a lot of extra state to the frame at the moment it is going to be displayed. The built-in solutions in browsers do not offer that: they don't provide a callback for when a frame is ready to be displayed, or at least not in such a fine-grained way. So I had to resort to a custom video decoder, and that's the WebAssembly H.264 decoder you see there. It's actually the Android H.264 decoding library, a simple C library, compiled to WebAssembly and run inside your browser.

The encoding of the applications inside your endpoints happens using GStreamer. Most of you are probably familiar with it; it's a very powerful, modular, in short awesome, encoding framework. If we go through the whole pipeline from top to bottom, what happens is that the application tells the endpoint: I want to commit a frame, I have a frame ready. The frame is handed off to a GStreamer pipeline which encodes it, and there the first interesting thing happens. H.264 does not support alpha encoding; it does not support transparency. Most applications these days have some kind of transparency, so we use an OpenGL shader to split the frame: the RGB colors are converted into the YUV color space, and the alpha channel is represented as a grayscale image. Both images are then encoded to H.264, one grayscale image which actually carries the alpha, and one carrying all the opaque colors. Both are sent over the network to your browser, where WebGL reassembles those two images into a transparent application frame again. So first you receive the images, and both are decoded.
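In shader terms, the recombination that happens next looks roughly like this. This is a minimal sketch, not Greenfield's actual shader, and it assumes the decoded Y, U, and V planes were uploaded packed into the channels of one texture, with the luma of the alpha stream in a second texture:

```js
// Fragment shader (as a JS string, the usual WebGL idiom) that converts the
// YUV colors back to RGB and re-attaches the alpha channel that was shipped
// as a separate grayscale H.264 stream.
const fragmentShaderSource = `
  precision mediump float;
  varying vec2 v_texCoord;
  uniform sampler2D u_colorTexture;  // opaque colors, packed YUV
  uniform sampler2D u_alphaTexture;  // alpha channel, grayscale

  void main() {
    vec3 yuv = texture2D(u_colorTexture, v_texCoord).rgb;
    float y = yuv.r;
    float u = yuv.g - 0.5;
    float v = yuv.b - 0.5;
    // Standard BT.601 YUV -> RGB conversion.
    vec3 rgb = vec3(
      y + 1.402 * v,
      y - 0.344 * u - 0.714 * v,
      y + 1.772 * u
    );
    // The luma of the second stream is the original alpha value.
    float alpha = texture2D(u_alphaTexture, v_texCoord).r;
    gl_FragColor = vec4(rgb, alpha);
  }
`;
```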
You have both images, and then WebGL does the color conversion from the original H.264 YUV back to RGB, and a WebGL shader also translates the alpha channel back to its original alpha values. The result is put into an HTML5 ImageBitmap, a standard browser concept, and then written, or rather copied, to an HTML5 canvas.

The interesting thing to note here is that every application on your screen actually consists of two HTML5 canvases, and double buffering is used to show you a complete and finished picture each time. The main reason is that when you resize an HTML5 canvas, the browser clears the entire canvas. So if we worked with only one canvas and updated its size, you would see a flash of a white canvas before you saw the final image. And in Wayland the mantra is that every frame is perfect, so we obviously don't want that. So the image is copied to the HTML5 canvas back buffer, a callback is registered in the browser to wait for v-sync, which is requestAnimationFrame, and once that fires, the canvases are swapped and you have the final, complete picture in front.

Once the frame of the application is shown on screen, the browser sends a callback to the application saying: hey, you can start processing the next frame and start sending it to me. This is what is classically used in Wayland to throttle the frame delivery, the frames per second, of your application.

Now, there are a few gotchas here, or rather a few interesting things to note. This is a long pipeline, and it's a slow pipeline: even with instant encoding and decoding and perfect v-sync, you still have your network latency, and on a slow network, and definitely on the internet, that can be 40 or 50 milliseconds. So in the best-case scenario you would get something like 20 frames per second. There is the possibility of sending this frame callback to your application early: while a frame is still traveling down the pipeline, say once it has been decoded, we can already tell the application, hey, start rendering your next frame, because the encoding step will probably take some time as well and the previous frame will probably still be busy in the pipeline before you're finished. The difficulty with that solution is that it's hard to predict how fast a frame will travel through the pipeline. If you have a resize operation where you go from 640 by 480 to full HD, the next frame that comes in will suddenly be much slower, or the other way around, much faster, and then you get frame drops. So currently it's mostly a theoretical optimization.

Now, we talked about running applications on different hosts, and it has interesting implications that the compositor does not really know or care where applications come from. They can run on any kind of host, which means they could actually also run inside your browser itself. We just talked about all the drawbacks of sending your applications over the network: you have to encode, you have to decode, there is network latency. We can optimize all of that enough that it's definitely okay to use in a desktop environment.
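To recap the browser half of that network pipeline in code form: a minimal sketch, with hypothetical names rather than Greenfield's actual API, of the double-buffered canvas swap and the frame callback that throttles the client.

```js
// Minimal sketch of the presentation step; `surface` is a hypothetical
// object holding the two canvases and the pending Wayland frame callbacks.
function presentFrame(surface, imageBitmap) {
  const back = surface.backCanvas;
  // Resizing clears a canvas, which is exactly why we only ever resize the
  // hidden back buffer and never the canvas the user is looking at.
  back.width = imageBitmap.width;
  back.height = imageBitmap.height;
  back.getContext('2d').drawImage(imageBitmap, 0, 0);

  // Wait for the browser's v-sync before showing the finished picture.
  requestAnimationFrame((timestamp) => {
    surface.frontCanvas.style.visibility = 'hidden';
    back.style.visibility = 'visible';
    [surface.frontCanvas, surface.backCanvas] = [back, surface.frontCanvas];

    // Fire the wl_surface frame callbacks so the client knows it can start
    // rendering, and sending, its next frame. This is the throttle.
    surface.frameCallbacks.forEach((frameCallback) => frameCallback.done(timestamp));
    surface.frameCallbacks = [];
  });
}
```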
Browsers, however, have something called web workers. A web worker is basically a separate thread or process that runs isolated JavaScript. It starts from its own file, it has no access to your DOM or to the main thread of your browser, and its only way of communicating is through events; a nicely isolated piece that runs inside the compositor sandbox. And we can leverage that system to run Wayland applications inside a web worker.

So let's see what we have with web workers. They have no network latency, great. They have zero-copy data transfers: we can send blobs from a web worker to the main thread with zero-copy semantics. In short, we have all the good stuff that an ordinary native Wayland compositor has. And quite recently, browsers started shipping off-screen WebGL, which means we can render with WebGL inside a web worker, off screen, and use those zero-copy semantics to transfer the WebGL-rendered image to our compositor.

The drawback is that you probably don't want to write raw OpenGL to build a desktop application; you want a toolkit on top that takes care of that for you. So about half a year ago I started porting Skia, the graphics library also used by Chrome, and Firefox I believe, to do all the rendering; Google also uses it inside their Flutter toolkit. I ported it to WebAssembly so it can use WebGL to do the actual drawing. Google found out and said: hey, Erik, that's really great what you did, let's continue with that. And they basically continued that port, or initially did their own take based on what I did, and three months later they also announced that they had ported Flutter to the browser. Surprising. Another thing is that, at least in theory, we could also take existing native applications and compile them to WebAssembly, if we had a Wayland client library implementation written in JavaScript. Maybe an important side note: all of this is still a work in progress, so you won't find it in a working condition right now. Just keep that in mind.

What we end up with, in essence, is a true cloud desktop environment. We can deliver applications to your browser over the network, and you could have, say, an account on a website; that website could have an app store where you buy applications, download them in your browser, and run them. If you switch to a different browser or a different machine tomorrow, you log in and you still have access to those same applications, and you can still run them inside your browser. Because everything runs inside your browser, you're not really tied to the machine beneath it: it could be your phone, a tablet, a desktop. Everything runs inside the sandbox of your browser, really. But this is still very much future work, very much work in progress. I definitely hope to present this, or at least give a demo of this work, next year, if I get accepted at least.
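Before the demo, here is roughly what that web worker plumbing can look like using only the standard APIs mentioned above: Worker, OffscreenCanvas, and transferable objects. The file names, message shape, and `compositeSurface` are illustrative, not Greenfield's actual protocol.

```js
// main.js (compositor main thread): spawn an in-browser application.
const appWorker = new Worker('application.js');
appWorker.onmessage = (event) => {
  // The ImageBitmap arrives by ownership transfer, not by copy.
  compositeSurface(event.data.frame); // hypothetical compositor hook
};

// application.js (worker side): render off screen and hand the frame over.
const offscreen = new OffscreenCanvas(640, 480);
const gl = offscreen.getContext('webgl');
// ... draw the application's next frame with WebGL ...
const frame = offscreen.transferToImageBitmap();
// Listing the bitmap as a transferable makes the hand-off zero-copy.
postMessage({ frame }, [frame]);
```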
So let's have a quick demo of what I did. Currently, on my local machine, I have the compositor running inside my browser, and I also have an application endpoint running on localhost. I didn't want to take any risks with firewalls and WebRTC, so sorry, I'm going to demonstrate a remote application locally; I did manage to get it working from home against a cheap Hetzner server, though. Here I have my X terminal. It's X, so it's definitely not connected to this compositor, but I can start an application from the terminal and it launches directly inside my browser.

And as you can see, it really runs inside the browser here. It's not subject to any native key mapping: the key mapping is handled by the compositor itself. It uses libxkbcommon, I believe, compiled to WebAssembly, and it uses keymap files to do all the translation. As you can see, I can just type with the keymap that's currently configured.

Let's have a quick look at the GTK3 demo application. Here we have the application, and the performance is relatively okay; you can see the frame rate at the top right. Well, yeah, I have a small screen here. If I ran this in full HD on my laptop, I would get around 30 frames per second. If you run full HD against a cheap Hetzner server with a 40 millisecond delay, the frames per second drop to 8 or 9 if you're lucky. But as you can see, it has no trouble encoding and decoding; it works just fine.

So that's basically it in a nutshell. If you have any more questions, please do ask. Inside here I think there is an example somewhere, I'm not seeing it... ah, rather, it's a separate surface that's rendered and nicely synchronized with the other surfaces. If you're familiar with Wayland, this uses subsurfaces to synchronize the updates of these three separate panes. That's also implemented here, which means that if I press my space bar and keep pressing it, everything stays nicely synchronized. Quite a challenge to implement. But as you can see, the 3D works just fine; there are no issues there. Thank you very much.