This is just sort of a 100-level course in what the various solutions in the world are for remote graphics, what common techniques a lot of them share for improving performance, and what the unsolved problems in the area are. Hopefully you come away with a little better understanding of the solutions in this space and maybe some interest in working on it.

First, some basic definitions. At the most basic level, I have an image or a desktop system somewhere in the world that's too far away, I want to bring a view of it here, and I'm going to use the network to do it. Most of the time, but not always, input will be part of this, so you have to go the reverse direction too and take events from this laptop and send them off to Sydney. This may or may not be something that's built into your window system. In fact it usually isn't; X is rare in this respect, in that remoting is something the window system can do intrinsically. It may be something you want to do on a per-window or per-application basis, or it may be something you want to do for the entire desktop, so that it's as though you were on the system locally.

There are three major classes of approach to this problem, and I'm going to describe them by coming in from both ends, because it's easier to describe the middle ground once you've seen the extremes. The three major classes are the remote framebuffer, the remote desktop, and the remote window system.

A remote framebuffer is something you're probably familiar with: VNC. It's really just the dumbest possible approach to this kind of problem. All you do is scrape the pixels off the screen and shovel them across the wire as fast as you can. You get notifications about screen updates through some window system mechanism, or you just grab the whole thing all the time and figure out what changed in software. In the other direction, if you're doing input forwarding, you pick up the input events and throw them across the wire the other way, and now you're done; that's all you have to do. It is literally the simplest possible approach to the problem.

This has some real advantages. It's the lowest common denominator: every window system supports this level of functionality, every desktop you want to interact with supports this level of functionality, so it's provided just about everywhere. There are VNC clients and servers for Windows, for macOS, for classic Mac OS, for Linux (for both X and the framebuffer console and the text console, which is really terrifying), for AmigaOS, for BeOS, for anything you might want to run and a couple of things you don't. It's so simple, in fact, that you can provide it in hardware. There are starting to be a lot of remote console server chipsets that have VNC as a feature, where the video pixels go out the VGA port on the back of the machine but also go out over VNC on ethernet, so you can watch the entire machine through boot-up, dive into the BIOS screen, change the boot order, and then boot on; which is a little bit of a surreal experience if you've never done it before.

And since there's no complexity in the protocol, there are no replies and no synchronization. Beyond the initial connection handshake, where I establish the parameters of what each end can do, there's no point at which you need to round-trip from one end to the other.
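To make "dumbest possible" concrete, here is roughly what a raw update looks like on the wire in the RFB protocol that VNC speaks (RFC 6143). The rectangle layout is the real one; the framebuffer pointer, the 4-byte pixel format, and the surrounding plumbing are assumptions for illustration, not any particular implementation:

```c
/* Minimal sketch of VNC's raw update path: one FramebufferUpdate
 * rectangle per RFC 6143 (u16 x, y, w, h; s32 encoding, 0 = Raw;
 * then the raw pixels). The framebuffer layout and 4-byte pixels
 * are illustrative assumptions. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons/htonl */

static size_t encode_raw_rect(uint8_t *out, const uint32_t *fb, int fb_stride,
                              uint16_t x, uint16_t y, uint16_t w, uint16_t h)
{
    uint8_t *p = out;
    uint16_t v16;
    uint32_t enc = htonl(0);                  /* encoding 0 = Raw */

    v16 = htons(x); memcpy(p, &v16, 2); p += 2;
    v16 = htons(y); memcpy(p, &v16, 2); p += 2;
    v16 = htons(w); memcpy(p, &v16, 2); p += 2;
    v16 = htons(h); memcpy(p, &v16, 2); p += 2;
    memcpy(p, &enc, 4); p += 4;

    for (int row = 0; row < h; row++) {       /* dirty pixels, row by row */
        memcpy(p, fb + (y + row) * fb_stride + x, (size_t)w * 4);
        p += (size_t)w * 4;
    }
    return p - out;   /* 12-byte header plus w*h*4 bytes of payload */
}
```

Note how the cost is linear in the area: 12 bytes of header and then every single pixel, every time. That linearity is exactly the scalability problem described next.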
The server, when it pushes pixels to the viewer, just sends them off and assumes that once they're there, they're rendered correctly. In the other direction, input just goes to the server and you never have to wait for replies. So the latency characteristic is no worse than the time it actually takes the packets to get there and back. As a result, there's also very little state in the protocol; there's just nothing you have to keep track of. At almost any point in the protocol you can throw away everything you knew, just start receiving new commands, and it'll work. So it's very, very easy to write a client as well as a server, to put the server in hardware, things like this.

And it's too simple. We have a weird perception of area: 1080p is literally twice the pixels of 720p and doesn't really look it, because that's not how we perceive area; it looks about 1.4 times bigger. Even 720p, 1280 by 720, is just a lot of data: at 24 bits per pixel and 60 frames per second, that's on the order of 1.3 gigabits per second, so it won't fit on 100BASE-T, and it won't even really fit on gigabit ethernet. One frame actually takes a long time to move, because bandwidth is a measurement of time, not just of the data you can move. So you can't do anything like full-screen video with this without some complexity, and we'll get to some approaches in a bit. On its own, just moving pixels across is really not a scalable solution: it takes too much bandwidth, so you need something more clever.

Also, the input subset is just barely sufficient for what you might want to do. If you're typing on an English-layout machine and the machine on the far end is also English, it'll probably work. Everything past that, anything more complicated like trying to remote-control a French machine with a US keyboard, not so much. And anything more complicated than a mouse isn't going to work especially well.

So what would it look like if you did something far more complicated? You would get something that looks like X. X is a really nice window system. The defining characteristic of a window system is that you can build a suite of cooperating applications out of it, and X was designed to do that from the ground up. X has a full set of rendering primitives for pixmaps, which are off-screen rendering surfaces, and for windows, which are regions on the screen where you can see them. And it has clipping regions, so you can say: I want to do work on this part of the region, then take that chunk and move it somewhere else, and move image data around that way. It has a very rich set of rendering primitives, most of which you kind of don't want to use anymore because they were never that great, but the big ones are solid fill and blit, possibly with an alpha factor. Those embody 90% of the drawing you do in a 2D sense anymore, and they're directly in the protocol.

Input is completely integrated with this. Windows have a client they're associated with, and windows define the geometry of picking, so the thing your mouse is pointing at is the thing you're delivering input to, and there's a whole bunch of logic to take care of that. And there's also this wonderful IPC system that lets you do everything else you might want to do in terms of communicating between applications: selections and properties and client messages, all these methods of communicating between applications. This gives you things like the clipboard; both clipboards, however many clipboards there actually are.
I think there's technically three. It gives you drag and drop. It gives you launch feedback, so you can know when an application is starting, change the cursor and make it go spinny, and then have it rest once the application is up. It gives you the window manager ping protocol, for knowing when an application is not responding to its main loop, so you can gray it out and fade it and pop up a little "do you want to kill this application" dialog. All that stuff is in the window system's IPC protocol.

The positive thing about this is that since you've got these rendering primitives, and since they're the thing you're building your application's user interface out of, you've defeated the fact that pushing pixels is this huge linear operation. Solid fill, blend-and-blit, and window moves all take a constant amount of data on the wire to describe, no matter how big an area you're filling. I can be filling one by one or 64k square, and it takes the same amount of data to describe it. Window scrolling is a little more expensive, because you say "scroll up this much" and then you're linear in the area you exposed for pixel updates; but all the rest of it is "take this chunk and move it up", constant size. Having these rendering primitives as part of the protocol moves the drawing to wherever the viewing is happening. If I'm running an X application on this laptop and displaying it on Daniel's laptop, Daniel's machine is doing the drawing. That places fewer demands on the server side to actually achieve the rendering: you don't have to keep a copy of all those pixels, you just throw the rendering requests across and assume they happen somewhere else. And to the extent that the applications or desktops you're remoting go through this path, they look virtually identical. X has this network transparency property where a remote application doesn't look any different from a local application, so you can drag and drop between them pretty seamlessly, you can overlap them, and they don't behave any differently.

But the problem is we didn't give it an IPC system that encodes what you want to do; it encodes everything you could possibly want to do. So the mere act of setting up drag and drop requires a fair amount of chat back and forth between the two applications involved, the sender and the receiver, and now you've blown out your performance characteristics, because now you're bounded by the latency between the two machines. X's IPC in particular has a lot of problems here, because it's fundamentally asynchronous. In order to do drag and drop, you have to hold the state of the window system still; you have to know that windows aren't moving around underneath the cursor. To do that, you do a thing called a server grab. Server grabs are a way of monopolizing the X server's attention. So now you've got a machine somewhere on the other end of the internet that has said: pay attention to me now, I'm going to do 40 requests back and forth until we've figured out this drag-and-drop thing, and then I'll let go. Hopefully those 40 round trips didn't take too long, and hopefully the link between you wasn't lossy. What actually happens is that it always takes too long, latency is always a problem, and every link is lossy.
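The arithmetic on those round trips is brutal. A trivial back-of-the-envelope sketch; the 40-request figure is the drag-and-drop example above, and the RTT values are illustrative:

```c
/* How long a chatty, serialized handshake takes as latency grows.
 * 40 serialized round trips is the drag-and-drop figure from the
 * talk; the RTT values are illustrative. */
#include <stdio.h>

int main(void)
{
    const double rtts_ms[] = { 0.1, 20.0, 100.0, 300.0 }; /* LAN .. bad WAN */
    const int round_trips = 40;

    for (int i = 0; i < 4; i++)
        printf("RTT %6.1f ms -> handshake takes %7.1f ms\n",
               rtts_ms[i], rtts_ms[i] * round_trips);
    return 0;
}
```

On a LAN that's 4 milliseconds and nobody notices; at a 100 ms RTT it's four full seconds of holding the server grabbed.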
X in particular has a really weakly defined notion of an application. Windows are merely clipping rectangles; there's no intrinsic parent-child relationship that says all the windows that are children of this window belong to the same application. In particular, if you click on a menu in X, then for the menu not to be constrained within the boundaries of the window that generated it, it has to be a child of the root window, not of its parent, not of the application that made the request in the first place. So you can't just look at the window tree hierarchy and know that a collection of windows belongs to the same application. Of course, we have properties that we dangle on all these windows to say "I belong to Firefox". And we don't put them on all windows, because if you did that for things like menus and transients it would be slower; you'd be reducing your performance even in the local case. So trying to scrape an application out of X into some other protocol is really difficult.

X is also basically connection-oriented. You connect to the X server, you build up a bunch of state objects in the server, you do your rendering and your communication through them, and you tear them down. All the IDs were allocated by the client, which means that if you wanted to detach an application from one server and reattach it to another, the detach-and-reattach problem, you'd have to recreate all of those resources. You'd have to recreate all of your pixmaps and your colormaps and your windows, and you'd have to hope that the IDs you used to talk about all these things weren't already allocated on the new server; or, if they were, build a complex map to rewrite them. Emacs can do this. GTK can do this in principle; it tends not to work. So it ends up being this very difficult problem of figuring out all the state you need to marshal in and out, much of which can change if you move between X servers with different capabilities. Your channel masks might be the wrong way around: red, green, blue versus blue, green, red. They might be different color depths. They might have visual IDs that happen to be different even though they have the same attributes. Detach and reattach is really not something you can do in this kind of environment.

And of course, if what you're trying to do is remote whatever application you happen to be running, you're limited to remoting X apps with X, which seems obvious, but it matters: it's not a trivial exercise to take a Windows machine and make a Windows application appear as an X application somewhere else. It's been done, I know. How much did that product cost? Given that you can't buy it anymore. Microsoft did not like that product, was the comment.

So what would you like instead? Oh, wait, no, that's not what I meant to say. Neither one of these is actually sufficient for the full task of remoting a desktop. If I have a virtual desktop environment or a terminal-services kind of environment, neither one of these is good enough. I can't do audio capture; I can't do audio playback. The things up on the screen, fine, you can read those. But this webcam: if I were displaying a remote session here, I'd want that webcam to go to an application running wherever the rest of my desktop is, and there's no way to do it. It's not in VNC, it's not in the X protocol. These are simply insufficient protocols for full remoting. So what kind of system might you like?
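As a toy illustration of one small corner of that reattach problem: every client-chosen resource ID would have to be translated for the new server, on every request that mentions one. Everything in this sketch, the IDs, the map, the lookup, is hypothetical:

```c
/* Toy XID translation for the reattach problem: the client picked its
 * resource IDs against server A, so every request replayed against
 * server B has to rewrite them. A real implementation would be far
 * more involved; this is purely illustrative. */
#include <stdint.h>
#include <stdio.h>

struct xid_map { uint32_t old_id, new_id; };

static uint32_t translate(const struct xid_map *m, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (m[i].old_id == id)
            return m[i].new_id;
    return id;  /* not one of ours, e.g. a server-owned ID */
}

int main(void)
{
    struct xid_map map[] = { { 0x2000001, 0x4600001 },   /* a window   */
                             { 0x2000002, 0x4600002 } }; /* its pixmap */
    printf("window 0x%x -> 0x%x on the new server\n",
           0x2000001, (unsigned)translate(map, 2, 0x2000001));
    return 0;
}
```

And that's the easy part; the hard part is that the visuals, depths, and channel orders behind those IDs may not even exist on the new server.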
Well, you might like something that's a remote desktop. ICA is a product from Citrix; it stands for Independent Computing Architecture now, I think. Wonderful name. It used to be called MetaFrame or WinFrame or something like that. RDP is the Remote Desktop Protocol from Microsoft. It was based on an earlier ITU communication standard for desktop sharing, and as a result it looks like every other telco standard you've ever seen: lots of layers that each do very, very little. SPICE is a protocol we're working on within the QEMU project to do similar kinds of things, which hopefully cuts out a lot of the complexity and is a little more manageable.

These all have some pretty similar properties. They have objects that look like the window system's: you can still talk about individual windows, unlike VNC, where it's very difficult to talk about particular windows and applications. You can say "I want to forward just Firefox" or "I want to forward just the GIMP" and actually do it. But instead of giving you a full-blown generic IPC mechanism, any inter-application communication is strictly defined ahead of time, so that you do only the communication necessary to achieve the feature, rather than having a full-blown chatty back-and-forth. All of these have the property that the rendering primitives inside the protocol are flexible: you can send either the encoded list of commands the application used to draw the scene, or the image contents of the window at a given point in time, whichever happens to be more efficient, depending on whether you're optimizing for bandwidth or CPU time on the server or any number of other things. And again, all you have to do is push these updates across the wire and assume the client has rendered them correctly; you no longer have the responsibility for handling exposures and things like that. And the way you add new IPC primitives is a sub-channel design that adds each new feature one at a time. I want to add a clipboard? All right, now I've got a clipboard channel. I want to add audio remoting? Now I've got an audio sub-channel.

So why don't we just use RDP? It's on everything; all these Windows machines have it; it kind of works. It does work; it works remarkably well. But supporting RDP means chasing after Microsoft. You have to be compatible with whatever they've implemented. Anything you want to add or extend in the protocol you have to do twice, on both sides, and you also have to bootstrap the entire viewer on Windows, because you can't just plug into Microsoft's viewer and add new channel support. There's a contract wall in front of the RDP docs that so far nobody's wanted to get past, because there's a lot of scary legal language in front of it that says: this is what you have to agree to in order to implement RDP using our specifications. And it's kind of an awful protocol; from an elegance perspective it loses. You'd rather not see all the ways in which you're diving down through these layers of unwrapping and transport security and grossness. Go read the rdesktop source sometime; it's terrifying. So if that sounds like the kind of remote video protocol you're okay with, great, go work on RDP. We do actually need RDP support, because sometimes that's what you need to interoperate with. If not, come talk to us about SPICE; there are a lot of new problems to solve.
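Here's a sketch of what that sub-channel idea can look like as wire framing. To be clear, this layout is invented for illustration; it is not SPICE's or RDP's actual format:

```c
/* Hypothetical sub-channel framing: each feature (display, input,
 * cursor, clipboard, audio...) gets its own channel ID, and a new
 * feature is added by defining a new channel rather than by extending
 * a generic IPC system. Illustrative only; not SPICE's or RDP's real
 * wire format. */
#include <stdint.h>

enum channel_id {
    CH_DISPLAY   = 1,   /* rendering commands and image updates  */
    CH_INPUTS    = 2,   /* keyboard and pointer events           */
    CH_CURSOR    = 3,   /* cursor shape and position             */
    CH_CLIPBOARD = 4,   /* added later: one feature, one channel */
    CH_AUDIO     = 5,   /* added later still                     */
};

struct msg_header {
    uint8_t  channel;   /* which sub-channel this message belongs to */
    uint8_t  type;      /* message type, meaningful per channel      */
    uint16_t reserved;
    uint32_t length;    /* payload bytes that follow this header     */
};
```

The point of the design is that a viewer that doesn't know about CH_AUDIO can just skip `length` bytes and keep working, which is exactly the extensibility story RDP makes painful.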
And SPICE is actually very competitive with RDP on bandwidth usage in some cases.

(A question from the audience.) It does work; I've seen it work. It operates on the same principles. I don't know if everybody heard that, did the recording catch it? The observation was that there is an implementation of an X server that speaks RDP out of the side of its mouth, in much the same way that TigerVNC is an X server that also speaks VNC. And it works; it does a lot of the same things. It encodes copy-area and everything in the correct, efficient way, and you can view it from Microsoft's viewer. But again, any time you want to add features to it, you have to find a way to do it. If you wanted audio forwarding, for example, you'd have to find a way to stuff audio forwarding into X first, and then the RDP side could pick it out.

So those are the big classes, and that's why you might want something more in the middle of these implementations: you don't want something quite as simple as VNC, and you don't want something quite as complicated as X. So what are the tricks they all do? They all face the same performance problems. The speed of light is really slow: in the best case it's 20 milliseconds from Boston to San Francisco, or from Brisbane to Perth. There's just nothing you can do if getting a request to the other side of the country and back is something you can only do 50 times a second; that's the maximum performance characteristic you're going to have, things happening 50 times a second. Again, bandwidth is a measurement in time, not just of how much data you have, so no matter how much bandwidth you have, and you may have a lot, you still want to reduce the amount you're using, because that improves your frame rate. And if you're doing a terminal services or VDI (virtual desktop infrastructure) kind of deployment, you've got one big server off in a rack somewhere running 50 or 100 desktop sessions, all going out to different machines all around the world. Now you have a lot of shared resources: the shared bandwidth pipe going out, and the shared CPU on the server, because all those sessions are competing for the same resources. So you want to optimize for those aspects too. The planet wants you to optimize for power; Matthew wants you to optimize for power, he's an angry young man. And there's never enough bandwidth for what you want to do.

So how do we address some of these issues? Job one is pixels. Pixels are too big; let's have fewer of them. Let's compress them and make them smaller. There's a large variety of options here; you could spend all your time just looking at different algorithms for pixel compression, whether to use PNG or RLE or any number of other things. There's a bit of a trade-off between lossy and lossless encoding. Lossless can be quicker to encode, because you don't have as many guesses to make about image quality and fidelity, but it's usually bigger on the wire for anything that isn't trivial, for anything that doesn't look, well, like my slides.
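As a concrete taste of the cheap-and-lossless end of that spectrum, here's a minimal run-length encoder over 32-bit pixels. This is a sketch only; VNC's actual RLE-family encodings, RRE and ZRLE, are more elaborate:

```c
/* Minimal run-length encoding of 32-bit pixels: each run becomes a
 * 1-byte count plus the pixel value. Solid areas (window backgrounds,
 * slides) crush extremely well; photographic content does not.
 * Illustrative only. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static size_t rle_encode(const uint32_t *px, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && px[i + run] == px[i] && run < 255)
            run++;
        out[o++] = (uint8_t)run;         /* run length, 1..255   */
        memcpy(out + o, &px[i], 4);      /* the repeated pixel   */
        o += 4;
        i += run;
    }
    return o;   /* best case 5 bytes per 255 pixels; worst case 5 per 1 */
}
```

Note the worst case: content with no runs actually gets bigger, which is why real protocols negotiate between several encodings instead of picking one.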
Lossy rendering, meanwhile, is tricky when your client-side rendering is non-trivial. What that means is: say I want to send JPEG images of region updates across the wire. I do a couple of those, and then later I render an alpha blend over the top of them, because I want to draw some text with pretty anti-aliased edges. If I do that, any errors that were present in the JPEG relative to what the region actually looks like get compounded by the successive rendering on top. So you kind of can't do lossy as a general technique. It works for something like VNC, where you're only ever getting an approximation anyway and you never try to do anything complicated on the client side. But if you're trying to move rendering to the client side, you can't in general use lossy compression; you have to be more careful about it. You can do cleverer things: when there's a region of active updates, do JPEG compression and send a series of approximate updates to that region, and then once it settles down, send a lossless update at the end. That may use more bandwidth or it may not, depending on your threshold for detecting active update areas and how quickly things settle; lots of heuristics.

And depth crushing is just making it so the viewer doesn't see the full depth, the full color fidelity, that the server actually has. I could be viewing, in eight bits per pixel, a scene that's actually rendered in 24 bits per pixel, and it'll look like the first time you used Netscape Navigator back in '96 on your Slackware install. It's awful, but it does a very, very effective job of reducing the amount of data you need to transmit, like a factor of three. It just looks terrible, so it depends what your users are willing to put up with.

Video playback is a special case of this. Like I said, if you had a region, say your YouTube clip playing, and you noticed that that particular region was updating a lot, you could just send JPEGs of it all the time, because since you know it's animating, you know the user isn't actually going to be able to see every single bit of the image with perfect fidelity. But there are better things you could do. If you hooked into the window system at the right point, you could simply grab the MPEG slices before they even hit decode and throw those across the wire, and then you've optimized perfectly: the data is exactly as compressed as it was to begin with. That requires that you have the same set of codecs on both ends, and then that becomes a combinatorial explosion problem. But if you're Microsoft, for example, you can do this really easily, because you know a Windows 7 machine has this set of codecs, period; so if you have a Windows 7 viewer and a Windows 7 server, you're set.

If you don't have that: video data is usually some compressed representation in YUV color space rather than RGB, so you could just grab the YUV data and throw that across. YUV is usually encoded with more precision in the Y channel than in U and V. U and V correspond roughly to chroma blue and chroma red, and the eye just doesn't have as much spatial accuracy in chroma as it does in luminance. So video usually has four pixels of luminance for one pixel of chroma blue and one of chroma red, which gets you to half to two-thirds of the size just by not taking the extra step of going to RGB. Then you can do the same usual compression tricks on the way out. Or you can do successive JPEGs and then, optionally, a lossless frame at the end: the exact version of the final frame once the video stops and shows you what the similar videos are, so you can go click on more cat videos.
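The savings there are easy to work out. A quick sketch for one 1280 by 720 frame at 8 bits per sample; it computes sizes only, no actual color conversion:

```c
/* Bytes per frame for packed RGB versus 4:2:0 and 4:2:2 chroma
 * subsampling, at 8 bits per sample. 4:2:0 keeps one U and one V
 * sample per 2x2 block of pixels; 4:2:2 keeps one per 2x1 block. */
#include <stdio.h>

int main(void)
{
    const unsigned long w = 1280, h = 720, pixels = w * h;

    unsigned long rgb    = pixels * 3;                 /* R, G, B          */
    unsigned long yuv420 = pixels + 2 * (pixels / 4);  /* Y + U/4 + V/4    */
    unsigned long yuv422 = pixels + 2 * (pixels / 2);  /* Y + U/2 + V/2    */

    printf("RGB:   %lu bytes per frame\n", rgb);
    printf("4:2:0: %lu bytes (%.0f%% of RGB)\n", yuv420, 100.0 * yuv420 / rgb);
    printf("4:2:2: %lu bytes (%.0f%% of RGB)\n", yuv422, 100.0 * yuv422 / rgb);
    return 0;
}
```

That prints 50% for 4:2:0 and 67% for 4:2:2, which is where the "half to two-thirds" figure comes from.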
Cursor rendering is another common optimization. There's no reason to send the cursor from the server to the client as image data. If you consider the cursor's position to be part of the server's state, you end up doing this constant stream of image updates wherever the cursor happens to go, drawing a little trail behind it of all the regions you've exposed and repainted. If instead you take that out and simply tell the viewer "cursor's here now, cursor's here now, cursor's here now" whenever it warps on the server side, and let the client draw it, you've eliminated a huge class of bandwidth usage and of latency, because you no longer have to wait, after moving the cursor, to see it actually show up. No more "okay, I moved it, the motion request gets to the other end, the paint happens, the paint comes back". And that is what most protocols actually do; that's why I'm calling it a common optimization.

If the latency is bad enough, you may want to draw two cursors: one that's the actual cursor image, and a little three-by-three box trailing behind it that says "this is where the thing you're viewing thinks the cursor is", so you don't get too concerned about the fact that you're mousing ahead of it. At least you get rid of the perceptual lag problem. You really want local input to feel immediate; if you know you're dealing with a remote system you're a little more forgiving, but you don't want the local pointer itself to feel laggy, and that does happen. (From the audience:) the observation was that the cursor image does change when you go over the resize grippies at the edges of windows, and that you have to deal with no matter what. But for the most part, the client controls the cursor position and draws it, rather than relying on the server to send cursor position updates back out. Of course, if you're really trying to lie to your users and say "oh yeah, it's a local system" when it's actually off in a data center in Phoenix, drawing two cursors makes it a little more obvious that you're lying. (Another question, about how you figure out when that happens.) Yeah, this is what X grabs are supposed to solve, and they're too hard to program, so nobody uses them.

The input stream is also a modest source of bandwidth usage. It's not as huge as pixel updates, but it's still there, and most of your input just doesn't matter, because most of your input is mouse position and most mouse position updates don't matter. It's surprisingly chatty: if I just move the mouse around on this touchpad, I can get 500 interrupts a second, times 32 bits each for x and y, plus a device identifier, plus a header byte for what kind of command I'm sending. Now I'm doing 12 bytes for every position update, 500 times a second, and I'm up in the dozens of kilobytes per second. You could probably crush that down a little if you were trying.

There are three big approaches to that. One is delta encoding, which says: don't send a full coordinate-space update of where you are, just send relative motion about where you went. If I only moved by less than plus or minus 127 in each direction, I can pack the motion into eight bits each for x and y; and if I moved by less than plus or minus seven, I can pack both axes into a single byte per position update.
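Here's a sketch of that packing, with a fallback for the rare big jump. The tag bytes and framing are made up for illustration, and a real protocol would likely steal bits from the payload rather than spend a whole byte on the tag:

```c
/* Delta-encoding pointer motion: tiny moves fit both axes in one
 * payload byte (4 bits each, -8..7), medium moves take a byte per
 * axis (-128..127), and anything bigger falls back to 16-bit deltas.
 * The tag values here are invented; the decoder sign-extends. */
#include <stdint.h>
#include <stddef.h>

static size_t encode_motion(int dx, int dy, uint8_t out[5])
{
    if (dx >= -8 && dx <= 7 && dy >= -8 && dy <= 7) {
        out[0] = 0x00;                               /* tag: packed nibbles */
        out[1] = (uint8_t)(((dx & 0x0f) << 4) | (dy & 0x0f));
        return 2;
    }
    if (dx >= -128 && dx <= 127 && dy >= -128 && dy <= 127) {
        out[0] = 0x01;                               /* tag: byte per axis */
        out[1] = (uint8_t)(int8_t)dx;
        out[2] = (uint8_t)(int8_t)dy;
        return 3;
    }
    out[0] = 0x02;                                   /* tag: 16-bit deltas */
    out[1] = (uint8_t)((int16_t)dx >> 8); out[2] = (uint8_t)dx;
    out[3] = (uint8_t)((int16_t)dy >> 8); out[4] = (uint8_t)dy;
    return 5;
}
```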
And delta encoding actually works really well, because again, most of this is a lot of very small motion. You've taken something that used to be 32 bits on the wire down to eight. It helps.

(From the audience:) the observation was that you can just throw away all but the last update, and that's a thing called squashing, where you simply ignore some amount of local activity and only send periodic updates about cursor position. And this does work, remarkably well; you can just throw data away. But if you do this in an application like the GIMP, where you're holding the button down and drawing and painting, now you've turned this really pretty curve you were trying to draw into warp, warp, warp, warp. It gets all chunky and doesn't work very well. X has a hack for this, called pointer motion hints, that lets an application recover what the full motion actually was when it needs it; and again, nobody uses it because it's too difficult.

You can also batch cursor updates. If you sent a single TCP packet for every single cursor update, well, we estimated an update at 12 bytes apiece, and TCP/IP headers are about 40 bytes, so maybe you didn't want to do that. But the trade-off you're making here is latency for throughput. If you pull multiple position updates into a single TCP packet, you reduce the header overhead, but you're delaying the remote end's ability to see those position updates and respond to them. So that may or may not be a good trade; it depends a lot on the latency characteristics of the wire. And all of these work together: you can delta encode and sometimes also squash, when you've noticed, for example, that no mouse button is down, so the user is probably just moving the pointer for targeting rather than actually drawing anything.

So what are some of the open problems in this space? One of them is that TCP might actually not be the best way to do this. Stream transports give you a really nice programmatic interface: you just assume that everything got there, that the underlying software reliably told the far end about everything you told it. I don't necessarily need that. If I'm doing kind-of-stateless updates of the screen, I just want the right thing on the screen eventually. If I knew that this batch of frames had been lost, I might not need to retransmit them, if I can just put another update in at the end that compensates. So I might not need to wait for that retransmission. Especially if I've run out of the end of the TCP window, if I've lost a packet and window scaling no longer wants to let me send more data from the server side, I'm going to be waiting until an acknowledgement comes back from the far end that says "yep, okay, finally got that packet that was lost", and then the window catches up. You don't necessarily need to do that; but then designing the server software is a lot more difficult, because now you have to keep track of the client state, and what it has acknowledged, on your own. The same thing is true for input. The natural response, when the system appears not to be updating, is to bang on the keyboard more and start wiggling the mouse around, come on, respond, respond, and all you're doing is making it worse, because you're stuffing more data down a lossless channel. If I knew I was losing packets, I might be able to start squashing events on the client side and refrain from sending them until I had some idea that the remote end had caught up. There might be a better protocol than TCP for this; doing it over something other than TCP would run into the usual firewall problems, because of course it would. But we should at least be able to get more of this information out of TCP itself. The kernel already has to estimate these characteristics on its own: it estimates the window size, how much data is allowed to be in flight at once, and the drop characteristics of the link, where the packets are getting lost. If I could get that information back out of the kernel, "okay, I was stalled here, wait a minute", I might be able to avoid putting as much data on the wire in the first place.
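On Linux, some of that state is in fact exposed through the TCP_INFO socket option. The field names below are from Linux's struct tcp_info; what a remoting server should actually do with the numbers is the open part:

```c
/* Pulling the kernel's own link estimates back out of a connected TCP
 * socket on Linux via TCP_INFO. A remoting server could, say, start
 * squashing input or coarsening updates when rtt or retransmissions
 * spike; that policy is the unsolved part. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static void report_link(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof ti;

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
        printf("rtt %u us (var %u), cwnd %u segments, %u retransmits\n",
               ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_snd_cwnd,
               ti.tcpi_total_retrans);
}
```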
I don't really know of any good solutions to this, at least not in open source software. I suspect that ICA does a better job of this than we do. No? No. Okay, so I'm told otherwise. So that's one of the classic open problems.

The other big one, which I really, really love to pretend will be solved any day now, is 3D remoting. And this is actually impossible. There are two ways you could do it, right? Again, you could blast image updates across: every time the application swaps a buffer to the front, you scrape the pixels out of the framebuffer and toss those across the wire. That would work, and it may actually be as good as you can do. There are downsides, though. Your video card's not actually made for switching back and forth between putting pixels up on the screen and tossing them back out to host memory; GetImage is a slow operation on every video card, and we're really sorry. It turns out that most of your drawing operations are pushes toward the video hardware, so we optimized the hardware for that direction, and getting anything back out is kind of slow. But you could do it. Also, if you're doing a VDI or terminal services kind of thing, 3D puts a lot more load on the server, because now you have a single shared resource, the 3D accelerator, that all your guests have to compete for. Or worse, you have software rendering in all your guests, and now you've really just decimated your CPU.

If you wanted to do the other thing, the hard thing, forwarding the 3D rendering to the viewer and letting the rendering happen there, now you have really, really made your life difficult. OpenGL and 3D hardware in general have all the problems of CPU flags for virtual machine guest migration, where you have to pick the common subset of all the machines you might ever want to migrate to, but way, way, way worse. I did a very quick survey of the four renderers I could get my hands on in my hotel room: there are on the order of 120 OpenGL extensions in any modern implementation, and the common subset is on the order of 90. You don't know ahead of time which ones you'll want to be able to use, so for anything you slice out and declare the renderer can't do, even if it can, we have no idea what that complexity looks like or what you'd be prohibiting your applications from doing efficiently.
And the OpenGL limits are different: how big your texture buffers can be, how many of them you can have, how many instructions you can execute in a shader, what your buffer formats are like, how many ancillary buffers you can attach to an OpenGL context, whether you can have triple-buffered rendering or not, on and on and on. Madness, absolute madness. And if you wanted this to be portable to the systems that aren't Unix, and it's a very noble thing to want to do, I guess, you have to do two APIs. You have to do Direct3D, or rather however many versions of Direct3D you want to support, because there are a couple and they're not all the same. And you have to do OpenGL, which at least is kind of nice in that each version is a growing superset of the last. So now I have to do two APIs. Do I figure out how to scrape both of those out and find the common subset I can support in both directions? Do I write a Direct3D renderer in OpenGL? An OpenGL renderer in Direct3D? Both directions? Or do I define some protocol in the middle that's sufficiently complicated to cover everything D3D or OpenGL wants to do and encodes all the subtle differences, like whether pixel coordinates are at the top left of the pixel or at the center, which is different between those two APIs. You'd have to compensate for all of those things and get them translated in this whole big mess.

And assuming you wanted to type that much, you'd still be facing the problem that the data involved in 3D rendering is extremely verbose. In a video game, if you have a model that's going to be 50 pixels wide on the screen, its texture is usually about 100 wide, because Nyquist says that's what you've got to do for it to look good. So now I've got all this texture data. I don't know how much of it I'm actually going to need, and I can't tell a priori, because I can't inspect the shader; Turing said you can't do that. So I have to blow all the textures across to the remote end, and all the vertex buffers, and the shaders, either in source form or in some intermediate representation like Mesa's IR out of the compiler, get all of that to the other end and run it. And it may still have been more data on the wire than just scraping images, and it may not be fast enough. So what do you do? So far it looks like the answer really is: scrape the pixels out, figure out how to get a fast renderer on whatever your guest system is, and just push images. There's not a great solution there, and there are people who want to do the full remoting thing, because they're weird. (From the audience:) the observation was that for video games, the loading screen is exactly this. It's uploading all the textures you're going to use for the course of that level and compiling all the programs, and then the game engine runs. So that may be okay for games, but for GNOME Shell, maybe not.

That's basically all I've got, if there are any questions, and I suspect there are a couple. Go ahead in the back.
Right. OnLive is a remote gaming service that claims to let you do real-time gaming, with pretty 3D rendering, off in the cloud somewhere. And they're right, you can do that, as long as you don't mind that "real time" means two or three frames later than you were hoping for. Frames are measured in time: a frame is 16 milliseconds, and you only get 60 of them per second. If it takes me 16 milliseconds to get across the country, I'm a frame behind. And the human threshold for latency, depending on what you're doing, is anywhere between 30 and 100 milliseconds, so you only get two to six frames before you really start to notice that something is wrong, that it's too laggy to actually be useful. So that's how that goes. Any other questions? And please wait for the mic, because we've got this recorded.

(From the audience:) it seems that the problem we're trying to solve with this is a lot like having a very low latency video codec for streaming video. It's very, very similar, yeah. (The questioner continued:) one of the things they do for remote control of vehicles is use something like JPEG 2000 as a continuous-feed video codec. You send one kilobyte, and then when you send the next kilobyte it just improves the first image; continuous refinement, so the first bytes you get give you a blocky image, and then it magically refines without having to send a separate lossless packet at the end. Similar to progressive JPEGs from, again, back in the 1996 browser days, except it gets to lossless almost as quickly. That is certainly an approach. I believe HP had a product called RGS that did something vaguely similar a long, long time ago; you may or may not still be able to get it from them. My concern with that would be what the encoding latency is like: whether it's something I can do close enough to real time on the server, as I'm pushing things out, and still get a high enough density of hosted machines. But it's certainly a viable approach.

(From the audience:) another thing: we're starting to use this thing called PC-over-IP, which has little hardware compression codecs that take the VGA output; unfortunately it's a totally proprietary protocol. Most of those, PC-over-IP and pcAnywhere and GoToMyPC and things like this, tend to be somewhere between RDP and VNC on the complexity scale. They've usually got a more complete set of remoting for USB or for storage, but the actual pixel transfer is often not a whole lot more than pixel scraping. Which is potentially legitimate, but then you spend a lot of your time trying to optimize your compression. Over here; last question.

(From the audience:) this is about the difference between VNC and using X to do remote stuff, the main thing being that with VNC you can obviously detach from the session and come back, whereas with X, as soon as one end of the connection drops, that's it, even though, if the server's still there, the program could theoretically continue to run. Do you know of any way to have the program just go into a proxy mode if the display drops out? In X, you basically have to solve that by having connected to a proxy beforehand. And there are a couple of solutions there; Xpra is my favorite at the moment, because it is a full X server, and the way you forward out of it is with something else. It's effectively an X server with a compositing manager: your application runs inside it, and the compositing manager tosses its own protocol out the side. So your application never disconnects, but the remote view can.
Maybe I have time for this guy, or maybe I should catch him afterwards. No, sorry, that's all we've got, because we're going to run over. (Session chair:) Thank you very much for your time; that was an excellent presentation. I wonder if you could all thank Adam for that. And a gift for you, sir: one of the red bowls that escaped the Brisbane flood. I won't give the full story right now, but perhaps at the end of the next session. Thank you very much. We start again in 10 minutes.