So, hello, welcome. My name is Christophe de Dinechin. I'm working at Red Hat, and I'm introducing myself because none of these guys can say my name right. So, I'm working on virtualization, and today I'm going to talk about SPICE, and specifically about smart streaming. So, who among you is using SPICE regularly? Who knows about streaming in SPICE?

Normally the talk is about smart streaming, which is basically how to optimize the quality of streaming to adjust for adverse conditions. But you're in the right room, because you get three talks in one, three layers of talk. There's a meta talk, which is that instrumentation matters, and there is a meta-meta talk, which is that your work is not done until you can actually demo it.

So let me first give you an overview of SPICE and how it evolved over time. It's a way to remotely access virtual machines, and it's portable, with clients on Linux, Windows, and to some extent macOS. It has many components: there's a protocol, a server, a client, an agent, there's a driver in the guest called QXL, and so on. And it is in the middle of a rather painful transition to whole-screen streaming.

Why? Because it was initially designed for 2D commands. It was designed for the time when this was considered state-of-the-art advanced graphics, and at some point code was added to detect streaming on part of the screen, for instance when you watch a YouTube video. But we are now at the stage where we want hardware-accelerated full-screen video streaming for real 3D content. And then we run into a problem, which is that hardware acceleration today mostly uses H.264, which is full of patents, and patents are evil.
I show you here a Google patent for toys that basically spy on you, with cameras and stuff in them. Now, streaming: the problem is that it's more sensitive to the environment, like the network quality and things like that. So you play, and you end up with something that looks a little bit like this. The gaming experience is not right, it skips from time to time, and so on. Smart streaming tries to address that, to smooth things out, by basically degrading, for instance, the picture quality to make sure that you stay at a high FPS.

So what is the traditional SPICE drawing model? You have the SPICE client on the left, then the network transmitting the SPICE protocol, and on the right you have QEMU and the SPICE server inside QEMU. Then you have your guest, and you have components inside the guest that store data. So now you have some sort of modern operating system from the 1980s in it, that does, you know, its typical stuff, trails on the screen. What happens is that it basically talks to QXL and sends things down, and then they are sent over the network, and that allows the SPICE client on the other side to reconstruct the screen based on what it receives. It can receive different types of objects, depending on what you're drawing.

Now, when you use the same model for 3D content, what happens is that you basically get a big fat bitmap. Sorry, not a Big Mac. And then basically your gaming experience on the other side is like this, right? So this is not great. That's where streaming comes in: it basically turns your content into video on the fly. The problem is that when the network is bad, you end up with video that is slower, grainier, and has a higher delay. You can see that the video on the left is way behind the video on the right. And of course you're playing a game, this is all designed for games, I hope you understood that.

So, what are the issues and bottlenecks with streaming? Can you name some of them? You have an idea?
That's the interactive part of this talk. Network bandwidth, okay, so we have this. What else? Latency. What else? Packet loss. Yeah, yeah, let's put that in the network box, I did not have enough room for it there. But outside of the network, what else can go wrong? The CPU. Which CPU? Both, okay, so we have the client CPU, which may overload, and we have the host CPU. What else? The GPU. Where? On both sides too, but I did not have enough room on the left. And actually, for the GPU, there are two components in it that we use: there is the drawing part and there is the streaming and encoding part. They can both overload.

So let's try to see what we can do. The first thing was to identify the problems in an easy-to-understand way. I had to create some tools to be able to identify these problems, and that was the topic of another talk I gave on Friday, so, sorry, too late. Some of you were actually there, so thank you for attending both talks. But the summary is that it's basically a tool that records data in real time and allows you to graph it, and it also has some interactive parts. The other talk was about how SPICE forced me to add all these extra features that don't seem to belong in logging, but actually do, because I needed them. So go watch the other talk, it's very interesting, I'm sure.

From there, the other thing is that I knew I needed to have some sort of feedback mechanism to send data collected from the SPICE clients, and there's a variety of things I want to collect, and I don't know ahead of time what they are.
So basically I added a tagged data mechanism that can measure things, at various stages and various times, in the client, and send them to QEMU and the SPICE server. Then the SPICE server has some magic in there, thinks about all this data, and sends adjustment commands to the streaming agent. Same thing there: adjustment commands may not all arrive at the same time, so it's also a tagged mechanism.

Then there was a need to add some smarts in the server. I wanted this to be configurable for experimentation purposes, because I did not know what I was doing at all, so basically I iterated to try to find a good way. I thought initially of a simple technique that basically tried to say: if you can only display at 20 FPS, then configure the host to send 20 FPS. That's a follower algorithm. It's extremely dumb, and that was what I wanted to start with. And then I thought, oh, I have this idea: if the FPS looks like this, then it means the network is broken, and so on. That was the smart algorithm. It's about three times as long, and it tries to guess where the bottleneck is. And in practice it was a flat failure, because the dumb algorithm turns out to be doing exactly as well.

So let's see in practice what it means to work on this. Let me actually restart this and explain what's going on. What you have on the right... they gave us a resolution for the screen, and it's not what I have, so, sorry, the text at the top and the bottom is slightly outside. Let me fix that, because it's going to be annoying. So now you know how it's done inside. Sorry for this interruption of the program.

So this is the typical setup: you have client, server, and agent on the right. You have these three columns; on the left it's the client, and then you see how I can start the various instrumentation. You will see this sort of repeat.
It's just so that you understand the layout of the screen; it's a relatively complicated test setup. Basically, once you have started the agent, you get your 3D graphics and you can measure things, and then you can start the client-side instrumentation and the server-side instrumentation. So you sort of get the idea of having these columns on the screen to try to understand what's happening. Does that make sense? It's a bit confusing, but you'll probably understand better with this first example, which is the accelerated client. That's the ideal case. You have what is measured by the client on the left, the server frames and bytes per second, then you have this workload that I will be adjusting to see how the system behaves, and then you have the agent parameters at the bottom. It's not active yet, because the smarts are not enabled yet.

If I start a GPU-intensive workload, then the GPU gets overloaded, and you can see that the FPS on the left goes down. The bytes per second get higher, because it's complex content, and the frames per second get low, because the GPU can't cope.

Okay, so now I activate smart streaming on this kind of content. There I'm activating the dumb algorithm (1 is dumb, 2 is smart). So it activates, and it applies the parameters. What you had on the left was the server, and on the right is the agent, and when I change the workload you can see the effect on the left, in the leftmost columns. There, what I have is a very simple content that displays at 60 FPS easily, and you can see that on the left the client FPS goes up, and the bandwidth goes down, because the content is extremely simple. And the server follows; remember, the algorithm is simple, so it just follows that, sends that to the agent, the agent configures the encoder, and everything is fine so far. Does that make sense so far?
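To make the two algorithms concrete, here is a minimal sketch, in Python rather than the C of the actual SPICE code. Every name, bound, and threshold here is invented for illustration and does not come from the branch:

```python
def follower_fps(displayed_fps, min_fps=5, max_fps=60):
    # The "dumb" follower algorithm: if the client reports it can only
    # display N FPS, configure the host to send N FPS, clamped to sane
    # bounds so a transient glitch cannot drive the target to zero.
    return max(min_fps, min(max_fps, displayed_fps))

def guess_bottleneck(sent_fps, decoded_fps, displayed_fps):
    # Flavor of the "smart" variant: try to guess where the bottleneck
    # is from the shape of the metrics. (The real code is about three
    # times longer, and in practice did no better than the follower.)
    if decoded_fps < sent_fps * 0.9:
        # Frames were sent but not decoded in time: network bandwidth,
        # latency, packet loss, or client CPU decoding too slowly.
        return "network or client CPU"
    if displayed_fps < decoded_fps * 0.9:
        # Frames decode fine but pile up undisplayed: the display path.
        return "client display"
    return "none"
```

The punchline of the talk is that the one-line follower behaved just as well as the guessing variant.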
Please interrupt if there is any question. In order to tune all this, there are a number of tweaks. There is a list, with help; it's self-documented, and the server log shows me the kind of things I can adjust. I'm going to adjust the one that is called target weight. It's a weighted average of all these things, and I'm going to make it so that it reacts faster. I set this in real time. What the other talk was explaining is that the reason for having this kind of tweaks is so that I don't have to restart the VM just because I want to change some parameters. So the VM and the server are still running, I send that live, and this allows me to see how the system evolves based on this setting.

So I adjust this parameter, and I'm going to see that now, when I do this workload change, the slope on the left is faster. I gave a higher weight to this target, so now I see that basically it's climbing faster. That's how I can sort of tweak things to try to have something that behaves more or less the way I want.

So let's observe some reactions. It's now adjusted for faster reaction. I'm going to try various workloads, and I can see on the left that it follows and adjusts. On the left, again, it's what the client measures. You see the results, which are here dominated by the GPU not being able to keep up, but you see the agent is responding faster than before; the slope is slightly steeper than it used to be.

Now let's try a network degradation. It's a demanding workload here, which is both fast and changes all the time, so it uses a lot of bandwidth and requires a high FPS. That's a real test. And what we see there is that now the network is limiting, so the client skips. That's what I showed earlier, except there
it's a real workload, and you see it in real time. So that's not really usable. What happens also is that, in that case, the network accumulates some frames. So when you lift the constraint, you see this spike on the left, which is basically the network catching up and sending extra frames; you see it displays very fast. But when you do the opposite, the client freezes. Basically, if you have too much network blocking, you can end up with, for instance... yeah, let me... You can see on the right; let me replay this one. You see on the right that, when I restore the network constraint, what happens is that the whole network stack gets clogged, and it actually freezes, and it actually blocks the encoder. So I'm at the point where I can't even generate frames. The playback is really random at that stage; it's not really useful.

So let's activate smart streaming and see what happens with these same conditions. Now you see on the right that the bytes per second and the frames per second are guided by the smart streaming algorithm, and the client does report a drop in bytes per second, because of the network, and a drop in frames per second, because it's guided to do so. Then the algorithm figures out that it needs to decrease bandwidth first, and so what you see is that the picture quality degrades, but the encoder is not blocked. So you have something where the picture quality goes down, but you still have FPS, and the encoder is not blocked.

Now, if I switch to something that is very fast and very simple, then we measure higher FPS and lower bandwidth, and the adjustment occurs on the right as well. So you end up with a setting of high FPS and low bandwidth, which is exactly what we want; you can see the encoder generates practically no data. And if we have content that is both fast and complex, then you'll see that there is a requirement for higher bandwidth on the left, and then both adjust on the right in
the same way. So now the encoder ends up being network limited, and the reason it's network limited is that at some point, on the left, the bytes per second reach a plateau, so they no longer increase, and that's where the system stops. It basically uses as much bandwidth as it can.

Okay, let me give another example. Oh, sorry, no, first return to nominal. Just observe: I observe the one on the right here, on the top right, which is over X11, so you can see how X11 fares relative to smart streaming. You can see that it was skipping, while the SPICE content was relatively smooth. So now, if I lift the network conditions, then everything goes back up, because I have more bandwidth, until I don't need more. It basically says: oh, at that stage I can encode my image as well as I want, and that's why it stops.

So let me degrade the network very badly now, to basically half the bandwidth I had before. X11 doesn't work anymore; it's stuck, it's completely slow. But SPICE is smooth, because we are basically within a relatively good bandwidth capability. What happens, though, is that you can see here that the picture quality goes down. So it's still smooth, but the picture gets more blocky, and you can see that the adjustment became very sharp as soon as the content required more data. So it really adjusts depending on the actual content.

Now, another case, and that's on macOS, where we have a very simple content, and despite that, the macOS client can't keep up with the display. Look on the left: it's really the display that is delayed, because it happens to be a software display at the moment, it doesn't use any hardware acceleration. So we have this discrepancy, and if you look down, it's not a bandwidth problem at all.
It's not a network issue. What happens is that we have undisplayed frames that pile up. Since then, Frediano has fixed that specific problem, but this was to show the average decoding queue: it sort of piles up, and we need to find a way to address that. This is without smart streaming. The problem with this queue is that it causes unbounded latency, so you can end up with what is being displayed by your client being five seconds, or ten seconds, or two minutes behind what is actually sent by the server.

So let's activate smart streaming, same method as before, and you see that we have this display adjustment in the agent. The received size now goes down quite a bit, the queue length starts going down as well, and so the queue backlog starts evaporating slowly. In about one minute you completely recover from this big backlog, to the point where the problem is solved and you are basically displaying frames in real time again. So it's a completely different problem from the previous one, but we solve it the same way.

And then the number of FPS stays reduced, so basically it stays in sync with the server. Now, the server tries to push the FPS up a bit from time to time; you see that it's trying to get back up, but whenever it does that, the queue length starts increasing again, and there is this queue build-up which would create latency, so basically it shoots it down again. So you stay with a relatively low FPS, which is exactly what you need in this condition. Unlike before, where we could adjust the bandwidth, here it's not a bandwidth problem, it's just FPS.
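The two adjustment behaviors shown in these demos, the weighted target from the tuning example and the queue-draining reaction just described, can be sketched roughly like this. This is illustrative Python, not the actual server code; names and constants are invented:

```python
def update_target(previous, measured, weight=0.1):
    # Exponentially weighted average: "weight" plays the role of the
    # target-weight tweak from the tuning demo. A higher weight makes
    # the target follow new measurements faster (a steeper slope); a
    # lower one smooths out noise but reacts slowly.
    return (1 - weight) * previous + weight * measured

def adjust_fps(target_fps, queue_len, queue_limit=3,
               step=5, min_fps=10, max_fps=60):
    # When decoded-but-undisplayed frames accumulate, latency grows
    # without bound, so shoot the target FPS down; otherwise probe
    # gently upward, the way the server periodically tries to push
    # the FPS back up.
    if queue_len > queue_limit:
        return max(min_fps, target_fps - step)
    return min(max_fps, target_fps + 1)
```

With a loop like this, a backlogged client keeps getting a lower target until its queue drains, and every upward probe that rebuilds the queue is immediately undone.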
So we lower the FPS to deal with a CPU overload on the client side.

What we have seen so far is basically that we have a mechanism to send metrics, take action, and control parameters. It's tagged metrics, because, just like with these little luggage tags here, we never run out of ideas about what to send. I love this one.

Okay, so currently the implementation basically records frames per second and bytes per second for four categories, which are received, decoded, displayed, and dropped. And I'm out of time... It also monitors queue length, because the queue length case was the perfect storm, when everything was wrong on the client. But this can be updated without breaking the protocol, because there is a rule, which is: if you receive a tag you don't recognize, you just ignore it. "You're talking to me? I'm not listening." And it should still work.

I used this instrumentation because qualitative picture evaluation was way too difficult. It's like the game of spotting seven differences between the top and the bottom: you can't compare if you don't adjust things to see what happens, and you can't really compare by just looking at a screen and trying to remember how it was before. I used the recorder library to have quantitative results, from which I derived qualitative results. I'll let you read this little thing here: quantitative is not what you want to have in the end, you want to have a qualitative result out of these numbers. Everybody read that? And I used the recorder tweaks to adjust behavior in real time; basically, the tweak name comes from this kind of real-time tuning.

So, back to the meta discussion: I think that's a convincing way to be sure that what you're doing is actually working, in this case in a rather complicated environment with three components that interact. Now, it's not finished.
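The ignore-unknown-tags rule above can be sketched like this; the tag names are hypothetical, not the real protocol identifiers:

```python
# Hypothetical tag vocabulary for one side of the protocol.
KNOWN_TAGS = {"fps_received", "fps_decoded", "fps_displayed", "fps_dropped",
              "bps_received", "queue_length"}

def handle_report(report):
    # Forward compatibility: a receiver that sees a tag it does not
    # recognize simply skips it, so adding new metrics later never
    # breaks an older peer.
    return {tag: value for tag, value in report.items() if tag in KNOWN_TAGS}
```

The point is that either side can start sending new tags at any time without a protocol version bump.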
We're still working on it. Right now it looks a little more like this, right, Frediano? Okay, that's a fair statement. But it's promising. And with that, I'm done. Sorry, I was one minute late. Any questions? We have only four minutes left for questions. No questions? So you see...

Yes: is it already upstream? That's part of... so I was wondering what part of this picture was not completely clear. It's upstream in the sense that it's published; there's a branch that you can build, etc. But it's not in master, and it's not in master because there are tons of things that are completely independent pieces, like the instrumentation, which is independent. So it's complicated: it impacts all the components in SPICE at the same time. As a result, for instance, I also redid the build system to be able to build all components in one build, but I did it in a way... the team went another way, they went the Meson way instead of makefiles, so basically I have my own little incompatible build system that they don't want to adopt, which I perfectly understand. So there are a number of things that still remain to be integrated.

I think your question is: can you try this on your own system, if you know how to build? Yes, it's relatively easy. When I said the problem was to make the build easy: what I added is basically a top-level build for SPICE, where you go at the top, you do make, it configures for you and builds all the components, and then you do make install and it installs all the components at once. So it's relatively easy, because I needed to update all the components at the same time; that was the reason for doing so. But at the same time, the whole team was convinced that the build system needed improvement. They did not trust me for the makefile stuff.
And so they said: well, Meson is the trusted entity, let's try that. So they went for the Meson side of things, which I think is working now. But the result is that I'm not doing things like the rest of the team. My bad.

No, because I hate autoconf. So I also created a small project, I'll make it quick... so that's a plug for another project, make-it-quick. It's basically autoconfiguration without autoconf: makefiles that are configured for you. It works on macOS, Windows, Linux, and BSD.

Yeah, so... yes, it should work. The recorder itself is a submodule of that specific branch, and the scope, the tool I am using to tweak, is a very small project that is within this submodule. So you can actually build everything from one git pull. Again, I'm too lazy; folks being lazy tend to create all these extra projects to do the thing for them. So normally you should just make install and be done with it. Actually, you can even test without installing, so you can test it in place as well. You don't have to destroy your system with my stuff.

We are completely out of time, but there is nobody after us, so if you want to ask more questions, I think you can join me outside. Yes, that's what I'm saying: if you want to ask another question...