Thank you. So our next talk is an interesting use case: multi-camera instant replay with slow motion, called Futatabi. Please welcome Steinar.

Thank you. So, I have two great passions in life, at least at the moment: one of them is programming, and the other one is the sport known as ultimate frisbee. And when I tell people "frisbee", they go, "oh, how can you play anything with a frisbee?" I try to explain: well, you throw to your teammate, you want to get inside the goal zone, and they go "what?", so I try to show them a video. The problem is that when I moved back to Norway, where I live now, and started playing tournaments, the stream basically looked like this. It's pretty much impossible to see anything, right? You can sort of see the players, but there's supposed to be a goal line here, and good luck seeing it; it's supposed to be around here somewhere, but it's all smudged out. They ran Windows with XSplit, they managed to get maybe 15 fps sometimes, and just the camera capture would take a full core. And it's really, really hard to explain a sport to someone when you can't even see whether they are in or out, when the whole point is to throw inside the zone.

So we thought: can we do better than this with free software? And it's not an obvious answer; I've never heard of sports production being done with free software before. But we thought, okay, we'll give it a shot. This is Nageru, my live video mixer, which I presented here, actually in this very room, three years ago. It's grown a lot since then, so if you tried it out three years ago, maybe give it a shot now; there are a lot of new features.

I'm going to do a very quick rundown first of what our stream looks like. This is a standard indoor ultimate field, 40 by 20 meters; it's a plain handball field, which is what you play on indoors in Norway, as in a lot of other countries. You can see there's not a lot of room, so we are pretty cramped in terms of where we can put our cameras. First of all, we have camera one. Camera one is your standard wide-angle, let's-look-at-the-entire-field thing that you will see in any normal sports production. We're trying to be very sparse on manpower, so camera one is run by the producer: he's tilting it with his left hand, operating Nageru with his right hand, he's also doing the audio mixing, he's also doing graphics. It's a fairly busy position, but it's a lot of fun. There's also camera two. Camera two is a detail cam: it basically just follows the player with the disc. It's a bit more zoomed in, which is nice for those detail shots. Most of the time, maybe 85 percent of the time, you want to be on camera one, and maybe 10 percent of the time you want to be on camera two. And then we have three and four, and three and four are what we call beauty shots. They are GoPros, actually, perched high above each of the two goal zones, so you can see the end zone more or less.
It would be nice to have them even wider, actually. These GoPros are low-power enough that we can just run them on power banks; we don't run power to them. We just put up a power bank, a very, very cheap HDMI extender, an HDMI-to-SDI converter, and the GoPro, and those power banks last maybe 18 hours. Now, if this sounds like a lot of equipment, maybe it is, but almost all of it was either bought second-hand on eBay, or borrowed, or otherwise cheap. It's not an expensive production by any means, at least not compared to a traditional broadcast.

So after all of this, it looks basically like this. You can actually now see the goal line; it's up here. There's HTML5 graphics here; this is embedded Chromium. And: was it in or out? Who knows, right? 70,000 lines of code later, and I still don't know whether I'm in or out. (For the record, I was in; we looked at it later.) But it became painfully obvious that we needed some form of instant replay. We wanted to be able to see shots again; we wanted to be able to see shots from a different angle, like camera two instead of camera one or the like; and we wanted to be able to see them in slow motion.

So we went online and looked: okay, we need a box to do this; what can you get? This is the EVS XT3. This is pretty much the standard thing in broadcast; if you look at any major sports event, they will either have an XT3 or they will have many of them. Unfortunately, as you can see here, the price is 99,000 pounds, and this is a bargain. You also need to pay $30 in shipping, unless maybe you can get that waived. And to make it even worse, the seller is currently on vacation, so you can't even buy it. Now, there are cheaper options, but they're all generally in the 10,000 euro range, and that is essentially a Windows box with some software and a video card.

So we thought, okay, let's try making our own, and this is of course where Futatabi comes into the picture. (The first "u" is silent, because it's a Japanese word.) Basically, we try to get all the frames from all the cameras, from Nageru, over IP; we store them, and we play them back when asked: you have a frame server. Now there are a few problems, of course. Uncompressed video, as we just talked about, is fairly voluminous: on 10 gig you can maybe move it over the network, but storing all of it, if you want to store 12 hours of uncompressed video, is not going to fly. So we are doing MJPEG compression. You would think: why would you want JPEG? JPEG is like an '80s format. But it's very, very performant, and it's actually pretty good. In fact, on recent Intel CPUs, from Skylake and up, you actually have a hardware JPEG encoder, so instead of burning four CPU cores on a software encoder, you can just send the frames directly to the Intel GPU, send them over the network, and store them. Then you have some sort of UI, which we'll be looking at later, to pick out, for instance, this segment here, and play it back. And when you play back these unmodified frames, you can just echo the MJPEG back; you don't even need to decode or re-encode, so you get maximum quality, and you get essentially no CPU usage.
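As an aside, the store-and-echo idea is simple enough to sketch in a few lines of C++. This is just an illustration, not Futatabi's actual code; the class and all the names here are made up:

    // Hypothetical sketch of the store-and-echo idea: frames arrive from
    // each camera already JPEG-compressed (e.g., by the Intel GPU's
    // hardware encoder), so we can store and replay them without ever
    // decoding or re-encoding.
    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <utility>
    #include <vector>

    struct FrameKey {
        unsigned camera;  // which of the four cameras
        int64_t pts;      // presentation timestamp
        bool operator<(const FrameKey &o) const {
            return std::tie(camera, pts) < std::tie(o.camera, o.pts);
        }
    };

    class FrameStore {
    public:
        // Store the compressed JPEG bytes exactly as they arrived.
        void add_frame(unsigned camera, int64_t pts, std::vector<uint8_t> jpeg) {
            frames_[FrameKey{camera, pts}] = std::move(jpeg);
        }

        // Normal-speed playback: just echo the stored JPEG back out.
        // No decode, no re-encode, essentially no CPU cost.
        const std::vector<uint8_t> *get_frame(unsigned camera, int64_t pts) const {
            auto it = frames_.find(FrameKey{camera, pts});
            return it != frames_.end() ? &it->second : nullptr;
        }

    private:
        std::map<FrameKey, std::vector<uint8_t>> frames_;  // in reality, spooled to disk
    };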
But we also wanted slow motion, and that raises the question: we have the unmodified frames here, and we want to slow down by 2x, but what do we do with the ones in the middle?

Now, the obvious first choice, of course, is to just repeat frames: repeat them twice, three times, four times, however much we need to slow down. And that looks like this. It's very, very choppy, and it's not really something you want to show to people. I will say, though, that I've deliberately picked a very, very hard sample. There are two reasons for this. First, it's a lot more fun to reason about the things that go wrong than the things that go right; I mean, the easy cases, who cares, right? The other one is that in a talk, these effects can be really subtle; it can be very hard to see things when they're just flying by once. So I want to exaggerate a bit: I'm playing clips with a lot of motion, we're doing 4x slowdown instead of 2x slowdown, this kind of thing, just to make things clear.

So okay, repeating frames is maybe not the best. What about fading between frames? It's not really a lot better, right? It's still pretty choppy. So we need to do something even better than this.

Here I have taken two neighboring frames and overlaid them, so you can see: we have a frame A, we have a frame B, and we want to find something in the middle. Obviously, something happened between these frames: say you have this player in the first one; he moved a bit, his leg here moved a bit from the right to the left, and so on. So these intermediate frames, the ones 20 percent between, or 80 percent between, or what have you, have to do somehow with motion. If we can estimate the motion, maybe we can try to synthesize the in-between frames.

This brings us to the idea of optical flow. Optical flow is basically a model that says: every pixel came from somewhere. Here we have made an estimate. What you'll see first of all is all these little arrows pointing up and to the right, and that's because the camera moved down and to the left between these frames. But it's also picked up that the disc here is moving to the right, and that the player is moving to the left; there are slightly different arrows here, and so on. So if we can take these arrows, our flow field, and somehow invert it, which is much harder than it looks, then we can maybe get an idea not of where each pixel went, which is not that useful, but of where we want to sample it from. This part is actually non-trivial, but we're not going to talk about it; we're going to talk about estimating the flow field instead, which is a very difficult problem in its own right.

There are more than 200 papers dealing with optical flow; I read maybe 20 or 30 of them. There are rating lists for who can make the best flow on standardized tests, and new papers come out virtually every week; between me writing this talk and now, I think the top of the list has changed three times or something. But most people don't care about real time, and we need to be able to synthesize these in-between frames 60 times a second, or at least 30 times a second if we can keep every other original frame. So this paper from 2016 caught my eye, because it says the magical words: 300 to 600 hertz, or frames per second.
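Before digging into that paper, the two naive strategies from above are worth pinning down in code. A minimal sketch (illustration only; the `Image` struct and function names are made up, and both functions assume A and B have identical dimensions):

    // Sketch of the two naive slow-motion strategies discussed above.
    // `alpha` is how far we are between frame A (0.0) and frame B (1.0).
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Image {
        int width = 0, height = 0;
        std::vector<uint8_t> rgb;  // 3 bytes per pixel
    };

    // Strategy 1: repeat frames. At 4x slowdown, each source frame is
    // simply shown four times, so motion advances in visible jumps.
    const Image &repeat_interpolate(const Image &a, const Image &b, float alpha) {
        return (alpha < 0.5f) ? a : b;
    }

    // Strategy 2: crossfade. Smoother on the eyes, but moving objects
    // become ghostly double exposures instead of actually moving.
    Image fade_interpolate(const Image &a, const Image &b, float alpha) {
        Image out = a;
        for (size_t i = 0; i < out.rgb.size(); ++i) {
            out.rgb[i] = uint8_t((1.0f - alpha) * a.rgb[i] + alpha * b.rgb[i] + 0.5f);
        }
        return out;
    }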
Now, of course, it isn't really 300 frames a second; it's more like 10, because they exclude a lot of preprocessing, they run at low resolution, and those numbers are for the lowest quality preset, which isn't really good. But I thought, okay, maybe I can use it. They have reference code, though it's somewhat unclearly licensed: it says GPLv3 in some places and "for personal use only" in others. So I ended up reimplementing it, not least because I wanted it on the GPU. My hope was: if they can do 10 hertz on the CPU, maybe I can do 60 hertz on the GPU. Let's try.

I'm going to give a very, very high-level overview of how this algorithm works; if you want the full details, you can read the paper, but it's fairly dense. We start off with a motion search, and we start at a lower resolution, as you can see. We split the image into multiple overlapping 12 by 12 pixel blocks, and then we look in the other picture: can we find something that looks like this block? We start here in the middle, the blue square, and then these squares here show the algorithm successively homing in. If you've done video compression, you'll already know motion search, but this is motion search with a different goal: not finding the most bit-efficient motion vector, but finding the real motion. And we also care a lot about subpixel precision; we don't want to stop at half-pixel or quarter-pixel. So this is a gradient descent search: we start at zero, try to figure out which direction to go, and eventually, hopefully, converge. This works fine as long as the motion doesn't span many, many pixels; in this case it's one and a half pixels, because we reduced the resolution.

By the way, the flow isn't color-coded for no reason: when we have a million arrows, it's very hard to see which way they're all pointing. So instead of drawing them as arrows, we use a fairly standard representation where the hue is the direction the flow is going, that is, the angle, and the lightness is how strong it is.

So that gives us a bunch of little squares; this is a sparse flow field, one vector for each of the overlapping motion blocks. First, we need to get up to the resolution of the image, or at least of this pyramid level, so we do a process called densification, where we blend between all of these motion vectors, or hypotheses, based on how good each of them was for the individual pixel. Right now it looks sort of messy, so we have a separate step called variational refinement, which unfortunately I don't have time to talk about; that would take two hours. Essentially, it sets up a huge nonlinear differential equation, which we solve numerically on the GPU. It's not just smoothing; it does a sort of edge-preserving smoothing, but it does a lot of other things too.

At this point we have a flow field. We could have scaled it up and used it for the interpolation, but instead we now go to the next resolution level: we double the resolution, and we use this flow field as input for the motion search. Remember, I said the motion search works well when it only has to move a little; well, now it only has to move a little, because it already has a pretty good estimate. So this is our new motion search, our new densification, our new variational refinement; we go one step up in resolution again, motion search, and now you can see things like the disc starting to show here, and the people in the background.
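To summarize the loop we just walked through, here is a structural sketch. All the names are illustrative stand-ins (declarations only); in the real implementation, each of these steps is a set of GPU shader passes, not a C++ function:

    // Structural sketch of the coarse-to-fine flow estimation described above.
    #include <vector>

    struct Vec2 { float dx = 0.0f, dy = 0.0f; };
    struct Image { int w = 0, h = 0; std::vector<float> gray; };
    struct FlowField { int w = 0, h = 0; std::vector<Vec2> v; };

    // Each of these stands for a set of GPU passes in the real code.
    FlowField upscale(const FlowField &coarse, int new_w, int new_h);
    FlowField motion_search(const Image &a, const Image &b, const FlowField &init);
    FlowField densify(const FlowField &sparse, const Image &a);
    FlowField variational_refinement(const Image &a, const Image &b,
                                     const FlowField &flow);

    // pyr_a/pyr_b: image pyramids; index 0 is full resolution, the back is coarsest.
    FlowField estimate_flow(const std::vector<Image> &pyr_a,
                            const std::vector<Image> &pyr_b) {
        FlowField flow{1, 1, {Vec2{}}};  // start from "no motion" at the coarsest level
        for (int level = int(pyr_a.size()) - 1; level >= 0; --level) {
            // Seed the search with the previous level's flow, so the
            // gradient-descent search over overlapping 12x12 blocks only
            // has to cover a small residual motion.
            flow = upscale(flow, pyr_a[level].w, pyr_a[level].h);
            FlowField sparse = motion_search(pyr_a[level], pyr_b[level], flow);
            // Blend the overlapping per-block hypotheses into a per-pixel
            // field, weighting each by how well its block matched there.
            flow = densify(sparse, pyr_a[level]);
            // Edge-preserving smoothing (and more), solved numerically.
            flow = variational_refinement(pyr_a[level], pyr_b[level], flow);
        }
        return flow;
    }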
Then densification, then variational refinement. All in all, this is about 200 passes on the GPU; it gets a lot to do, but it loves parallel work. And of course, you can see it's not perfect.

So how good is it? For this we have standardized benchmarks, thankfully. This is the MPI Sintel benchmark; I think a lot of you will know the Sintel movie from the Blender Foundation. Someone has re-rendered it with optical flow data, so this is our ground truth: since it's a render, we know exactly where every pixel is going. And these are our estimates at four different quality levels. You have quality levels one, two, three and four from the paper; they vary in things like what resolution we stop at, how many motion search iterations we run, this kind of thing. And you see, as we're getting down here, we have this EPE, which is the endpoint error: how much the flow misses compared to the reference (sketched below). It's going down, at least up through quality three and quality four here, and of course it takes longer and longer. Now, these times here: I'm quoting six milliseconds on an RTX 2070, and that is estimating forward flow from A to B, estimating backward flow from B to A, computing the in-between flow (the halfway flow between them), and doing the actual interpolation. So even at 1080p we're at 12.9 milliseconds, and that's under our 16.7-millisecond target. Compared to the CPU implementation, it's more than an order of magnitude faster, depending a bit on your settings and your GPU, of course, and it's a few percent better in terms of endpoint error.

Now, you can see it's still not perfect; it's still far from perfect. But does it matter? We can now try taking just these two frames up here and interpolating very, very smoothly between them, and the result looks like this. These are just two frames. Let's look at it again. You can see there are some issues in here: for instance, it has problems with the occlusion, but it did more or less get the sword right, and the background motion is hardly distinguishable from the real thing. And remember, this is like a 50x slowdown; this is fairly good. Let's look at the original example again, now with this algorithm. It's not so bad, right? There are some artifacts, especially if you know what to look for, but again, this is a hard sample.

Now it's time for the demo effect. I'm going to try to show the UI as best as I can. Of course, my laptop isn't really fast enough for this, because it doesn't have a nice GPU, but I took some test data at a lower resolution, which you can see here. This is real data that I'm giving out as part of this: 12 hours of real multi-camera data, because it turns out it's really, really hard to find actual multi-camera data online.

So, the way this works: I've moved some buttons around for the demo, just because it's hard to see anything at the bottom. We have a clip list. Normally I would use the keyboard for this, but since it's easier to see what's happening with the mouse: I can click "cue in", and when something interesting has happened, I can press "cue out". Now I have a clip, just a segment of time. I can preview it up here. Maybe I want camera two instead. Okay, maybe I wasn't completely happy about the start, so I can scroll it here. If I want to, I have this little DJ controller, which is much, much nicer to work with, and it's basically super cheap; I think it's like a hundred euros, while a professional remote is something like one or two thousand euros.
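As promised, a quick aside on the benchmark metric: the EPE quoted earlier is the average endpoint error, the mean Euclidean distance between the estimated and ground-truth flow vectors over all pixels. A minimal, self-contained version (illustration only; assumes both vectors have the same length):

    // Average endpoint error between an estimated flow field and the
    // ground truth, both given as one 2D vector per pixel.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec2 { float dx, dy; };

    float average_endpoint_error(const std::vector<Vec2> &estimated,
                                 const std::vector<Vec2> &ground_truth) {
        double sum = 0.0;
        for (size_t i = 0; i < estimated.size(); ++i) {
            const double ex = estimated[i].dx - ground_truth[i].dx;
            const double ey = estimated[i].dy - ground_truth[i].dy;
            sum += std::sqrt(ex * ex + ey * ey);
        }
        return float(sum / estimated.size());
    }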
Once I have my clips, I can cue them up; maybe cue it from both cameras. And now I can play it back. Okay, maybe the first one wasn't so interesting, but it will fade nicely over to the second one, at least once it's done; it was sort of long. I can edit these while they're playing, if I feel like it. So it will play out on the stream, and then go nicely back, hopefully in reasonably high quality. If you want to be really fancy, you can turn off the speed lock here, and you can pull this little slider to ramp it down to a very low speed. Let me just play that again. Very dramatic.

So I guess that's basically the demo. We have MIDI control over these things here; we have, of course, undo; and there's a full manual if you want to go look at it. So that's basically it. While I'm taking questions, and I think I have maybe five minutes for them, I'm just going to play some real-world footage here. This is actual material from the last event we did, so this is an actual replay operator making actual mistakes in real time; this is pretty much what you can expect it to look like. So, any questions? Yes. Wait for the microphone, please. Oh, they have to change the batteries.

"Does the output of the application support SDI, or is it a stream?" I don't think the microphone is working, so I'll just repeat the questions: does the output support SDI? No, the output is a stream, a standard HTTP stream in a Matroska container, echoing back frames. Half of them are the MJPEG originals (I try very hard to lock on to the original frames when I can), and half of them are these synthesized frames. I would probably add SDI output if I needed it; it's not hard to do. But generally the entire idea is that you have this Nageru instance, and you have a separate box for the replay operator, and the stream goes in there anyway, so why would I want SDI? But it's most certainly doable.

"Hi, do you happen to have an API for the image interpolation?" It's a C++ class, and it's fairly well contained: you give it two raw frames, and it gives you back another frame, in the form of a texture on the GPU. So it's certainly reusable. It's GPLv3. "Would a readback kill it?" I'm doing readback, I'm doing readback for the JPEGs, but I'm doing asynchronous readback, which is of course important when you're working with a GPU. So it's certainly possible. "I might ask you about it later."

Other questions? "Did you have to do anything special for camera synchronization between the four cameras, or are you just ingesting four SDI feeds?" I'm just running four SDI feeds. This is a low-budget thing, like I said, and we don't have any clock synchronization, because a master clock would be like 500 euros, and our cameras are HDMI in the first place; we just convert to SDI. Nageru has internal frame queues for all four cameras, and it tries fairly hard to keep these queues short, as much as it can without risking too many frame drops. So essentially they're within a frame or two or three of each other. It's not perfect, but I can't see any visible difference between them. Thank you. Other questions?
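The asynchronous readback mentioned in the API answer above typically looks something like this in OpenGL, using pixel buffer objects. This is a generic sketch, not Futatabi's actual code:

    // Asynchronous GPU readback with a pixel buffer object (PBO).
    // glReadPixels into a bound PBO returns immediately; mapping the
    // buffer a frame or two later avoids stalling the pipeline.
    #include <epoxy/gl.h>  // or your GL loader of choice

    GLuint begin_readback(int width, int height) {
        GLuint pbo;
        glGenBuffers(1, &pbo);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr,
                     GL_STREAM_READ);
        // With a PBO bound, this call does not block; the copy happens
        // asynchronously on the GPU.
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        return pbo;
    }

    // Call this some frames later, when the copy has long since finished.
    void finish_readback(GLuint pbo, void (*consume)(const void *data)) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
        const void *data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
        if (data != nullptr) {
            consume(data);
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
        glDeleteBuffers(1, &pbo);
    }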
Yes? "The previous talk was about GPU optimization, and since you mentioned OpenCV: I think there is a function in one of its modules to perform GPU-optimized optical flow, and I think it's actually the same algorithm." Okay, I didn't know they had it GPU-optimized; I think there is one, but it may be in the contrib repository. I haven't actually looked at OpenCV per se. And no, I mean, the problem is, of course, that every week there's something new. If I gave this talk in five years, I'm fairly certain I would stand here and talk about neural networks; they're coming at full speed now. I think everything state-of-the-art quality-wise is using neural networks, but the best ones are still at 150 milliseconds per frame, and that's just too much for me. So I just had to pick something, right? Eventually, in a future version, maybe I'll switch; I don't know. I'm not bound to this specific algorithm, even though I spent literally a month decoding the variational refinement in the paper, with all the strange transformations and typos and such.

"Thank you. Hey, nice talk. You mentioned some artifacts that appear in the frame interpolation; could you describe some of those, and whether any of them are located at the edge of the frame?" I love that you didn't see them, because it's playing right now and there should be artifacts everywhere, right? Generally, edges are not a big problem. The really hard part is occlusion: when you have an object that moves right and the background moves left, or the other way around, the optical flow assumption really breaks down, because it always assumes things move from A to B, but if something just came out from behind something else, it didn't come from anywhere. So there it really breaks down. Motion blur, or any kind of blur, will also give you a problem, because now your pixel doesn't go to one place: half of the pixel is the background and half of the pixel is the foreground, and they are moving to different places. I think if I go all the way back here, you can maybe see some of these issues, except that going backwards through a video in OpenOffice isn't very easy. Yeah, if you look at when he's moving his arm sharply downward, look for the place where he does basically this, and that's where you see it sort of gets broken up. But again, this is at 4x; at 2x it's very hard to see. I was also surprised that edges don't actually matter much. Unlike the original paper, I don't pad with black; I pad with the edge, because that's the natural thing for the GPU, and maybe that helps a bit. Thanks.
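That edge padding, by the way, maps naturally onto GPU texture addressing: in OpenGL terms, it is just clamp-to-edge sampling. A tiny sketch (generic OpenGL, not Futatabi's actual code):

    // Make out-of-range texture coordinates clamp to the edge texel, so
    // flow lookups near the border see the edge repeated, not black.
    #include <epoxy/gl.h>

    void set_clamp_to_edge(GLuint texture) {
        glBindTexture(GL_TEXTURE_2D, texture);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    }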
We have one more minute for questions. "Really impressive. For you, is this the end? Is it good enough, or do you want to look at the newer papers? What other things do you want to build into this software next?" First of all, I want to use it even more than we've already been doing. Certainly, I'm open to adding new and fancier algorithms, though I doubt I'll go back to reloading that Middlebury rating list every week, over and over again, because it takes a lot of time. I think it will basically be features: there are a lot of things that I want to do, especially in terms of MIDI control and in terms of UI. I want to be able to support more of these XT3-style workflows. A lot of slow motion production is very much bound to how an EVS works, but it's a very efficient workflow, and everything needs to happen within seconds, so every time I can shave half a second off the UI by making something simpler, that's really the way I want to go. For instance, I did cue in and cue out, but I talked to a slow motion operator the other day, and he said they never use cue out: they just set the cue in, start playing, and then abort manually, because then you don't have to wait for the clip to finish. So I think the UI department comes first. But yeah, I would love to have users and hear their feedback. Also, by the way, read the manual: I've spent a fair amount of time there explaining how the slow motion works from an operator's point of view, how you want to work with the producer, this kind of thing. Any more questions? No? Okay. Thank you.