Can you click back to here? Does that work? It seems to. Okay, yeah, I'm talking about writing GStreamer plugins. Hang on. What's going on here? There we go. Okay, usually when I talk about GStreamer, I start with this question, which can take a long time. But today, just rewind an hour and you'll get the answer. Okay, so what I'll do... I was going to talk about what the experience of writing a GStreamer plugin is like. But I found that quite hard, so I'm actually just going to go through some examples and talk about what it was like along the way. Now, one of the things I did... I'm an artist. I make computer programs that make art. And this was the first GStreamer plugin I wrote. It had a camera that looked at the scene and worked out a transformation to a projection. And then it would project the negative image onto what it saw in front of it, so that whatever was on the wall would be eradicated by the projection. And so if you walked in front of it, depending on the parallax between the camera and the projection, you'd sort of seem to disappear. And so in the exhibition, what I did is I had two projectors and one camera. So the camera was looking at the screen, and there were two projectors. You've got two here, but in that case they were both pointing at the same rectangle. And they were both running the Sparrow plugin, which is the one where it says sparrows. Each one would work out, for each projector, exactly where it was in the camera's view. So then when something went in front of it, they'd both try and eradicate it. And then they were both trying to project sparrows onto the projection, so they'd be fighting each other to control the screen. If one of them put a sparrow up, the other one didn't want it there, and it would be projecting a negative image. And in some places they could agree: if they both agreed that part of the screen should be dark, they'd leave it black.
But in other places there was swirly feedback. And then when people went in front of it, it would lead into a feedback loop. Now, I'll try and do a demo. This isn't going to work properly because there's only one projector. Each window... so each one chooses its own colour to project these lines, which it uses to work out the corners. Then it will try and project the negative image of where it thinks its window is, once it gets going. Possibly it's not going to go, because possibly I haven't got the camera pointed right, or it's not bright enough for it to see. Okay, so I'll forget about that. You really need the proper camera setup. So what that was doing is working out... those lines would work out a grid that could map the camera image onto the projection. Now, this is a pretty straightforward plugin from the GStreamer point of view. It just subclasses a video filter. So there's things that are set up for you: if you're making something that takes video in and puts video out, a lot of stuff is looked after for you. There's a fair bit of boilerplate you have to use. Well, before GStreamer even, it's GLib and GObject and all that kind of stuff. But you can concentrate on your own little bit. Now, for a long time I spent ages trying to work out a protocol for the two projections to negotiate which colour they'd use, and for one of them to... They'd flash things, because they need to load up the sparrow images. They'd flash things to try and communicate who was going to pick what up. Then I realised that they're actually both the same code in the same shared object, which is what a GStreamer plugin is. So, in fact, they can just communicate using a mutex, doing it behind GStreamer's back. By the way, in GStreamer language, a plugin refers to the library, whereas I'm using it to refer to the bit which in GStreamer language is called an element. There's all these kinds of things where it's a little bit off from what I expect, but it's okay.
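The core per-frame operation of that plugin, projecting the negative of what the camera sees, is simple to sketch. This is a pure-Python illustration of the idea, not GStreamer code (in the real thing this would live in a video filter's transform function); the function name and frame layout here are mine:

```python
# Pure-Python sketch of the per-frame work described above: invert the
# camera image so that projection + scene cancel out and the scene
# "disappears". Frames are lists of rows of 8-bit greyscale pixels.
def invert_frame(frame):
    """Return the negative of an 8-bit greyscale frame."""
    return [[255 - px for px in row] for row in frame]

frame = [[0, 128, 255],
         [10, 200, 30]]
print(invert_frame(frame))  # [[255, 127, 0], [245, 55, 225]]
```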
Right, now, this is another one. This I made two years ago. Recur is a video thing that watches TV and learns how the video moves, and learns to associate the way that the video moves with the sound that is happening at the same time. So, say it sees lots of explosions: it will associate the sound of an explosion with the appearance of an explosion. And if it hears lots of talking while people are talking on screen, it should associate those together. It uses a recurrent neural network, which I'm actually talking about on Friday, and I'm going to do a live demo of it then. So I'm not doing it now, because I don't want to use up all my demos. Actually, there's not many of you, but... So, in training, it watches a video, either from a file source or a Video4Linux camera source. And both the audio and the video have to go through this element that has a video side and an audio side. And that seemed quite tricky. What you have to do is write a... I was just redoing my slides before, so they're a bit out of order from how I'm talking. Here we go. Yeah, you have to write a manager, which is like a wrapper around an audio filter and a video filter. And then the manager has to synchronise the two of them. So, the tricky thing in doing this is that everything in GStreamer is all kind of nice and tidy in terms of time. Everything's got a timestamp, which is good. But the audio is coming through in little chunks, really odd-sized little chunks, depending on your source. And the video comes through a frame at a time. So if you're trying to associate the audio with the video at a particular time, you have to wait for the audio to fill up for the whole frame, and then you can process that frame. That image isn't very clever, but I was trying.
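The waiting-for-the-audio-to-fill-up problem can be sketched outside GStreamer. This is a pure-Python illustration, with names of my own invention, of buffering odd-sized audio chunks until a whole video frame's worth of samples is available:

```python
# Sketch of the synchronisation problem described above: audio arrives in
# odd-sized chunks, but processing wants exactly one video frame's worth
# of samples at a time. (Names are illustrative, not GStreamer API.)
class AudioPerFrame:
    def __init__(self, samples_per_frame):
        self.n = samples_per_frame
        self.buf = []

    def push(self, chunk):
        """Buffer an audio chunk; yield complete frame-sized windows."""
        self.buf.extend(chunk)
        while len(self.buf) >= self.n:
            yield self.buf[:self.n]
            self.buf = self.buf[self.n:]

acc = AudioPerFrame(samples_per_frame=4)
out = []
for chunk in ([1, 2], [3], [4, 5, 6, 7, 8, 9]):  # odd-sized chunks
    out.extend(acc.push(chunk))
print(out)  # [[1, 2, 3, 4], [5, 6, 7, 8]]  (sample 9 stays buffered)
```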
The picture on the left is trying to show that the audio and the video tangle together inside the model, and then video drips out of it, which is kind of unrelated: it's not derived directly from the video that comes in or the audio that comes in. Whereas the audio that comes in is just passed through, because you have to pass something through. So, yeah, then for the exhibition, it was trained up to associate video with audio, to narrate its own video, to model how video moved. And the idea for the exhibition was that there'd be a microphone hidden in the room. We wouldn't know where. And the noises that people made would cause associations to be made in the model that would make new video. The noise you made would shape the video that came out. In actual fact, when it came down to the exhibition... an exhibition has a hard deadline. There's, you know, 50 people working on it and the mayor's going to come and open it that night. And they couldn't get the sound working. So there was no sound coming in, and all this work I did was a bit pointless. I could have just made the video filter. Anyway. Now, those were art ones. I've also worked doing science-y stuff, helping scientists, and I used GStreamer for that too. One of the things I've done is an audio classifier which has the same recurrent neural network core as the previous example. Now, with that, you need to train it up: you play it a file and you tell it what it's listening to. So one of the things we did: we played it speech off the radio and told it what language it was in. Because some of the radio stations in New Zealand are funded to speak Māori for a certain percentage of the time, and if they're not speaking that much, they get the funding cut off. So there's money in detecting whether they're speaking enough. And we played the machine lots of radio and told it what language was on at the moment.
And it learned to just spit out the answer. Now, the thing about training it up: if you give it one file at a time, which is sort of the natural thing to do, then because people speak the same language for minutes at a time, it'll be the same answer for minutes. So it'll think the answer's always over this side, and then suddenly it'll think the answer's always over that side. It goes right to one extreme, then to the other. What you need is for it always to be being pulled both ways at once. If you're training an ordinary neural network, you can use stochastic gradient descent; the stochastic thing means you just give it examples in random order. But with a recurrent network they're not in random order, because it's all about the order. I mean, it's recurrent. So I had to train it on hundreds of files at the same time to average out all those effects. I started off trying to make an element that would take any number of audio streams, and it just turned out to be complicated. Then I realised that I could just make it take in one audio stream of 500-channel audio, and use the built-in interleave plugin, which does that mixing of all the mono streams into one. That's one of the lessons I've learned: to trust, as much as I can, the plugins that exist. So with 500 files, my recurrent neural network uses one of the CPUs on this old laptop. And the rest of it, the 500 threads reading from WAV files into interleave, they don't even use 70% of the other one. So there's basically no overhead in having gazillions of files. And when it's classifying, it just does one folder at a time. I can do a demo of that. But I won't do the language one until I've done the sound. This is listening to a recording of the bush in the Remutaka Forest Park. When it hears the kiwi, it goes over to that side. And the kiwi calls for about 20 seconds. Can you hear it? And then it goes back. See if I can make it play another one.
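What the interleave element does can be sketched conceptually: take N mono streams and weave their samples, channel by channel, into one multichannel stream. This is pure Python to illustrate the idea, not the real element:

```python
# Conceptual sketch of channel interleaving (what GStreamer's interleave
# element does to combine N mono streams into one multichannel stream).
def interleave(channels):
    """channels: list of equal-length sample lists -> one interleaved list."""
    out = []
    for samples in zip(*channels):  # one sample from each channel per step
        out.extend(samples)
    return out

left = [1, 2, 3]
right = [10, 20, 30]
print(interleave([left, right]))  # [1, 10, 2, 20, 3, 30]
```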
And that works at about 1,500 times real speed from a WAV file. We can do 1,500 hours of recordings in one hour, which is pretty good. So, the development cycle: actually, GStreamer quite often makes me cross. If something doesn't work, it says things like "reason not-linked". And then you look back through the logs, and they tell you a lot of things, but they never quite tell you... never quite tell me... exactly why it wasn't linked. Well, at least I think they don't. So I spend a lot of time looking at it, and I'm about to go on the IRC channel, #gstreamer, and say, this isn't working. And then I get there and I think, I have to put this question in GStreamer terms that they'll understand. And I kind of compose it, and then I think, I haven't tried this... because once you try putting it in their language... And then I try the thing that, you know, I'm just doing to cover the first question that I know I'm going to get, and it works. So then I'm happy again. Once it's going, it's really good. Like the exhibition with the video machine, the one that didn't use the sound but was trying to: that ran for four months. And, you know, I just turned it on. It was screwed to the ceiling. There was no keyboard or internet connection or anything. It was in a different city from me, and it ran for three and a half months, and then the disk died. But GStreamer kept on going after the disk died. The documentation doesn't always make me happy. Like, this is a documentation page. It says, read this first. But you don't want to read that first. Usually, well, if you're making a plugin, you read the Plugin Writer's Guide. But when I'm making a pipeline, the page I always go to is this one, which is the overview of all the plugins. And you've got to jump all about to find it. And then, there it is. And you kind of find the plugin that sounds like the thing that you want.
And then there's no way back to this page, because this is generated by gtk-doc. From GStreamer's point of view, everything belongs to a set: they have the good plugins and the bad plugins and the ugly plugins and the base plugins. And from the documentation's point of view, the top of the tree is "this is the good plugins documentation tree". But you don't care whether they think it's good; you just want to know all the plugins you can use. And there's just no way to get back out to the rest of the plugins. But once you get there, usually you get something good like this. This interleave page has got quite a lot of words that are probably saying useful things, and long, long examples. And how much time am I using? I'll probably run out of things to say. That's okay. You can get more or less the same stuff with this; Jan was actually talking about a lot of these things. You can get a picture of what your pipeline looks like. That's quite useful. And the GST_DEBUG thing, he was talking about that. Now the trouble with, like, the GStreamer Debug Viewer: it's really good, and they've sort of brought it a little bit into the fold now, but for a long time it wasn't mentioned on the GStreamer website. It was kind of like you only heard about it through GStreamer lore, like if you hung out on the IRC channel or you were on the mailing list. And a lot of the documentation, a lot of the knowledge involved in writing GStreamer plugins and using GStreamer, is just kind of in a whole lot of people's heads, people who have been doing it for a decade. And it's hard for them to see exactly how to communicate it to the newcomers. Like, the mailing list is frequently-asked questions, except every question is just slightly different, so they can't just write it down. There are core concepts which are always being assumed, and it's sort of impossible for anyone to... You have to not comprehend them to understand that you need to comprehend them.
And then once you comprehend them, you don't go back. And so there should be people like me doing the documentation patching. Except when I'm in the middle of it, trying to write a plugin or finish the pipeline with five hours to go to the deadline and stuff like that, I feel like going through the process, and then afterwards I don't care. And I've done a little bit. The Plugin Writer's Guide is quite useful; you can see I've looked at quite a lot of it. But some of it looks like this. Good. It was useful. I've talked about that. And another thing: from the outside, a plugin is all about making it easy, so that it's easy for Python to work out what's going on. And that makes it correspondingly harder on the inside. There's all the GObject stuff; you go through a lot of hoops to set a number. This is just a patch, just adding one little option on one of my plugins. And that's not even actually doing anything: where it says XXX down the bottom, that's where you actually have to do stuff. Sometimes you have to do a lot of stuff, because if things happen in the wrong order, you need to save up your properties and things like that. So on the inside you do this funny stuff; you don't understand all these things beginning with g_, like, you know, g_param_spec_float... You write the same string twice and you don't really know why. But then you don't have to do anything for the property to turn up when you're using the plugin from Python or whatever language. So you kind of pay while you're writing the plugin, with some obscure stuff that I don't understand. It's not like writing C, it's like writing GObject and stuff. And that's... Questions? [Inaudible question about languages.] C. I think you pretty much... No, you can do some of it in Python; doing it with audio, I think, would be feasible. You can write some plugins in Python.
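What all that g_param_spec_float boilerplate buys you can be caricatured in a few lines. This is a loose pure-Python analogue, with all names mine rather than GObject's: a property declared once with a name, a nick, a blurb, a range and a default becomes settable by name, with range checking, from any caller.

```python
# Loose pure-Python analogue of what g_object_class_install_property /
# g_param_spec_float set up in C. (Illustrative only; not GObject API.)
class Element:
    # name -> (nick, blurb, minimum, maximum, default).
    # The "same string twice" in the C code is the name and the nick.
    PROPS = {
        "threshold": ("threshold", "Detection threshold", 0.0, 1.0, 0.5),
    }

    def __init__(self):
        # every property starts at its declared default
        self.values = {name: spec[4] for name, spec in self.PROPS.items()}

    def set_property(self, name, value):
        nick, blurb, lo, hi, default = self.PROPS[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name} must be in [{lo}, {hi}]")
        self.values[name] = value

    def get_property(self, name):
        return self.values[name]

e = Element()
print(e.get_property("threshold"))  # 0.5 (the default)
e.set_property("threshold", 0.8)
print(e.get_property("threshold"))  # 0.8
```

The payoff in real GStreamer is the same shape: declare the property spec once in the class init, and every language binding can then get and set it by name.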
It depends what you want to do. There are limitations in the way that Python plugins are written that make some operations hard to do. Yeah. I have written Python plugins for some time. I should say, the question was what language was I using and what languages can you use, and the answer was that you can do some stuff in Python. [Question:] In terms of writing the plugins, have you found that it was easier for one type of thing, like the image processing? Is there one part that could be improved on the most, or what are your thoughts when writing them? So, the video is nicer in a way than the audio, for the things I do. It's all contingent on what you're doing. But with the video, you're guaranteed to get a frame at a time, which is what you want. With the audio, if you're doing something analysing it, you need a certain chunk size, but it comes through in its own little sizes. Maybe there's a plugin that will do that for it. But, yeah, video is easier in a way. All right, so if we've got no further questions, thank you very much for your presentation. Thank you.