So, yeah, this talk is about how to do GStreamer on tiny devices. So, first question, who am I? My name is Olivier Crête. I've been working on GStreamer at Collabora since 2007. At first, I spent a couple of years doing video calls, mostly the transport of audio and video using GStreamer. I built a framework called Farstream, which is a series of GStreamer elements to do video calls for protocols like SIP and XMPP. As these went out of fashion, I moved on to more generic GStreamer work for all kinds of devices and all kinds of industries, from TVs and set-top boxes to very, very small devices. And for some of these, you would think: GStreamer is a big framework, it's a big PC thing, it will never fit in my device.
So what kind of device am I talking about? When I say a tiny device, it's a device with not much flash storage, probably a relatively slow CPU, and not that much RAM. By not much flash storage, I mean maybe just a couple of megabytes of space for the application.
But first, I'll give a little overview of GStreamer. So who in this room knows what GStreamer is? Can I have some hands? Oh, all right, it's gonna be a quick overview. So GStreamer is a multimedia framework to build applications that process things like audio and video, probably in a synchronized way. There's a core framework which doesn't know anything about audio or video and just processes timed data in a pipeline manner. It uses a bunch of plugins that implement the different functionalities and protocols underneath. And it has a very simple API, with not that many entry points, that is used by the applications on top. Some of them are built in, like gst-launch and gst-inspect, but most of them are obviously the users' own applications. As for the plugins, a bunch of them are included, but there are also third-party ones that people write for their own applications, get from their hardware vendor, or get from other places on the internet.
So GStreamer is about pipelines, as I mentioned. A pipeline has a source element, which produces data, a sink element, which consumes it, and filters, which take data in on one side and push it out on the other. The elements are connected using pads: source pads produce data and it's consumed by sink pads. This is inspired by electrical systems and by other similar frameworks.
For applications, as I said, there are not that many entry points. There are basically only four types. There are methods, events, and queries, which go from the application to the framework. Events are fire-and-forget: you may get a reply from somewhere else, or maybe not. Queries are blocking: you wait for a reply to them. And messages come from the framework, from the pipeline, to the application. They can be spontaneous, because they tell you about something unpredictable that happened, or they can come as a reply to an action that you've taken. For example, there's a message called async-done that is sent once an asynchronous state change has completed.
So GStreamer is perfect for embedded. It's completely zero-copy. We have a lot of different tools to let you transfer data between different bits without copying anything. We have a complete mechanism for negotiation, not only of the formats, but also of the properties of the memory, of the alignment, maybe even of the type of memory if you have more than one type of memory in your device. There is a reference-counting-based system to handle buffer lifetimes. We have buffer pooling so that buffers are not reallocated. There's a whole system to make it relatively easy to work with devices that have hardware constraints. And it includes synchronization: the framework synchronizes the different parts so that you get the right timing.
And we have loads and loads of hardware-enabled plugins for almost any embedded hardware these days. This means that it's really easy to build a system. Using gst-launch, the command-line tool, it's really easy to prototype the pipelines you want. And even something a bit more complicated is trivial to do in Python, for example, where we have bindings that are really simple.
So I'm going to give a quick example of one project where we wanted to put GStreamer on a very simple device. This device is a security camera. It's pretty small. It has 16 megs of flash and one gig of RAM. There's an RTSP server on it, and it's an ARMv7. So it's a pretty standard device. There's so much RAM because they actually keep a ring buffer of video in RAM.
The pipeline to implement on this device is quite simple. From this RTSP server, we can get a live feed from the camera. We want to capture it in clips of two seconds and upload them to Amazon S3. To capture them, we have an RTSP source element, which reads from the local RTSP server. We depayload the H.264, we parse it, and then we send it to splitmuxsink, which wraps it into MP4 files, and then we have MP4 files we can upload. So it's very simple.
On the PC, we did a very quick prototype, a simple gst-launch line. It took me 30 seconds, and I had something that worked. So I thought, all right, it's gonna be an easy project. Maybe I can do the same thing on my device. So to do the same thing on the device, I wrote a very simple shell script that just ran the previous command. The problem is that the whole GStreamer build is 287 megabytes, so that wouldn't fit. 82 megabytes of these are dynamic libraries, not plugins. So the first version is just to strip it, right?
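As an illustration of the capture pipeline described in this part of the talk, here is a minimal Python sketch that only assembles a gst-launch-style description string; it does not run GStreamer. The slide with the actual command line is not part of the transcript, so the RTSP URL, the file name pattern and the helper names are assumptions. The element names (rtspsrc, rtph264depay, h264parse, splitmuxsink) and the `max-size-time` property, which is expressed in nanoseconds, are standard GStreamer ones.

```python
import shlex


def build_clip_pipeline(rtsp_url, clip_pattern, clip_seconds=2):
    """Return a gst-launch-style description of the capture pipeline.

    rtsp_url and clip_pattern are hypothetical values supplied by the caller.
    """
    clip_ns = clip_seconds * 1_000_000_000  # max-size-time is in nanoseconds
    elements = [
        f"rtspsrc location={rtsp_url}",   # read the live feed over RTSP
        "rtph264depay",                   # take the H.264 out of the RTP packets
        "h264parse",                      # parse the H.264 stream
        f"splitmuxsink location={clip_pattern} max-size-time={clip_ns}",
    ]
    return " ! ".join(elements)


def gst_launch_argv(description):
    """Split the description into an argv list for gst-launch-1.0."""
    # -e asks gst-launch to send EOS on shutdown so the last file is finalized.
    return ["gst-launch-1.0", "-e"] + shlex.split(description)


if __name__ == "__main__":
    desc = build_clip_pipeline("rtsp://127.0.0.1:554/stream", "clip%05d.mp4")
    print(desc)
    print(gst_launch_argv(desc))
```

The argv list could be handed to `subprocess.run` on a machine where GStreamer is installed; on the device itself, the talk replaces gst-launch with a small C program.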
You just strip the libraries, and that brings it down to 17 megabytes, which is more acceptable, but still doesn't fit. The next version is to replace gst-launch with a small C program, which I've put here, but it's basically just gst-launch in a little bit of C. There's almost nothing in there. And a little Makefile to build it. There's some setup for the toolchain, but it's basically just setting the right toolchain and the pkg-config path, and linking with libtool. There's almost nothing there. So the C program, the binary compiles to 13K, stripped it's 5K. Like, oh, that's minuscule. It's amazing. Plus 17 megs of libraries. All right, but it still doesn't fit.
So yeah, all right. Solution: make a static build, so that we only build in the parts of the libraries that we're actually using and not the parts that we're not using. With libtool, use -static-libtool-libs. That's kind of a trick to make it link statically only the parts that have .la files and not the parts that don't. If the sysroot from the device has .la files, you can just delete them. And Cerbero, which is a bit like Yocto but builds an SDK for GStreamer, actually generates .la files. So using this, I can statically link the bits from GStreamer but not the bits from the device. And then I get a binary that's 7.5 megs, 1.5 megs stripped. All right, that's pretty good. It's gonna work.
Then I try it on the device. I put it on the flash, I run it. Damn it, I get an error. The plugins are not there. So that's not good. We need to copy the plugins. So then I copy all the plugins, but all of them are 17 megs, and it's not gonna fit. So I use this command when I run it on the PC that tells me exactly which plugins it loaded. Then I know which files to copy, and I can copy just these files. That adds 1.7 megabytes. So then I run it on the device again. Oh, why are the plugins not loaded? Yeah, because they depend on the libraries.
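The command used in the talk to list the loaded plugins is not in the transcript. One possible way to get the same information on Linux, sketched here as an assumption rather than the speaker's method, is to read `/proc/<pid>/maps` while the pipeline is running and keep the shared objects that live in a `gstreamer-1.0` plugin directory. The function names and the sample paths are hypothetical.

```python
import os


def loaded_plugins(maps_text, plugin_dir_name="gstreamer-1.0"):
    """Return the sorted plugin .so paths found in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, permissions, offset, device, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        # Keep shared objects that sit under a directory named gstreamer-1.0.
        if path.endswith(".so") and plugin_dir_name in path.split("/"):
            found.add(path)
    return sorted(found)


def total_size(paths):
    """Sum the on-disk size in bytes of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def plugins_of_process(pid):
    """Read the mappings of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps:
        return loaded_plugins(maps.read())
```

Calling `plugins_of_process(pid)` on the PC prototype while it is streaming gives the list of files to copy, and `total_size` on that list gives the space they need on flash.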
So the plugins are not static, and it doesn't work. I'm back to re-adding my 17 megabytes. Still not good enough. So next, we have to statically link all the plugins, so that we can statically link in the libraries and have a nice static build with everything needed, without missing any bits. In GStreamer, to statically link plugins, first we have to declare one function per plugin using the first macro in the C file. Then, just after gst_init, we call the second macro, which registers the plugin. This is only done for static plugins, because for dynamic ones, gst_init will actually iterate over the files and register them for you. But since they're not in separate files, we have to do it by hand. And then we link with -L pointing to the path where the plugins are. And we have to link in each plugin library separately, because there's nothing to find them automatically in this case. So, static build: 28 megabytes, stripped down to five. That works on the device, it actually works, but it's so big. It's five megabytes. We could do less.
So now we ask the compiler for help. Compile with -Os, because this is supposed to make smaller binaries. And it barely does anything. The improvement was marginal. So well, too bad. The next step is to strip all the functions that are not needed, because in there we have a lot of crap that we're not using. It includes the whole of GLib, for example. There are a lot of functions that are never called. So we can build everything, the whole SDK, with -ffunction-sections and -fdata-sections. That makes the compiler put each function and each data object in its own section of the .o file, which means that the linker can keep exactly the ones we need and none that are not used. And then, when linking, we pass this argument to the linker, and that makes it remove all of the sections that are not actually referenced anywhere. So that brings it down to four megabytes, which is pretty good.
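The static build steps in this part can be summarized in a short Python sketch that generates the C boilerplate and a compiler command line. The slides are not in the transcript: the two macros are assumed to be GStreamer's `GST_PLUGIN_STATIC_DECLARE` and `GST_PLUGIN_STATIC_REGISTER`, and the linker argument is assumed to be `--gc-sections`. The plugin names, paths and helper names are illustrative only, and a real build would also need the flags reported by pkg-config for GStreamer and its dependencies.

```python
def static_plugin_boilerplate(plugins):
    """Return C source that declares and registers the given static plugins."""
    lines = ["#include <gst/gst.h>", ""]
    # One declaration per plugin, at file scope.
    lines += [f"GST_PLUGIN_STATIC_DECLARE({name});" for name in plugins]
    lines += [
        "",
        "/* Call this right after gst_init(). */",
        "static void register_static_plugins(void)",
        "{",
    ]
    # One registration call per plugin.
    lines += [f"  GST_PLUGIN_STATIC_REGISTER({name});" for name in plugins]
    lines += ["}"]
    return "\n".join(lines) + "\n"


def static_build_command(source, output, plugin_dir, plugins, cc="cc"):
    """Assemble an illustrative compile-and-link command as an argv list."""
    # -Os optimizes for size; the two -f flags give every function and data
    # object its own section so the linker can drop the unreferenced ones.
    compile_flags = ["-Os", "-ffunction-sections", "-fdata-sections"]
    # --gc-sections removes unreferenced sections; -L points at the plugins.
    link_flags = ["-Wl,--gc-sections", f"-L{plugin_dir}"]
    # Each static plugin library has to be named explicitly.
    plugin_libs = [f"-lgst{name}" for name in plugins]
    return [cc, *compile_flags, source, "-o", output, *link_flags, *plugin_libs]


if __name__ == "__main__":
    example_plugins = ["coreelements", "rtsp", "rtp"]  # illustrative list only
    print(static_plugin_boilerplate(example_plugins))
    print(" ".join(static_build_command(
        "capture.c", "capture", "/opt/sdk/lib/gstreamer-1.0", example_plugins)))
```

Note that the section flags only pay off if the libraries and plugins themselves were built with them, which is why the talk rebuilds the whole SDK with these options.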
Next, we're wondering how we can make it even smaller. We basically wanna find which .o files contribute to the size. What I did on this project was to take the output of the linker, read it with objdump, use the debug symbols to find what's in there, then find the .o files and look at their size. I wrote a little Python script to know which of these symbols actually take space and where I can focus my optimization: maybe remove some steps from the pipeline, maybe do something clever, maybe something really nasty.
Then I realized there's a tool called Bloaty McBloatface, from some guys at Google, that is basically the definition of bloat. It's a couple thousand lines of C++ code, and when you run make, it downloads like 300 megs of stuff, protobuf and everything. It downloads the entire Google internet. But once you've downloaded this 300 megabyte thing and spent like half an hour compiling it, you have a tool that works in many cases and tells you exactly which symbols take space in your binary, so it's quite convenient.
Doing that, I found out that GLib is big, and GLib has a bunch of bits that don't get removed in a static build even if you don't use them. There are two things that I really found out. First, GLib has internal plugins called GIO extension points that are always registered when GLib starts. This means that it actually calls registration functions, which have references to basically every function in the extension point, so even though you never call them, they get linked in. So I did a very nasty little patch to just remove all of the ones that I don't care about. I removed GSettings, GDBus, GAppInfo, applications, notifications, all these things I don't care about, right? They are not used in this case. So that strips it down to 3.8 megabytes.
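The kind of size accounting described in this part can be approximated with a few lines of Python. This is not the speaker's script, which used objdump and debug symbols; as a simpler stand-in, this sketch parses the output of `nm -S --size-sort` on an unstripped binary and groups symbol sizes by name prefix. The prefixes and function names are assumptions chosen for illustration.

```python
import subprocess
from collections import Counter


def parse_nm_sizes(nm_output):
    """Yield (symbol name, size in bytes) from `nm -S --size-sort` output."""
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # not a "value size type name" line
        _value, size_hex, _kind, name = parts
        try:
            yield name, int(size_hex, 16)  # nm prints sizes in hexadecimal
        except ValueError:
            continue


def size_by_prefix(nm_output, prefixes=("gst_", "g_")):
    """Total symbol sizes per name prefix; everything else goes to 'other'."""
    totals = Counter()
    for name, size in parse_nm_sizes(nm_output):
        for prefix in prefixes:
            if name.startswith(prefix):
                totals[prefix] += size
                break
        else:
            totals["other"] += size
    return totals


def run_nm(binary_path):
    """Run nm on a binary and return its text output (requires binutils)."""
    result = subprocess.run(
        ["nm", "-S", "--size-sort", binary_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `size_by_prefix(run_nm("capture")).most_common()` would list the prefixes from largest to smallest, where `capture` is a hypothetical unstripped build of the application. Bloaty gives a far more detailed breakdown; this is only a rough first look.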
The other area that I would really like to remove is all of the UTF-8 stuff, for this kind of use case, because there are giant tables that take over a megabyte and that don't compress well. And that's basically all I could get to in the allocated time. So it still wasn't fitting on the disk. Remember, this is a special device, because there's not much disk, but there's a lot of RAM. So you just compress the binary. Use UPX, a very nice little open-source tool, and it compresses down to two megabytes. It fits, I'm happy. And this is basically how we got GStreamer from 300 megs to two megs for our use case, and this is shipping in probably hundreds of thousands of security cameras that you buy from some Tesco or something.
What more steps could make it even smaller? I mentioned the UTF-8 tables in GLib. I would really like to remove them somehow, at least for this kind of use case. And then, with the various tools, we can probably dig deeper. I'm pretty sure there are a couple more things that can be extracted, made smaller, or removed, a bunch of things we're not really using. So these are basically all the key steps we took to make GStreamer fit in our tiny, tiny project.
So, conclusion: GStreamer is not just for the desktop. We know it's used in a bunch of big embedded devices. If you saw my earlier talk today, I mentioned it's used in space, it's used in airplane entertainment systems, it's used in cars, it's used in TVs, it's used in set-top boxes. I mean, an embedded device that does video is pretty likely to use GStreamer these days somehow. But it's not only for these, how to call them, bigger embedded devices. It's definitely usable in the kind of smaller ones where you don't have so much space. So basically, that's the core of my message.
Any questions?
Yes, because these devices cost 30 euros at Costco.
Yes, but the reason they have so much RAM is that they have a feature that requires it.
They store all the video in RAM so that you can get the video from back in time. So they don't always record. They have some image analysis in them, and when something happens, they save the previous X minutes of video, because the image analysis is not very good. So if something happens, they say, here's the video from the last two minutes, because this is the burglar entering, and then he's walking around the house for two minutes, and only then do we realize that there was something in the house. That's why. It's surprisingly common in these devices. They're Chinese devices.
Any other questions? Can you speak louder? Sorry.
No. Sadly. So, sorry, yes, the question is: is there a single flag to use to eliminate the desktop dependencies? Sadly, there isn't, but the horrible patch I made is available there, and these slides will be on the FOSDEM website as soon as I get enough internet to upload them.
So the question is: would it make sense to just have a fork of GLib that would have only the things needed for GStreamer? I don't want to fork GLib. I don't think anyone wants to fork GLib, and the GLib maintainers are quite reasonable, so I'm sure that if we had some patches to do this cleanly, they would take them. But also, I've removed functions that are used by other bits of GStreamer. I really focused on what is needed by this very specific pipeline for this device. So, for example, I don't know, GApplication or things like that might be used somewhere else in GStreamer. There might even be stuff that uses D-Bus, I don't know. Because there's a bunch of plugins, right, that do things like that. I'm still planning to write the D-Bus thing.
Thank you very much, and I guess, have a good evening. Good evening.
Oh yeah, and I forgot: Collabora is hiring. We have a bunch of job postings on the careers page of our website, and we're looking for people in all kinds of domains. Thank you.