Hello everyone. Thanks for attending. My name is George. I'm a Principal Software Engineer at Collabora. I've been working on multimedia things for the past decade. I've worked on GStreamer, and for the last four or five years I've been working on PipeWire; specifically, I architected and implemented the WirePlumber session manager, which works together with PipeWire. I'll tell you about that a bit later.

So let's talk about cameras, shall we? Cameras are really fascinating devices. They have some mechanism that captures light, takes it onto a sensor and transforms it into a digital picture. Amazing. How does that actually work? At the hardware level there's a bunch of stuff. There's a lens that takes light in and directs it onto a sensor behind it, and then this sensor generates electrical signals. They go to an analog-to-digital converter, they get converted, and they pass through some processing to reconstruct the picture. The picture is then compressed, and finally it gets to the CPU, to the host system, where we, as software developers working on that part of the stack, get a picture to work with. Now, I know this is a very simplistic description. I'm not a camera architect; I work on the software stack, on the CPU side, so on the cream-colored box over there.

Now, this is the traditional form of a camera. However, in recent years we've started building cameras that look more like this. There's no longer just one sensor. We have three, four, five, six different sensors, different lenses, different sensor properties, and all of these have their own processing pipeline. Somewhere down the road these images need to be combined; we need to reconstruct the final view of the real world, which is being measured by different sensors through different lenses. Then we want to provide something to user space, and that can differ depending on the use case. Do you want to capture a photograph, or a video, or just a preview for the screen? These are very different use cases, and every time you ask for one of them the camera actually needs to be configured differently.

Now, as I said, all these sensors capture different things, and combining them has high computational requirements. We need to run some algorithms. These algorithms may run on a dedicated processor, or on the host CPU to save some cost. There's a blurry boundary between those; depending on the camera and the vendor, people do different things. Nowadays we have also started seeing AI processing, which I suspect is combined somewhere in there; I haven't really looked into it yet, it's a new development. So basically all these images from all these different sensors go back and forth in a processing pipeline. There are different blocks that each do different things: the images come from the sensor, go to the image signal processor, go to the CPU, get combined, go back to a signal processor, back and forth all the time. To orchestrate that, we basically need some software that manages all of it. And that software on Linux is libcamera.

So libcamera provides all of that. It manages all the camera devices that are available on the system, and all the sensors of every device. It builds all these processing pipelines. It has the ability to run algorithms, including proprietary algorithms, in a sandboxed environment.
It has both device-agnostic and device-specific components, which may be provided by a vendor as well. And it provides an abstract API to user space, so an application can go and ask libcamera: please give me a video stream, or a stream to take a photograph, or something like that. All of that is very nice. I am not an expert on libcamera myself; if you have questions about it or want to learn more, there's a talk tomorrow by Kieran, who is one of the authors, and you can watch that and ask him questions.

Now, the point I'm trying to make by telling you all of this is: okay, we have cameras with multiple sensors, and we somehow manage a pipeline that transforms all these captured images into a final stream. But what if we have multiple devices, totally separate devices in separate places of the system? Think of a car, for example, which might have cameras in various places all around it. And another question: while we combine things from different devices, can we separate the processing into smaller blocks? You could write an application that opens all the devices, captures data and then does some processing, but as this complexity accumulates in one place it gets really hard to maintain and really hard to develop. So an idea is to split all these things into separate blocks, make them separate processes, and even better, put them in different containers, so that you also have secured blocks that each do a small part of the processing. I think we really need that: as complexity increases we need this kind of division of work, and also the versatility, the ability to change these components while keeping the rest intact. I think it's much more maintainable in the long run.

And on that topic comes PipeWire. PipeWire is basically a multimedia bus; you can think of it as a multimedia bus. It allows any form of media, be it video or audio, to be transported from one place in your system to another. It can come from a device, go to process A, go to process B, go to another device, back and forth around the system. By providing this feature, it basically allows you to build a processing graph which is split across processes in your system, while at the same time these processes get to share resources: all the memory that is used for transporting the buffers is shared and reused. There is also support for DMA-BUF, for hardware buffers. All of this is done with very low latency; there is no big overhead for transporting media around the system. And PipeWire itself has very low resource consumption; it doesn't take much memory or CPU to do all that.

Obviously, by allowing you to build a processing graph with blocks, just like libcamera, it also needs something that will manage this graph, put all the blocks in their place, connect them together and allow media to flow from one place to the other. And that is what WirePlumber does, the project that I'm working on. It's a scriptable component that discovers all the nodes that exist in the system, all the devices, all the applications that have connected, and as soon as it discovers them it links them together based on some logic that you can define in the configuration. In a picture it looks like this, let's say. You can see how, for example, media can be coming from the hardware.
I have two hardware cameras in this picture. They go through libcamera. Now, libcamera has its own internal pipeline handlers, processing algorithms, device-agnostic code, and all of that is contained in the library. And then there is PipeWire, which is a daemon. It uses libcamera to open these devices, so it provides processing nodes that represent these devices, and it allows you to link them to any other node. On top you see some applications, some processes, which for the sake of demonstration I have also put in separate containers. They can be; they don't have to be. It's your call in your application design. Some examples I have put there: a network camera receiver, something that receives an RTP stream from somewhere else and provides it as a local device. It appears as if it were a local device; it's just another node like the other ones. You could have some processing software that has an input and an output: it connects to a device, takes some input, does some processing and then sends it to another node. Then you could have a viewer, for example, that takes these images, combines them somehow and gives you a preview on the screen, or maybe a recorder application whose entire task is just to encode and store to the file system.

And talking about encoding, one more thing that is not really there yet. It's not in PipeWire, but it's something that I've been thinking about, and it may be a good idea to start looking into it in the future: allowing other hardware resources, like hardware encoders and hardware decoders, to also be represented as nodes in PipeWire. Then you could have a processing block that looks like this, where you have input coming from a camera, the buffers are transported through PipeWire to a hardware encoder that encodes them and gives you the encoded stream, so that the camera recorder application really doesn't have to access the hardware encoder directly. It just receives a stream with encoded data. Now, again, this is not something that exists, but I want to look into it; it's something interesting. And the same thing would work for the decoder. Think, for example, of Android, which I know uses a similar architecture: if you want to play something, it goes into a different process where it gets decoded and displayed. This is something that PipeWire could also do and enable for mobile devices and other kinds of devices.

So this can have many applications. Think of automotive, where you have a lot of cameras around the car, sensors, possibly doing some processing. Also think about cloud processing applications: there are streams coming in from user devices that are gathered in a data center where you have lots and lots of Docker containers, and you need to get all this data through a pipeline of containerized applications until it reaches its final destination. Mobile devices like phones, tablets, TV sets, cameras, you name it. Many, many more applications. I think there is a lot of potential for applications to use this kind of architecture, to be able, as I said before, to separate the work inside the system and make components which are more flexible, more versatile.

Of course, let's not forget audio. I've been talking about video based on camera input, but PipeWire is also used for audio. It can transport audio.
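As a quick illustration, the same GStreamer elements I'll use for video in the demo later also carry audio. This is a minimal sketch of my own, not one of the demo scripts:

    # send a test tone into the PipeWire graph as a new stream node
    gst-launch-1.0 audiotestsrc ! audioconvert ! pipewiresink

    # pull an audio stream out of PipeWire and play it out locally
    gst-launch-1.0 pipewiresrc ! audioconvert ! autoaudiosink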
It's now the default audio daemon on the Linux desktop, replacing the previous daemons that were there, PulseAudio and JACK. It has also been deployed on some devices, like the Steam Deck, for example. It has a very nice Bluetooth audio infrastructure, much better than what was there in PulseAudio; it works really well. All these complex audio graphs are actually made possible through PipeWire on the desktop right now, and we are seeing more and more people interested in applying this to their products and devices. So, yeah.

Now, all of that is best described with a demo, so I'll attempt to show you something. What I want to demonstrate here is capturing something from, don't look at the script yet, capturing something from a camera, then taking it through a filter, and then taking it to an application which does the rendering, as three separate processes. Every process that I'm going to launch is going to be a GStreamer pipeline. I'm using GStreamer and gst-launch as a tool to help me build the individual applications. Obviously, I should mention that PipeWire is not meant to be a replacement for GStreamer; GStreamer is still a great tool to build applications, to build these small blocks.

So here I have a script which runs gst-launch and basically receives input from PipeWire: it launches the pipewiresrc element and takes the stream through some processing. The main processing element here is facedetect, so it's going to detect my face when it appears in the camera. Then the stream goes out again, back to PipeWire, through pipewiresink. And then I have another script which just receives a stream from PipeWire and renders it to a window (both are roughly sketched below). So let's launch that. Let's see.

So that's the face detection application. It doesn't have any graphical output, obviously; it's just taking pictures from PipeWire and putting them back into PipeWire. And that's the output window, the script that receives something from PipeWire and renders it. It's not very nice. Why? I don't know. Obviously, nothing is perfect. I think it's a bit confused because this is an xvimagesink, an X window; it's a bit confused by the projector or something like that.

So how do we know what is going on? There is this tool, which was showing up before on my screen, that can show you the graph of what's going on in PipeWire right now. I have my built-in front camera, which is a node that uses libcamera to open my camera. I also have the plugin that uses Video4Linux (V4L2) here, so it's also available, but I'm not using it; I'm using the libcamera one. It goes into facedetect, then goes out and goes to the output window. That's one part.

Now, we're talking about cameras and camera input, but it doesn't necessarily have to be a camera, right? We could also do playback. So here I have another process which basically plays a file and, instead of rendering it to the screen, renders it to PipeWire. So there's a node being created, and that gives me the video stream as it is being played. And then I have another script which is very, very similar to the previous one; it's just something that displays the output. So, video source playing and video output. It's going to show up. Yeah, this is the movie. Let me mute it. So again, I have separation: the decoding process which plays back and the process that renders it are different things.
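Roughly, those scripts look something like this. This is a sketch, not the exact scripts: the target names, the caps and the file path are placeholders, and passing the node.target hint through the element's stream-properties is one way I know of to do it from gst-launch, so treat the exact syntax as an assumption.

    # facedetect.sh: take video from PipeWire, detect faces, send it back to PipeWire
    gst-launch-1.0 pipewiresrc stream-properties="props,node.target=built-in-front-camera" \
        ! video/x-raw,width=1280,height=720 ! videoconvert \
        ! facedetect ! videoconvert ! pipewiresink

    # output.sh: take a stream from PipeWire and render it to an X window
    gst-launch-1.0 pipewiresrc stream-properties="props,node.target=face-detect-output" \
        ! videoconvert ! xvimagesink

    # videosource.sh: play a file; audio plays out locally, the decoded video
    # becomes a PipeWire node that other processes can consume
    gst-launch-1.0 playbin uri=file:///path/to/movie.mp4 video-sink=pipewiresink

The hardcoded caps are there mainly so the stream negotiates as video rather than audio; I'll come back to the node.target hint in the questions at the end.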
Obviously, the interesting thing that I can do here is launch multiple renderers that show the same thing. So if I have this up here, let me resize it, go to another terminal window, I can just launch the same script and it plays the same output, synchronized, since it's being streamed by the other process. Right. So what do we have here? There is some audio being played in the background, and we have this process, which is the video decoder and player, and two consumers which render it.

In a similar fashion, we can combine these things. I can launch facedetect and the video source and then combine them. Let's see how that looks. Yeah, video source, then facedetect. So I can combine them. Now there is a third process there which does the compositing: it displays a window and runs the compositor element, which renders this picture-in-picture. Obviously, I can start more of these and they will render the same thing, and the pipeline now looks like this. That's a bit more complicated. Where is the other one? Here. Let me turn the audio down. So, yeah: front camera going to facedetect, then it goes to the two renderers, the picture-in-picture inputs for the two separate processes. And the other gst-launch here is the video player, and again the two renderers, the background inputs for the two different windows that I have.

And obviously, I can split this further, because right now I have two processes, two windows here, that one and that one, which both compose the images into their own window. I could also run a separate compositor that does the compositing and then provides an output to PipeWire, so that these two windows would just render without compositing (sketched below). Let's see how that looks. I can run this composite.sh. Yeah, so that's another background application now; it takes inputs from these and it has an output, not connected yet. And then I can run this composite output, which renders the result. And now I can go ahead and launch more of these, so there's another window. Now these two windows are not doing the compositing themselves; it's being done by a single process here. So there is a shared workload: it's composited once and then rendered twice.

And what about CPU? Let's see. That won't be good. So it doesn't look good for GStreamer: GStreamer is really, really taking a lot of CPU here doing all this decoding. And I think the one with the highest CPU is the compositor, which does the picture-in-picture thing, since it does it all on the CPU, and the video is huge, it's full HD. But how about PipeWire? PipeWire is using 0.08% of my CPU, so it's literally nothing for what it's doing, and only 50 megabytes of RAM; that's nothing for what it's doing. And WirePlumber, again, WirePlumber is idle. It connects the graph when a process appears, but then it's idle; it doesn't do anything else.

So yeah, that's it. Let me stop all of these. Come on. I can start it from here. So, next steps. I think that's your call, mainly. On my side and my team's side, and in PipeWire upstream in general, what we do is develop these mechanisms, these tools, so that they are there, so that they are available, so that you can build great applications on top of them. I am not developing an application myself; I'm developing the session manager and the pieces around it, fixing bugs in PipeWire, and so on.
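For reference, the shared-compositor variant at the end is roughly a pair of pipelines like this. Again a sketch: the pad positions, sizes and target names are placeholders, not the exact values from my scripts.

    # composite.sh: blend two PipeWire streams picture-in-picture,
    # publish the composited result back into PipeWire
    gst-launch-1.0 compositor name=mix sink_1::xpos=20 sink_1::ypos=20 \
            sink_1::width=320 sink_1::height=180 \
        ! videoconvert ! pipewiresink \
        pipewiresrc stream-properties="props,node.target=video-player" ! videoconvert ! mix.sink_0 \
        pipewiresrc stream-properties="props,node.target=face-detect-output" ! videoconvert ! mix.sink_1

    # composite-output.sh: just render the already-composited stream; can be launched several times
    gst-launch-1.0 pipewiresrc stream-properties="props,node.target=compositor" \
        ! videoconvert ! xvimagesink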
But there are great applications that could be built on top of this architecture. It really makes some complex problems simple to solve; they become really simple when you have the ability to transport media from one process to another, and to share devices, to access devices from multiple processes, and so on. So yeah, that's it. Thank you very much for attending. And I'm open to questions. Yes.

Right. So the question is: what can PipeWire do to provide synchronization between audio and video that is being captured? PipeWire doesn't do much for synchronization itself; it is mainly a transport mechanism. But everything that goes through PipeWire is transported as a live stream. So if you're capturing something from a camera, it's being transported live. There is always a bit of latency when you process something, but the good thing with PipeWire is that this latency is fixed, so you know beforehand how much time a frame takes to go from point A to point B. So what you can do to synchronize is, in your application, gather all these frames and, knowing the latency that the pipeline has, apply the necessary clock adjustment to synchronize them. Obviously you can get timestamps from the source; timestamps can be transported and applied all the way to the application. But other than that, PipeWire doesn't really synchronize things. It does have a clock, but it's not for synchronizing audio and video, or two videos together; the clock is mainly there to set the pace at which frames go from one point to another. Knowing the frame rate, it will just go every once in a while and move one more frame from the source to the consumer. Other questions? Yes?

Right. So the question is: are buffers provided by PipeWire itself, or is that something the producer and consumer need to allocate and negotiate and so on? The answer is: yes, they are provided by PipeWire. As soon as you start a stream, PipeWire allocates a set of buffers for your stream. It is a fixed number of buffers; it doesn't change, so that the latency is also fixed. You can obviously influence that from the application if you want a different set of buffers or a different latency. And if you want to do DMA-BUF sharing, for example, you can also request that: you can get buffers which don't really carry memory themselves, and then attach the file descriptors to them, since they are DMA-BUF based. But it's always a fixed set of buffers, which is always shared between two nodes that are linked together. Okay, I have three hands. Yeah, please go ahead.

Right. So the question is: PipeWire basically does zero-copy, transferring buffers from one node to another, so how does that work when you have multiple consumers that all need access to the same buffer? The answer is: they get the same buffer. They get access to the same piece of memory in a read-only fashion, because if they want to process it, they have to copy it somewhere else. When they receive something, they receive it for reading, right? So they can then copy it somewhere else if they need to. The follow-up question is: if one consumer stalls, do the others have to wait? The answer is no, they don't have to wait. The locking mechanism is implemented in the daemon, in the server.
The server, the daemon, I'm referring to the daemon as the server here, basically assigns these buffers to the processes and gives them a time slot, which is called the quantum. So they have this much time to actually get the data. If they don't get it in time, that's their problem: they don't get access to that buffer anymore, the buffer is returned to the pool, and, yeah, such is life. So you need to tune these things; it can go wrong. If you try to have very low latency and give processes a very small amount of time to do their work, you can get bad results. It needs to be tuned if you're doing this kind of thing. Yes, next question.

Yeah, so the question is: how does format negotiation work? Because my script cheated a little bit and had the caps hardcoded. And the answer is: yeah, I'm not sure yet; it's something that is actively being developed. I mean, there is some basic format negotiation; some of the scripts that I showed did not have caps hardcoded, so they just get the caps of the producer, whatever that is. But the camera currently is started with its best possible resolution and frame rate; we cannot request something else through PipeWire yet. That mechanism is not there; we are thinking about how exactly it's going to work. And the hardcoded caps that I had were basically there to ensure that this process connects to the camera and not to, let's say, an audio source, because there was nothing else that would ensure that, and because pipewiresrc and pipewiresink in GStreamer can do both audio and video. Yes?

Yeah. So the question is: how can we manage more complex pipelines? How do we program things to be linked together? The answer is that you can do that through WirePlumber, with its scriptable configuration, which I did not show here. What I did was, in every pipewiresrc that I launched, I was passing a property called node.target, which specifies the target node that I want to link to. This property acts as a hint to WirePlumber. WirePlumber takes it and says: okay, do I have a node with this name? It looks up the graph, and if it finds a node with that name, it makes a link. But it's a hint; it doesn't have to be that way. WirePlumber can act differently if necessary. And there are scripts that do this kind of linking. They are written in Lua, and currently they are pretty complex, but they're there. There is a bunch of logic written in Lua that can do all this routing: it can discover nodes, take their properties and see whether they can be linked or not.

I'm out of time, so thank you very much for attending. And if there is anything else, you can catch me later and talk to me. Thank you.