Yeah, so I'm always honored to be here. Last year I was standing in pretty much the same spot, and I talked about what we'd finally do when we got around to releasing GNU Radio 3.8. So we got around to releasing GNU Radio 3.8. Turns out that's awesome. Also turns out that's definitely not the end of the journey. So what we're gonna do in the next minutes is take a very, very short look back at what we didn't touch in 3.8; then see what the challenges for the current scheduler are in the very narrow sense, the things that we see right now; and then I hope we can project a bit on what has to be solved in the larger scheme of things, and how we've started to actually tackle that. Because, honestly, I've been at at least five panels where we discussed what awesome things should be done to the scheduler, and yeah, it's time we actually do a few of these. I wish my presenter remote was working.

So, who am I? I'm gonna do this very quickly. I'm still a research assistant at CEL; we got a new logo. I teach a couple of the exercise classes; that might be because I'm kind of 25% of an institute. And on one hand we're still looking for PhD candidates, so if you're looking for a job, you know, apply. A few of my other roles include supporting Ettus and their customers, by supporting them with more grumpiness. I'm also a freelance engineer, but as you can imagine, with a full-time job here, a part-time job there, and a happy-fun-time nighttime job as chief architect of the GNU Radio project, that might come up a bit short.

So, what is the state of GNU Radio? I'm obviously not putting up a picture of, you know, GNU Radio, which started as a project probably around '99; we don't quite know for sure, but we know when the first code was published, and that was a year later. I'm talking about 3.7, because that's what a lot of people are still using. This is pretty much the same slide from me last year, just that that slide ended here, and since then we've released 3.8. So that's six years of trying to keep an API stable, which means that there was actually a lot of useful development in the user space, the application space. This is awesome. This is something that we definitely want to foster again in the future. We don't want to break things; whenever we can, we want to, you know, move fast but not break things. That's a hard problem, actually. So here I am, telling you that we're gonna do a lot of changes; not all of them will be easy, and we'll need to write some code and we'll break some code, but in the end everything will be fine. So please bear with me while I try to assure you of that.

Because I couldn't stop myself from actually doing that: this is exactly the same slide as last year, just cleaned up a bit to show that we've done all this. From 3.7.0 to 3.8.0, that's only like 380,000 changed lines of code, which is okay-ish considering that it includes code reformatting and excludes all the whitespace. That is actually a lot of work, and it's pretty amazing that basically all out-of-tree (OOT) modules will be compatible with GNU Radio 3.8. I can't think of many that have architectural problems with becoming 3.8-compatible; it's mostly mechanical code changes, plus the fact that your Python needs to be Python 3 if you want to be future-proof and not only work with 3.8. Because, as mentioned before by André:
We're dropping Python 2 for anything after 3.8, which is kind of the right thing, because Python itself just dropped Python 2.

So, what didn't we change? We didn't change anything about how we modularize our software stack: that's still the GNU Radio runtime with all the gr-whatever modules in there, and we're actually putting in more as of 3.9. We also didn't really change what the project is about: it's about CPU-based processing of software-defined radio signals. If it works for audio, if it works for sonar, if it works for your favorite, you know, whatever 1D signal, fine; but the scope of the project really is software-defined radio at this point.

Another thing is that we really didn't change the way we think about contributions. If you've got a patch, you send it upstream; we do this with GitHub by now, which we didn't when we started 3.7. But it's still mostly bugfix-oriented code contribution, and we're just now getting up to speed on how to encourage people to submit code upstream. For example, such code would be a prime candidate for upstreaming, considering it would have broader usage; a lot of people aren't really happy with the equalizers we have in tree. I've just had a talk, right, about people trying to synchronize: their packets won't work with the current equalizers because they're simply too slow. Your packet is gone the moment your equalizer locks.

And the main thing is, we didn't change the "scheduler". So now I'm doing this funky finger thing, the air quotation marks. Why am I doing the air quotation marks? I mean, everyone loves the GNU Radio scheduler, right? I throw in my block, it gets executed, I don't have to worry about anything. All I have to worry about as a developer of a signal processing transformation step is to take my code and put it into a work function, which has an input buffer and output buffers, and I just write stuff into the output buffer and tell the runtime how much that was, right?

To understand why that's not the perfect way to do it, we'll need to go a bit into the history and do a very, very short introduction to how GNU Radio scheduling works. I tried to actually put everything into this presentation; then I realized I'd held such a presentation last year and it took one and a half hours. Couldn't do that here, so we're doing this a bit simpler.
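To make that paradigm concrete: this is roughly what such a block looks like in the current Python API (a toy 1:1 block; the C++ version has the same shape). The whole contract is the work function: read the input buffer, fill the output buffer, report how much you produced.

```python
import numpy as np
from gnuradio import gr

class multiply_by_two(gr.sync_block):
    """Toy 1:1 block: all the runtime asks of us is a work() function."""
    def __init__(self):
        gr.sync_block.__init__(self,
                               name="multiply_by_two",
                               in_sig=[np.complex64],
                               out_sig=[np.complex64])

    def work(self, input_items, output_items):
        # write stuff into the output buffer...
        output_items[0][:] = 2.0 * input_items[0]
        # ...and tell the runtime how much that was
        return len(output_items[0])
```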
Originally, GNU Radio was, from the start, a single-core-oriented framework. Why? Because in 1999 there were no easily available multi-core CPU machines that you'd usually have access to as a private person, and that's what the developers were realistically aiming for. So what it did was: take an abstract representation of a flow graph, of a signal processing graph, and flatten that graph; that's a directed acyclic graph. There was no message, async stuff in there; there were just samples going around. Then analyze that graph and find the sources in it. That's rather simple: you start at any node, you do a backwards search until you reach a root node, and then you check whether you've covered all your graph nodes; if that's the case, you're done. So you find all the sources and call their work functions; they produce samples in their output buffers; and then you basically take these samples, take one of the sources' outputs, call the downstream block, let that ripple through the graph, and repeat that until you're blocked by someone missing new input. Then you go back one step, backtrack, and run another iteration on that block. I hope that if I say something wrong, the elders of the project, who aren't... no, Tom's not here, so I'm not saying anything wrong.

We later, around 2009, renamed that to the single-threaded scheduler, because we went in and said: now that dual-core, even quad-core machines are readily available, cheaply available, even becoming the norm, we want to have multiple threads. And what we did was, let's say, algorithmically simplified, dumbed down even: we just took every block and put it in its own thread. We still discovered the sources and called them; they produced some data; and then we just had this message-passing thing where we notified all the neighbors of those blocks that hey, something has changed, and the individual block executors in their threads would then go and analyze what that means, and that usually means you start processing the samples. Which is fine, you know; I can write the algorithm down in literally four lines of pseudo-C. And that's fine. The problem is: the average flow graph has more blocks than I have CPU cores in my machine, and unless I start working for Amazon or Google, that's probably gonna stay that way.

With that in mind, we can see that, okay, this is a very flexible, extremely easy way of doing it, and it leverages the fact that operating systems have gotten pretty, pretty good at scheduling tasks when they're ready, when they know they have data available and aren't blocked on something. We can really see that the Linux scheduler, at the very least, does a decent job at finding heuristics for when to schedule what. But it simply has no clue; the Linux scheduler has no clue about what the data flow is, right? There are only threads, and some thread holds a mutex, and we notify on it; some condition variable changes, some futex gets called; suddenly you're in the wild, and you don't know which of these 15 threads to schedule next. So this is fine. It works.
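In my own rough pseudocode (invented names, not the actual runtime source), the per-block executor loop is essentially this:

```python
# Sketch of the thread-per-block idea: every block runs this loop
# in its own thread, and the OS scheduler sorts out the rest.
def block_executor_loop(block):
    while not block.done():
        block.wait_for_notification()     # sleep until a neighbor signals us
        progress = block.try_work()       # run work() if buffer space allows
        if progress:                      # we consumed input or produced output,
            for n in block.neighbors():   # so up- and downstream may now proceed
                n.notify()
```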
And it's surprisingly performant. But it's also a bit depressing, right? Because that means someone thought about how to build that single-threaded scheduler, and then the same group of people thought about how to build a multi-threaded scheduler, and they started with the simplest thing that comes to mind. And it actually worked like hell, because if you have four cores, that's three cores more than you used to have, and you can afford a lot of inefficiency there.

So let us take a step back and actually look at what makes GNU Radio signal flow tick. I like to call it a back-pressure-driven, thread-per-block-scheduled, parallel signal processing architecture. Meaning that while, for example, the file sink might still be processing samples, writing them to a file, the Multiply Const block here in the middle might already be working on the next chunk of things. Because that's the way it goes: the signal source notified Multiply Const; Multiply Const did some calculations, wrote to the output buffer, and notified the file sink; and in the meantime the signal source might already have been notified by Multiply Const that hey, I'm done, you can do some more work for me, could you please. So these two can run in parallel, and in the end, in a stable situation, all three could basically run in parallel. But as I said, no flow graph is as simple as that; most have way more blocks than CPU cores, so the parallelism you perceive isn't actually there; it's just maxing out your CPU usage, best case.

So, nope. That's why I put the "scheduler" in quotation marks: there is no scheduling of things. We just tell everyone, at every point in time, hey, I'm done, or hey, you've got some more output to produce. And as you can imagine, this has a bit of overhead in terms of messaging, and it also doesn't really qualify as informing anyone of the intent of your data flow.

Why is that important on modern machines? Well, think of an old-style CPU: you only have RAM and registers, and that's it. So whenever you exchange a big portion of data, you always write it to RAM, and the next block gets it out of RAM again. The problem is, that's not how modern CPUs work. Modern CPUs have caches, and these caches are inherently local. So it would be very desirable that, for example, if I have a lot of blocks and this is just a part of my flow graph, these three blocks would run on the same CPU core. Why? Because then the output of that block would never leave the cache of that CPU core; Multiply Const would be called on data in cache and would write the data for the file sink to cache. Another thing: it's obviously beneficial if we just execute these in sequence over and over again, because then, with lower cache usage, I keep the same locality. And we have no way of actually integrating that knowledge into the current scheduler. Bad thing; but here's how we're gonna solve it.

I'm running out of time, so I'm kind of speeding up here. What we did in the past is we let the single-threaded scheduler die; not because it was a bad scheduler, but because we concentrated on a different one and then added features that only the newer scheduler supports. So if you're using message passing, the asynchronous stuff, that's not a feature supported by the single-threaded scheduler. Which is sad; it's not technically impossible, it's just that we never thought about how to do that and never came up with an architecture for it. So we let things die.
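To tie the signal source / Multiply Const / file sink example from a moment ago to actual code, here is a minimal version of that flow graph in the current Python API (with a head block added so the run terminates); the runtime gives every block its own thread, coupled only through buffer back pressure:

```python
from gnuradio import gr, blocks, analog

tb = gr.top_block()
src = analog.sig_source_c(32000, analog.GR_COS_WAVE, 1000, 1.0)  # signal source
head = blocks.head(gr.sizeof_gr_complex, 1000000)                # stop after 1M items
mult = blocks.multiply_const_cc(0.5)                             # multiply const
sink = blocks.file_sink(gr.sizeof_gr_complex, "out.cfile")       # file sink
tb.connect(src, head, mult, sink)
tb.run()  # one thread per block; no global plan, just back pressure
```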
We also didn't come up with a way of measuring how well our scheduler works while we were still developing it; it was all an afterthought. We added ControlPort to figure out how many milliseconds your block call took on average; we added metrics after the fact. Now, the problem is: if we're gonna do better than that, we should be starting with metrics up front, because otherwise we'll probably make things worse first.

And this is a bit of trouble, because measuring runtime in a dynamically scheduled signal flow is a hard problem. You can't just go and say, everyone halt, I want to know how much time you spent, and you, and you, and you; because by that instant it might all have happened already. You're just changing where data is flowing; DMA transfers will happen behind your back while you're halting everything and observing everything. So all you can do is either do statistics or hope for the best. And we're gonna change that; we'll see how that's possible.

So what we hope to do is write a scheduler that is, you know, something we actually understand. We have this thread-per-block scheduler, and it's fairly simple in principle, like I've just explained; I hope the principle was kind of clear. But it has a lot of these states: is that block done? Is it blocked on input? Is it blocked on output? Is it currently processing something? What do we do when we say, okay, this flow graph is done now; how do we flush the remaining data out of the data flow graph? And we need something, and this is pretty dear to a lot of people in this room, that's actually extensible. We don't want this to be the single scheduler that you have to use in GNU Radio. You need one scheduler for workloads that look like a pipe, where I look at it and say, okay, the optimal solution to this is obviously running it round in circles as fast as possible; and another for situations where things look more like a network of graphs, where packets might be sent around, even taking different routes through my signal processing flow graph depending on what a packet contains. So if I'm decoding Wi-Fi, not all packets are data packets; some packets might just be needed for equalization purposes, or initialization purposes, and they need to go down different routes. That's something the current scheduler architecture can't really do.

So we started to think about how to actually write a new scheduler, and the problem with that is: how do you start doing that if you've got a block that does so much? Everyone who's already written a GNU Radio block in this room, can I see hands? That's a little more than half of you. So you really know that a GNU Radio block has this work function that takes an input buffer and an output buffer, or multiples of these. But it can also have something like forecast, which you need if you don't have a fixed rate from input to output. It can also have things to check whether the number of inputs works with the number of outputs you offered. And it has a lot of things that you don't really need if you have this purity-of-essence attitude of: I want to process samples; that's a transformation from input to output.
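For reference, this is the kind of "classical" interface I mean: a toy 2:1 decimator in today's Python API, where the block has to implement forecast and explicitly tell the scheduler what it consumed:

```python
import numpy as np
from gnuradio import gr

class decim_by_two(gr.basic_block):
    """Non-1:1 block: needs forecast() plus an explicit consume()."""
    def __init__(self):
        gr.basic_block.__init__(self, name="decim_by_two",
                                in_sig=[np.float32], out_sig=[np.float32])

    def forecast(self, noutput_items, ninput_items_required):
        # tell the scheduler we need two input items per output item
        for i in range(len(ninput_items_required)):
            ninput_items_required[i] = 2 * noutput_items

    def general_work(self, input_items, output_items):
        n = min(len(input_items[0]) // 2, len(output_items[0]))
        output_items[0][:n] = input_items[0][:2 * n:2]  # keep every other sample
        self.consume(0, 2 * n)  # the block must report its consumption itself
        return n                # ...and its production
```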
But a block also carries a lot of things that encapsulate the state of your signal processing, not from a signal processing point of view, but from the data marshaling, scheduling point of view. And that is the really hard problem that we see there. Why? Say I want to take that block and schedule it elsewhere, be it on a different scheduler type, because I realized this scheduler is not working out for that subgraph; or I want to say, okay, that block goes to AWS now, it doesn't run on my own computer anymore, it receives low-rate data and needs processing power that I can't offer locally. I can't do that with that block. It's not possible to transparently unplug a block and move it elsewhere, because the block interfaces so intensely with the scheduler, on a basis that actually fiddles with internals of the scheduler that were never documented as an API. It's emergent behavior that everyone got used to. We're fixing a lot of bugs along the lines of: oh, that's how people are using the scheduler, interesting, that shouldn't even be working; we changed something, it broke that, and now we have to figure out a way to make it work again, because, you know, we try not to break userland.

So we needed a cut; we needed to say: okay, what is good about a GNU Radio block, and what is not necessary about a GNU Radio block? And so we implemented a new block.h; that's what we were doing at the hackfest. And that is a block that's tremendously reduced compared to your original GNU Radio block: it really only has a work function. And instead of having input-items and output-items pointers, plus a parameter that tells you how many output items you can maximally produce (ignoring the fact that you have multiple outputs, so these might be multiple numbers), and expecting you to explicitly tell the scheduler how much you consumed, we kind of try to put all of that into objects that the scheduler calls you with, so that we get more of a functional interface. Why? Because, basically, computer science tells us that lambdas are easier to reason about than things that change the state of the world around them, right? Especially if you're writing an FPGA implementation of something: you don't want that FPGA implementation to have to go back and tell the CPU-bound scheduler, hey, I'm now producing 20 items. That's nothing it should be caring about in the first place, and frankly, it literally can't, without incurring an overhead that's probably way worse than doing the whole computation on the CPU itself. So that's why we're trying to move away from the classical work thing.

Another thing is, right now people are writing two different kinds of blocks. They're writing the classical stream-operation blocks, like your FM demodulator, or your modulator or demodulator for a stream of OFDM frames, like, say, in DVB-T. And they're writing really packet-oriented data things, like what comes out of your stream demodulator, like a Wi-Fi packet, and they put these into asynchronous messages, because they don't really fit the "I have a contiguous memory buffer where I can write samples" paradigm, and they also don't fit because you don't want to pre-allocate these. So the new interface hopefully allows us to have a single unified interface for both, instead of having two different paradigms within the same idea of how we do work. So that's the block-centric view.
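To illustrate the direction, a purely hypothetical sketch in Python (all names invented; the actual new block.h is C++): the scheduler calls the block with an object describing the invocation, instead of the block reaching back into scheduler state:

```python
class WorkIO:
    """Hypothetical per-invocation object handed to the block by the scheduler."""
    def __init__(self, inputs, outputs):
        self.inputs = inputs                  # one readable view per input port
        self.outputs = outputs                # one writable view per output port
        self.consumed = [0] * len(inputs)     # filled in by the block
        self.produced = [0] * len(outputs)

class MultiplyByTwo:
    def work(self, io):
        # a (nearly) pure transformation: read inputs, fill outputs,
        # record the counts; no calls back into the scheduler mid-work
        n = min(len(io.inputs[0]), len(io.outputs[0]))
        io.outputs[0][:n] = [2 * x for x in io.inputs[0][:n]]
        io.consumed[0] = n
        io.produced[0] = n
```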
This is what I would call the OOT module implementer's view of things. And then there's the scheduling interface. The scheduling interface obviously now gets a graph that's made out of these connected blocks; and it's still, I know, it's still a graph. But instead of just distributing messages to all blocks, each running in its own thread, we introduce workers; I mean, web servers have been doing that since around 1995. We just have one worker per CPU core, and that worker is the owner of a couple of blocks that it executes. So instead of sending messages to a block executor, we send messages, like "hey, I've just updated my output", to a worker. That worker will then aggregate all the messages it got while the piece of work it was currently executing was running; after that work is done, it reorders if necessary and executes. So kind of imagine it as a queue with knowledge of what's going on inside. For a CPU scheduler this should work pretty easily, because that's basically, I mean, if you're implementing a simple OS, exactly how you implement scheduling threads: you have a queue of things that should be doing stuff, and then you check whether any of them had an update. So that's easy.

But it also gives us a very clear interface for communicating with other scheduling domains. So I'm introducing the concept of a scheduling domain. That's like: hey, I've got this CPU-bound scheduler; hey, I've got a scheduler that actually runs on a GPU; hey, I've got a scheduler that actually runs on an FPGA in a data center somewhere else. And all it has to do is receive messages, and the good thing about receiving messages is that you're not really bound to having them in a shared memory space, right? You can do that over the network, meaning this is going to be transparent: we can simply ZeroMQ that over to, you know, your neighbor's PC and have it work. So that's what we're doing in terms of scheduling architecture.

Could you press the next button? Yeah, works. So what are we expecting from this? Like I said, we'll be able to transparently move blocks, and we'll be able to exchange schedulers, and that's a big deal for us; that's actually where our main research currently is. We can do awesome things in blocks, we can write the coolest equalizers, but at the end of the day we're limited by the computational resources that we can actually utilize. In a lot of places we could basically have internet money, but we don't have internet bandwidth; yeah, we need to solve that. And also, we've been stuck for 20 years with basically the same scheduling paradigm, and it might just be the case that we need to experiment and get our fingers, you know, a bit muddy on how to work things out.
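Before the questions, a rough sketch of that worker idea (again, invented names and assumed semantics, not project code): one worker per CPU core owns a set of blocks and drains a notification queue:

```python
import queue
import threading

class Worker(threading.Thread):
    """Owns a set of blocks; executes them in reaction to queued notifications."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()

    def notify(self, block):
        # called by other workers; could equally arrive via ZeroMQ
        # from another scheduling domain on another machine
        self.inbox.put(block)

    def run(self):
        while True:
            todo = {self.inbox.get()}         # block until something happened
            while not self.inbox.empty():     # aggregate everything that arrived
                todo.add(self.inbox.get())    # while we were busy working
            for blk in sorted(todo, key=lambda b: b.topo_index):  # reorder
                blk.attempt_work()            # may notify() other workers
```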
So, I've got three minutes for questions; and if anyone wants to leave, could we do that quickly. I don't know the order of things, so I'll let the gentleman over there go first. So, the metrics. The question was: what would the metrics interface look like, what could people do with that? And, um, that's pretty easy: basically, most of the metrics are simply emergent properties of the messages that we're exchanging, right? So I could just say, okay, splice that message stream off to an observer. The hope is that people who are into scheduling will optimize based on what they can observe there. Another hope is that we can just throw, you know, PhD students at that problem until it actually works; and if that doesn't work, we still have machine learning. So that's the one thing. The other thing is that we already have, like I mentioned, ControlPort; we have ways of measuring performance, they're just not utilized very well, because, you know, the API is very awkward.

I'm just gonna go ahead with you. So the question is whether real-time Linux capabilities are beneficial. Bastian Bloessl wrote a paper on the influence of the OS kernel's scheduling strategies on GNU Radio, and you can actually increase performance if you do a bit more, you know, preempting and guaranteeing latencies, but there's only so much you can do. Basically, real-time Linux always trades off latency against rate, and we're not willing to go that way. So, I'm sorry, I'm out of time. I'll gladly hand over my microphone to the next speaker.