Thanks, everybody, for coming today. It's my first time here, so I'm excited to be presenting. This talk is split into two sections that I'll try to tie together. The first part focuses on integrating Rust with GNU Radio. Right now you pretty much write either C++ or Python if you want to do anything there. I personally don't really know how to write C++; it's never clicked for me, so I wanted to explore other options, and that's where we'll start. The second part comes from all the hardware that's been piling up: I have FPGAs sitting around with nothing to do, so I feel like I should do something with them, and the information out there about how you actually put something on an FPGA and connect it to something real seems pretty poor. There's a lot of missing information, so hopefully this gives some context.

So why Rust? I think a lot of people have heard about it. It takes care of a lot of the memory management and safety for you, and it has a really nice concurrency model: you can just write concurrent code, and the compiler won't let you write things that aren't safe. It has tied into C well from the very beginning. I find the language really easy to read; maybe some people don't, and that's fine. On the dependency side, it comes with a package manager that works well with it, which is something I think C and C++ are very bad at. I've enjoyed my experience with it so far.

In case you haven't seen any Rust, here's a very simple Rust program. This is actually from the Rust book, which is really good if you want to learn the language. There's some stuff at the top where we pull in an external package, plus some namespacing. And we have explicit mutability: we're telling that function that the value is going in, the function is going to modify it, and that's okay.
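As a stand-in for that slide, here's a tiny program in the same spirit (this is not the exact Rust-book example; the names are mine):

```rust
// Explicit mutability: `grow` declares up front that it will modify `values`,
// and the caller has to opt in with `&mut` at the call site.
fn grow(values: &mut Vec<i32>) {
    let next = values.len() as i32 + 1;
    values.push(next);
}

// `match` is the "fancy switch statement": the compiler checks that
// every case is covered.
fn describe(n: i32) -> &'static str {
    match n {
        0 => "zero",
        n if n < 0 => "negative",
        _ => "positive",
    }
}

fn main() {
    let mut values = vec![1, 2];
    grow(&mut values); // the `&mut` here makes the mutation visible at the call site
    println!("{:?} -> {}", values, describe(values[2]));
}
```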
We have `match` expressions, which are like fancy switch statements. For me, I can look at this and it's just easy to understand what's going on. And here's some of the concurrency: all we're doing is creating threads, and they just do what they should do. There's not a bunch of Boost hanging around.

To get to the point of it: how can we actually talk to GNU Radio? We need to create a C binding for it. The first bit is fairly straightforward. On the Rust side we're saying: there's this library we promise we're going to link against, it has this function in it, and its signature looks like this. Same thing on the C side: we're saying we provide this function, it's exported, and we provide a header file that declares it so that C knows how to call it.

Now, getting more to the GNU Radio side. For this I'm going to walk through about the simplest thing you could do with GNU Radio: the multiply block, just for floating-point numbers. We're going to build that. This is what it looks like, and there's really not much fancy about it: we take in two vectors and the number of items in those vectors, and we multiply them.

There are a few things we should think about beyond that. The multiply example is very simple, but frequently there's a lot more state we're trying to hold in a block. Maybe we're setting up a UDP socket because we're building a UDP sink or something like that. We need to be able to hold that file descriptor and other state, and we'd like to keep some of it on the Rust side. But every time a function returns, anything it owns that it didn't take in as a mutable reference gets cleaned up: Rust says, I have to take care of this, I'm going to get rid of it.
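Backing up to the binding itself, the Rust half of that boundary can be sketched like this (the function name is made up; a real export would also carry `#[no_mangle]` to keep the symbol name stable, omitted here so the sketch compiles standalone):

```rust
// `extern "C"` selects the C calling convention, so a C caller that
// includes the matching header can call straight into this function.
// (A real library export would also add #[no_mangle].)
pub extern "C" fn rust_multiply(a: f32, b: f32) -> f32 {
    a * b
}

// The matching C header would declare:
//
//     float rust_multiply(float a, float b);

fn main() {
    // It's also a plain Rust function, which makes it easy to sanity-check.
    println!("{}", rust_multiply(2.0, 4.0));
}
```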
Dropping that state on return is not helpful if we want to be able to talk to it later. This is where you put the `unsafe` keyword in, which means: there be dragons here, and you need to handle them. So we're going to play with dragons for a minute. You can think of this almost like a class; in Rust it's called a struct, but it's similar. And Rust has something called a `Box`: you put the object in the box, hand out a pointer, and say, it's in the box, don't touch the box. Then later we come back and clean up the box. This lets us take that state and send it out, and it can actually be used to build something that looks a lot like a C++ class on the other side. There are a lot of complexities that go into that, but somebody at Mozilla has done an excellent job working through them. It's a very long article, so I didn't want to try to squeeze it in here, but I highly recommend checking it out if you're interested in how to build those interfaces.

So, getting back to this function: what is this? We've got some void pointers we're playing with, and we don't really know what they are. Great. But if we look more specifically at how the C++ side implements this for the float type, it's not that weird: we're just saying those void pointers were floats all along. It's fine. And there happens to be a really nice place, where the code was calling into the Volk library, where we can now call into the Rust library instead. It's a very simple interface, so we can take just that piece and implement it in Rust.

So we start building that. Cargo gets the boilerplate going for you. We want to make sure we're creating a static library, because we're going to tie this right into the library GNU Radio creates for its base block set. This is the function we're going to try to build.
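Before getting to that function, here's roughly what the box-and-pointer pattern from a moment ago looks like in code (the UDP-sink flavor of state and all the names are illustrative, not GNU Radio's actual API; real exports would also carry `#[no_mangle]`):

```rust
// Illustrative state a block might hold across calls (say, a UDP sink).
pub struct SinkState {
    packets_sent: u64,
}

// Put the object in the box and hand the C side a raw pointer to hold.
pub extern "C" fn sink_new() -> *mut SinkState {
    Box::into_raw(Box::new(SinkState { packets_sent: 0 }))
}

// Every later call gets the pointer back; `unsafe` marks the trust boundary.
pub extern "C" fn sink_send(state: *mut SinkState) {
    let state = unsafe { &mut *state };
    state.packets_sent += 1;
}

// Take the object back out of the box so Rust can clean it up normally.
pub extern "C" fn sink_free(state: *mut SinkState) {
    if !state.is_null() {
        drop(unsafe { Box::from_raw(state) });
    }
}

fn main() {
    let s = sink_new();
    sink_send(s);
    sink_send(s);
    println!("sent {}", unsafe { (*s).packets_sent });
    sink_free(s);
}
```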
So we go to define it. We say we're taking in that output buffer as a mutable pointer, the other two as non-mutable pointers, and the number of points involved. We then do one unsafe thing, where we say: trust us, we know how big that memory region is. The nice thing is that as long as that has been established as correct, everything that follows is safe; we don't have to worry about anything anymore. So we do that, and that's fine: we've just done a scalar implementation of it.

Where I think it gets more interesting: if any of you have worked with GNU Radio, you know it's a big CMake project with all kinds of hidden dragons in there. So I was a little unsure how this was actually going to tie into that build process. But some people have done a nice little CMake wrapper for Rust, so we're able to just take our Rust project, put it inside the tree, add our header file, make a few changes to that component, and it builds. That was all we had to do, which was really satisfying. If you don't have the Rust dependencies, it'll even go out and pull them in too, and it does the whole build process. Very satisfying.

One of the other things we can do is extend this a little. Since we made a very non-performant function, let's get some performance back. SIMD is available inside Rust now; I think parts of it are getting stable, though it might still be nightly-only, I'm not sure. There are a couple of libraries that provide nice interfaces around it, and one of them is called `faster`. With it we get this nice interface where we say: here are these two buffers, make them iterable over SIMD vectors, zip them, perform this operation, and put the result in the output.
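Pulling those earlier pieces together, a sketch of that scalar kernel might look like this (the function name is hypothetical, loosely mirroring Volk's naming; the real one slots in where the Volk call was, and a real export would also carry `#[no_mangle]`):

```rust
use std::slice;

// Multiply two float buffers element-wise, writing into `out`.
pub extern "C" fn volk_rust_multiply_32f(
    out: *mut f32,
    a: *const f32,
    b: *const f32,
    n: usize,
) {
    // The one unsafe step: trust the caller that all three buffers
    // really are `n` floats long.
    let (out, a, b) = unsafe {
        (
            slice::from_raw_parts_mut(out, n),
            slice::from_raw_parts(a, n),
            slice::from_raw_parts(b, n),
        )
    };
    // From here on, everything is ordinary safe Rust over slices.
    for i in 0..n {
        out[i] = a[i] * b[i];
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [4.0f32, 5.0, 6.0];
    let mut out = [0.0f32; 3];
    volk_rust_multiply_32f(out.as_mut_ptr(), a.as_ptr(), b.as_ptr(), out.len());
    println!("{:?}", out);
}
```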
We can chain all kinds of things there: if we wanted to then do some subtraction, absolute values, whatever, it's all handled nicely inside of that. So that's all we had to do to make it a little bit faster. My computer was doing all kinds of other things while I did some benchmarking, but it is a little faster; the compiler is already doing a pretty good job with multiplying numbers. And actually, back here: I do have a branch up if anybody wants to play with this. It's there; I don't know where we'll go with it.

So what does this have to do with an FPGA? What took me down this path is that I've been trying to build an LDPC encoder for satellite communications, and I built a reference implementation in Rust; it was a lot easier for me to do it there. But it's not fast enough, not even close, so we need to push it out to the FPGA. I want to talk a little about what that process is like and what some of the building blocks are. It's not complete, so I'm just going to walk through some of the components.

In case you don't know what an LDPC code is: the idea is that you have these information nodes, and you should be able to take XOR operations across their connections, and all the checks should be satisfied if the data is correct. That's a simplification, but we don't have a lot of time. We can represent the connections between the nodes as a matrix. What's interesting about the code used with DVB-S2 specifically is that you get these little identity matrices, shifted a bit, scattered throughout it. All that black area is stuff we don't care about, which is part of what makes this a lot simpler to do. Each one of those white dots (this didn't come out very well on the screen) is one of those identity matrices.
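In code, "all the checks are satisfied" just means every check node XORs to zero. A toy sketch, with a hypothetical three-check matrix far smaller than DVB-S2's:

```rust
// Toy parity check: each check node lists the bit positions it XORs together.
// A valid codeword makes every check come out to zero.
fn checks_satisfied(bits: &[u8], checks: &[Vec<usize>]) -> bool {
    checks
        .iter()
        .all(|check| check.iter().fold(0u8, |acc, &i| acc ^ bits[i]) == 0)
}

fn main() {
    // Hypothetical connections: check 0 covers bits {0,1,2},
    // check 1 covers {1,3}, check 2 covers {2,3}.
    let checks = vec![vec![0, 1, 2], vec![1, 3], vec![2, 3]];
    println!("{}", checks_satisfied(&[0, 1, 1, 1], &checks)); // a valid codeword
    println!("{}", checks_satisfied(&[1, 1, 1, 1], &checks)); // one bit flipped
}
```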
This shows the check nodes versus the information nodes, and you can see they're just shifted identity matrices. What's cool about that is that we can build something very parallel, because each one of those dots can basically be processed in parallel. We're no longer limited to vector operations where we can only do a little bit of data at once; we can do on the order of 390 pieces of the operation at the same time, which starts to get us to the data rates we're looking for.

So how do we start building this? Even the Plutos, I think, have a Zynq in them, and there are a few others. What's interesting about the Zynq parts, and some of the Altera ones have this as well, is that they've put an actual processor on the chip and tied in a block of programmable logic that's connected through to the RAM. That's really nice, because all of a sudden we can build a peripheral that just magically shifts data in and out of your memory space, and it's very fast. So we have an application running on the ARM processor that performs DMA transfers of your data frame into your peripheral and then says: all right, I'd like to wait for my data to come back out of my peripheral. On the fabric side you have a block that takes those memory transfers and turns them into a stream. And now it's an interface that's much easier (maybe I shouldn't say really easy) to build your Verilog, or whatever HDL model, of the actual signal processing against. It turns out that what goes on in this plumbing is actually really complicated; I think it's harder than the Verilog piece, and there's no reason for it to be that way.

So this is what I've been talking about. Here's your processor.
Don't worry too much about that other stuff right now. Over here we have the DMA controller; that's the thing in charge of converting those memory accesses into data on the bus, which comes in here and goes out here. This is my peripheral; in this example it's just a block of RAM that I can shift data in and out of, but that block is really whatever your peripheral ends up being.

Where this starts to get a little complicated: the kernel does not want you messing around in physical address space. It works really hard to not let people do that, and here we really want to, because we don't want to have to write a kernel driver for every little operation we're doing. There are ways around it, though, and a lot of them actually come out of the world of virtualization, where you want your PCI card to connect to a virtual machine and still operate quickly. We can leverage some of that. There's the VFIO (virtual function I/O) interface, which is really helpful for this, and there's some kernel documentation on it. We're able to use it to allocate large blocks of memory and attach them to the address space the DMA controller sees. I don't want to get into too much detail; I hope this is here as a map if somebody starts down this path. Using these ioctls and a memory map, we're able to create those spaces. We also need to be able to access registers inside those devices to control them, and there are interfaces for that too. So this was our way to get that register space; now, with the data sheet for the part, you're able to actually access all of it.
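Once that register window is mapped in, controlling the device comes down to volatile reads and writes at data-sheet offsets. A sketch against a stand-in buffer (the offsets and register names here are invented; a real driver would point `regs` at the mmap'd window instead):

```rust
use std::ptr;

// Hypothetical register offsets, in 32-bit words; a real design takes
// these from the DMA controller's data sheet.
const REG_CONTROL: usize = 0;
const REG_STATUS: usize = 1;

// Volatile accessors: the compiler must not cache or reorder these,
// because real hardware changes the values behind our back.
fn reg_write(regs: *mut u32, word_off: usize, value: u32) {
    // Safety: callers guarantee `regs` covers at least `word_off + 1` words.
    unsafe { ptr::write_volatile(regs.add(word_off), value) }
}

fn reg_read(regs: *mut u32, word_off: usize) -> u32 {
    unsafe { ptr::read_volatile(regs.add(word_off)) }
}

fn main() {
    // Stand-in for the mmap'd register window.
    let mut fake_regs = [0u32; 4];
    let regs = fake_regs.as_mut_ptr();

    reg_write(regs, REG_CONTROL, 1); // e.g. set a "run" bit
    println!(
        "control={} status={}",
        reg_read(regs, REG_CONTROL),
        reg_read(regs, REG_STATUS)
    );
}
```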
So here's everything involved in actually starting a transfer. It's not wrapped nicely, but hopefully it gets the point across. We also want to know when our transfers are done. Normally there's an interrupt that's handled in the kernel, so what do we do? Well, we could just sit there and spin on the status register; we probably don't want to do that. Or we can use eventfds, event file descriptors, which are very poorly documented. Getting this to work meant really digging into QEMU, KVM, and a little bit of the kernel documentation to figure out how it's all supposed to come together; I really wish this were written up somewhere. But basically, you're able to create your own file descriptor and attach it to the interrupt system down below, which then lets you use what you would expect: select, poll, all of that.

So now we have everything we need: the memory allocation, the control registers, and the interrupt side. Those are the core pieces you need to actually talk to this piece of fabric you built inside the FPGA. You can use the libc Rust bindings to connect to all of this, and they even have some really nice stuff to help make parts of it a little more safe. There were a lot of unsafe things going on in those last few slides, so I've been trying to figure out ways to make that stuff safer, and hopefully build the rest of it out. That's a lot of information, but I hope it mapped some things out for people.

Someone's asking: have you pushed any of this code? No, but I'm hoping to get a bunch of this stuff up, yes, even if it's not working. And it sounds like I'm aware of how bad the documentation is? Yeah. I did all this development on the Ultra96 board that Avnet had, which is also terribly maintained and documented. It's unfortunate.
I do have a managed project on GitLab with a build pipeline and everything to build this. Not the user-space side of it, but it builds the whole DMA project, and it doesn't pull in their PetaLinux tool disaster.

Okay, sorry, so I mentioned that there's a catch: in this whole memory-transfer business where we're skipping the kernel, we have to worry about the cache. Let's go back to the block diagram, to the stuff I told you not to pay attention to. We're actually able to tell the hardware a little about what we want it to do, so that it doesn't cache certain types of operations. What that's really controlling is the system memory management unit, the SMMU, which is what ARM uses; it's able to do that internally. So yes, we're leveraging the SMMU, and there was really nothing I had to do except set these bits to tell it to. There's actually a whole section on Xilinx's wiki that talks about this specific problem. They don't actually tell you the values of these bits, but they have screenshots of windows that contain them, if you find the right piece.

Yeah, so you're asking what the performance target is. I made a note in here about what we're targeting: bit rates upwards of 85 megabits per second. And for the operations going through here, each one of these check nodes has, I think, three to five connections, so there are a lot of operations taking place. A couple of people have implemented this in C, and it's not even close to real time, even on a fairly beefy machine. I don't have specific numbers, I'm sorry, but we're able to clock this at about 100 MHz and hit real time, versus a CPU that's pegged. Have I seen the patches that just went in?
Apparently there are some accelerator patches in the Linux kernel that just went in; I don't know anything about them.

Okay, on the Rust piece: the way I went about this is I started taking chunks of the Rust code out and moving them into the FPGA. There's really nothing left now, but it was helpful to be able to do it piece by piece. As for the code-generation tools: those require a licensed tool that goes away if my computer disappears, so I didn't go down that side of it, and I can't really speak to how well they work. I get more frustrated with Xilinx tools the more I use them, so when they say, oh, we have a new tool, I'm like, that's not what I want.

So the question is: has this been implemented using graphics cards? You can, but it's still very difficult to hit the performance numbers that way. The amount of data you're transferring in and out of memory per operation is small, so you end up paying a massive latency penalty with the GPU, and it's an expensive GPU. As for resource usage: I haven't completed all of it, but most of the FPGA space is taken up with RAM, a few K of it; there's a lot of data to store.

Okay, thank you again.