All right, so this is libsigmf. This is a C++ library for dealing with SigMF data sets, and I'll talk a little bit more about that later. I'm Nathan West. I've done quite a bit of development for GNU Radio and VOLK in the past, and now libsigmf is available on GitHub; more detail later. This is in partnership with the SETI group at UC Berkeley, so they're going to be releasing some data as well, and I'm going to be presenting some information from them. I am not really an astronomer; I'll do my best to present their information, but I'm really just going to point to their resources, and you can go online and contact them if you're interested in the details of that kind of stuff. I'll be sure to point out where that is. So what is SigMF? It's the Signal Metadata Format. There was a GNU Radio hackfest here right before FOSDEM, a year or two ago, where they came up with this. The actual specification is licensed under Creative Commons, so you can basically use it for whatever you want; just give attribution. The general idea is that you have two files: a SigMF data file and a SigMF meta file. The meta file is just JSON. The whole structure of SigMF isn't really about exactly how to store stuff; it's more about what keys and metadata you use to describe the signals that you've recorded. For the SigMF data file, you specify what the format is inside the meta file, but in general it's just samples written to a file as plain C types. The goal is really just to keep metadata consistent, because everyone uses a different way of capturing their sample rates and center frequencies, and it's nice to have tools to actually share these kinds of recordings. The nice thing is that, because the SigMF data file is just a plain sample file, you can memory-map it and index it like a sample array.
So at DeepSig, we apply machine learning and deep learning to RF signal processing in general, so we wind up recording a lot of radio captures, and we use SigMF internally to keep track of what we've recorded. And then there is a GRCon paper where we discuss some usage patterns that are kind of useful. So my goal with libsigmf was to create static types. We have these schemas that are defined for the JSON, but we're going to be using this in C++, and it drives me absolutely nuts that every JSON library in C++ uses strings as keys to index values, because it means you're reproducing strings everywhere, and it's pretty likely you're going to have a typo somewhere that won't get caught at compile time. If you happen to have great test cases, fine; otherwise, good luck. SigMF also supports this idea of namespaces. There's the core one; there's an antenna one that describes what antenna you used. And you can define whatever namespace you want, so you can point people to your description of that namespace, and anyone compliant with it will always be able to read and make sense of the metadata that you give them. So I wanted to be able to support any namespace that someone might make, and still generate compile errors instead of runtime errors on misuse. It's header only. It currently requires C++14, although we could probably lower that to C++11 if there's a reason to. And then there's a CMake helper: it's a C++ project, we use CMake as the build system, and when someone creates a new namespace, we make it easy to actually use it. libsigmf basically squashes FlatBuffers and Modern JSON together. FlatBuffers is just another class generator; it's kind of like protobuf, but it's laid out differently in memory and it has some other nice features. One of the nice things is that it has these object description types.
So you can actually get a type that describes the object that's been generated from the schema, and that's what I take advantage of: I use that to parse through the structure and make sense of the object that has been created from the schema. Modern JSON is just used for the JSON serialization and deserialization, because why recreate that when someone else has already done it in a header-only library? I've seen people use SigMF in C++ with Modern JSON before, but they just leave it as Modern JSON, so they wind up copying their strings all over the place. So anyway, libsigmf uses both of these; those are really the only two dependencies, and they're both header only. And we've released this under the Apache License 2.0. So here's kind of how it works. There's this one class, the variadic data class; that's where most of the work is done. You describe a FlatBuffers schema, so there's a SigMF core FlatBuffers schema file. If you have flatc already compiled, then it's literally just headers; if not, you can include FlatBuffers as a git submodule and we'll just build it for you. And then you get these class definitions for SigMF objects. The flatc generator, with the options we give it, generates the actual object that has all the fields exposed, so you don't have to go through generated accessor types. And then there's this variadic template (this actually shouldn't be a capital T over here, but it's not actual C++), and you can access specific objects inside it. The nice thing is that this works with any type: it doesn't have to be SigMF in here, it could be any kind of data storage class that you want. You can shove that into this template parameter and just squash together all kinds of different FlatBuffers objects into one convenient holder. So the way this works in practice is you create this variadic data class and say, I want a capture type.
So this capture type is what's output from FlatBuffers' flatc. Once you do that, you have this new object that's parameterized on those types, and you can access all the parameters inside that template using this access method. And so you get code completion in nice IDEs and editors, and of course, since the types are static, you also get compiler errors if you use the wrong type or something. Down here, access returns a reference, so you can also just save that reference, and then you don't have to call access all the time; you just say ref.comment equals "this is my comment", and the sample count is this. So it really couldn't be a whole lot simpler than this kind of usage pattern right here. Except I made it a little bit simpler. So here you see there's this access method; down here I'm using this get method. You'll notice that up here, I create this annotation object, and then when I access it, I have to say: oh, by the way, I want the annotation object back out. Really, this is to say I want the core namespace's annotation object, as opposed to some other namespace's; down here, the antenna one is used. What the get method does is say: I already know that this is an annotation. This description type is a description for the namespace, so the core namespace is used here and the antenna namespace is used down here. Since I already know that this is an annotation, it's just going to attach the annotation field of the namespace description. So what does that look like in practice? This is one of the implementations. We basically just kind of abuse templates here: we attach the annotation, which is highlighted here, onto the get method and then call the access method with the annotation type. For the capture, we do the same thing; for global, we do the same thing. So that turns this code into this code. It's a little bit nicer.
To make a SigMF record, which is a whole description that has a global field, a captures field, and an annotations field, you just create this SigMF object type and tell it what kind of globals you want, what kind of captures you want, and what kind of annotations you want. So this would have a global with the core and antenna namespaces; there actually is no antenna namespace for captures; and then the annotations would have the core and antenna namespaces. This gives you a nice type, and to round trip from JSON and back: here's a string describing some JSON file, and we just call json::parse, which comes from Modern JSON, set the object equal to that, and now you have your static object filled in with all of your SigMF metadata. If you want to go back, all you have to do is make a JSON object out of this capture object, and now you're back to a string. So the round-trip serialization is pretty simple, and you get all the benefits of a static type. The code's available on GitHub at deepsig/libsigmf; it's Apache 2.0. So what can you do with SigMF? At DeepSig, one of the things we do is spectrum annotations, which winds up looking kind of like this: just draw boxes around signals and label what they are. If you go a layer above that, you wind up with anomaly detection, where you can drive through Arlington, Virginia and just keep note of what downlinks you see, and then draw heat maps over them with GPS coordinates. You can also do this sort of center frequency/bandwidth analysis, where you plot a heat map of center frequencies and bandwidths, so you can inspect what's in your signal environment. And then the Breakthrough folks are also doing similar kinds of things. With this, you basically just say whatever you want about the signal in the annotations section of SigMF, and then there are some core namespace fields that you would fill in.
And then there's also a handful of maybe specific things you want; like there's no SNR field, so we have a DeepSig SNR namespace. Yeah, you just fill in annotations in the JSON fields. So, Steve Croft is the lead at the UC Berkeley SETI group; they are working on finding aliens. So here's: I guess 30 years ago, it wasn't known that there were planets outside of our solar system, so that's a relatively new idea. And it turns out that there are a lot of planets that could potentially support life; they're just not very nearby. So this animation should show (keep track of the year here) several years of watching the reflections of various planets as they rotate around their sun. This is one way of observing that there are planets outside of our solar system: you have the sun in the middle, of course, and these planets have a very long orbit relative to Earth, but you can see them moving, and that's just reflections of their sun that we're observing. This is made up of real images from the Keck telescope and shows seven years; I think their orbits are between 40 and 400 years, which is pretty long. And then there's the TRAPPIST-1 system. This has apparently become pretty popular with the SETI research people. It's kind of an appropriate name, because it's the TRAPPIST system and it was discovered by Belgians, so it's great to talk about it here. The reason it's interesting is that it has seven planets, perhaps more, that are just very similar to Earth, so there's lots of potential there. And this is just an illustration, of course. So then the question is: if there are so many stars out there, and many of those stars have planets that could support life, where is everyone? There's the Drake equation.
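For reference, the Drake equation he's about to describe is usually written as:

```latex
N = R_{*} \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L
```

where \(N\) is the number of detectable civilizations in the galaxy, \(R_{*}\) the rate of star formation, \(f_p\) the fraction of stars with planets, \(n_e\) the number of habitable planets per such star, \(f_l\), \(f_i\), \(f_c\) the fractions developing life, intelligence, and detectable technology, and \(L\) the lifetime over which such a civilization is detectable.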
I don't know if anyone's familiar with that, but it's an attempt to enumerate all the variables, the probabilities of things occurring that would cause life to exist, and say: given those probabilities, how many planets are there that might have intelligent life? And I think when Frank Drake came up with the equation, his first estimate was that in our galaxy there were about 10, based on some probabilities. And then there's the Fermi paradox: if life is common, then why haven't we found it? So there are two ways to look. One is biosignatures: you take planets, and here's Earth, so just look for a similar composition to Earth; and here's Venus, and you look for biosignatures because it could support life. Then there are technosignatures, which is basically looking for signs of technology. What you do there is basically just collect signals and then look for structure. You can do this with really cheap SDRs: the NooElec-type dongles are really cheap, the ADALM-Pluto is also very cheap (all you have to do is ask a question and you get one for free), and then you just need a dipole. So yeah, so far the SETI people have only found technosignatures coming from Earth, which is a little disappointing. And it's kind of like searching for a needle in a haystack. So Breakthrough Listen was announced in 2015 as part of the Breakthrough Initiatives, I guess, and it's a hundred-million-dollar mission to find signs of extraterrestrial intelligence. The Berkeley group has 20% of the time on the Green Bank Telescope, which is the largest fully steerable dish on Earth, I think. (Fully steerable, all right.) And it's in the Radio Quiet Zone in West Virginia. So they get 20% of the time on that, which is pretty substantial.
So they have some infrastructure to support that, and it's mostly tons and tons of storage and high-speed digitizers. What this winds up doing is that they collect tons and tons of data, and then they have to process it for some sign of signals that you wouldn't expect to see. So how do you do that? Well, they have this approach. What they collect here is relatively narrow compared to their digitizer capability: they can do six gigahertz of bandwidth, and this is only, what, 30 or 40. So they channelize this into smaller chunks; the black lines that you see are just residuals from the way that they channelize things. And then they do some differential analysis, because there are lots of interferers that they have to deal with. The Green Bank Telescope is in a quiet zone, but even in the quiet zone you wind up hearing lots of things that you shouldn't; part of the problem is probably that I-81, a major interstate, goes through there, so it seems like there are lots of problems. And then you also have satellites going overhead, and they don't necessarily respect the quiet zone when they're doing their downlinks. And because the Green Bank Telescope is so big and so sensitive, it detects a lot of things that it shouldn't. So this is looking at an example of their analysis. Here are six spectrograms of the same channelized data. They look at a star, then look just offset from the star, then back at the star, then offset, then back at the star, then offset again. In doing that, they're looking for signals that basically just don't move: any signal coming from that star, or from that object, should be there in all of the on-source scans, and it should not be there in any of the off-source ones.
They also note that geosynchronous satellites don't move, so they'll show up across pointings as well. So any signal that you see in here, but not here, is really what you're looking for. They have three different outputs; there are basically just time/frequency trade-offs here, and a lot of it is in this GUPPI format, which I think is the Green Bank Ultimate Pulsar Processing Instrument data format, some of it's HDF5, and then there are these filterbank spectrograms, which are these things. So my claim is that this kind of highlights why SigMF could help: in doing some background research on this, this GUPPI format is Green Bank specific, it's based on some other format, and then a lot of users of Green Bank have a slightly modified format of this GUPPI data. So it seems like SigMF may be useful to the radio astronomy community, but we'll see if it's widely adopted or not. So here they're trying to point out some signals that they see in the on pointings and not the off pointings. One approach that they're interested in is something other than this kind of differential analysis, because you have a time delay here, right? Your signals may have moved in time, or you don't capture some signal because it's just intermittent, so it wouldn't show up in all six spectrograms even if it's really there in all of them. That's why you have to do three passes, to say: is that signal really still there? But you're still losing these bursts, and the bursts are kind of the interesting things. So they're interested in a machine learning approach that detects all the Earth signals as well. One of the ways to test that their algorithms are working is to look at the farthest known thing that is emitting, that they can definitely detect and know what it is; that's Voyager 1. So that's kind of the sanity check on a lot of their algorithms. Here you have the Voyager 1 spectrum; it's 12 billion miles from Earth, so pretty far. If you can detect that, then maybe you have a chance at detecting other things.
So here's kind of the zoomed-out version; it's not a very strong signal. You zoom in on it, and you start to see that, okay, there's a carrier, and there are the sidebands that have data on them; zoom in further, and you can see the actual signal characteristics. They have a tutorial on GitHub that you can follow along with; it's a Python notebook that shows you that you can actually detect this: they give you the data set for their Voyager capture, and then you can do this zoom-in and see Voyager. And here's what that looks like. Again, it uses this filterbank file, but there will be SigMF captures at the end as well. This is another URL to the same material, I think. Yeah, so then there are also these fast radio bursts, which are interesting; these are on the order of milliseconds, so they wouldn't necessarily show up when you do the three scans of the on, off, on, off, on, off cadence. Because they're so short, using a machine learning approach for this is useful, and SigMF basically allows you to easily say: here, in this time, in this frequency, is this fast radio burst. We're going to build up a machine learning data set of these fast radio bursts and do our learning on that. There are only two of these that are known to repeat. So, yeah, that's the fast radio bursts. The Berkeley SETI group has actually created a data set; they've synthetically generated some of these and then trained some neural networks to detect: is there a fast radio burst here or not? Using a CNN trained on 100,000 simulated bursts, they were able to find 72 bursts that their traditional algorithms did not. So it seems to be a useful approach. These are just more of the bursts, and there are paper links in here as well. The author that did that was Gerry Zhang; he's at Berkeley.
And then, yeah, like I mentioned, the approach that they're taking is: if you can identify all the Earth signals first, then just remove those and look at what's left, and you have many fewer candidates to look through for finding ET-type signals. And then there's another paper going through some anomaly detections. So that's all. If you want to go for the Berkeley data sets, they have a blog post here and actual data that you can download here, and they also have those Voyager tutorials where you can download their Voyager data set and walk through demodulating the Voyager signal. And then libsigmf is available on GitHub, and we also have a SigMF recorder, which uses UHD and just dumps out SigMF data sets. We'll be supporting some other radios as time allows for development. So that's all. Any questions about libsigmf? Maybe some astronomy, but I'll probably just have to punt to the SETI people; they're also on social media. Yep. Is there any particular reason for separating the metadata file from the data? Yeah, so the question is why use two different files, one for the metadata, one for the sample data. The reason is, at least in my personal opinion, that a lot of tools are easier to write and less fragile if you keep your samples completely on their own. Otherwise, you either have to have a fixed-size struct somewhere in your data file, or you have to read that struct to figure out how much data to skip to get to your samples, before you can memory-map. Personally, I really love memory-mapping samples, because I don't have to load the whole thing from disk all at once, and if I need to do that, then I just explicitly load them. The other thing, one of our convenient usage patterns, at least that I've discovered, is using a really big disk to actually do your storage, and then maybe doing symlinks with read-only protection so that people can't mess it up.
Yeah, so you can do nice tricks like that; you don't necessarily have to have your metadata and your signal data in the same spot. Otherwise, you have to develop very, very specific decoders to read a sample file. Yeah, the SigMF spec allows you to squash them together into a tar archive. But I guess you could do that even without the spec; it's just that the SigMF spec allows you to call it a .sigmf file. In the SigMF repo, there should also be the rationale for why this was done. Yeah, that's true. We should. Yeah. (Question about whether the same kind of approach could compare optical and radio searches.) Is this in the search for ET signals, or... Oh, yeah. So I think one of the reasons that they don't use optical searches is that they're looking at planets that are much farther out than you can typically see with optical, like just images. Yeah, that's a radio astronomy thing that I'll have to punt on. They did mention it in this slide with the animation. Oh. Oh, I see. Yeah, there's not really anything particularly specific to radio. It's just a JSON field with some keys, right? So you would define keys that you want to share image data with, and then release that as a namespace and share it with whoever you wanted. (Question about using machine learning frameworks and libraries with complex signals, where a lot of the information is contained in the phase and you can't really split it up between in-phase and quadrature.) Yeah, so that question was on deep learning frameworks and complex sample types. The support is pretty weak. I know that TensorFlow has complex data types, but none of the neural network operations actually work on them. There are TensorFlow ops that work with complex data types, but none of the actual neural network layers do; there are some technical challenges there.
And there have also been papers; I think Yoshua Bengio's group put out a paper comparing complex-valued neural networks to equivalent non-complex neural networks, and I think the results basically showed that it wasn't particularly useful, so maybe it doesn't matter. I also know that PyTorch, or libtorch, or Caffe2, whatever you want to call it these days, just doesn't even have support for any complex type, I think. I guess it doesn't work. Yeah, so that's a SETI-specific question. I suspect that they were using 2.4 gigahertz in this example just to show easy interferers; it's showing the procedure that they use to remove interferers, and that's probably just where they're guaranteed to get interferers. I doubt they're actually searching at 2.4 gig. Oh, okay. He knows more about this than I do. All right, I see. Makes sense. (For the video: there was a discussion there about which elements resonate at which frequencies.) Yeah, that's a great question on whether there's any automated labeling tool. I don't think there is. There is gr-sigmf, which just uses Modern JSON; I don't know that it's attached to a labeling tool. I know that they have said publicly that they have a fork of inspectrum that I think they use with SigMF. I don't know why they haven't released it, but I think they haven't. Yeah, maybe later today some people can just hack SigMF support into inspectrum and then contribute that back. Yep. Yeah, so, other groups using SigMF: my understanding is that there are several independent groups using it. It is a GNU Radio spec, so it's part of the GNU Radio Foundation, one of their projects; it's not specific to my company at all. We're just releasing a library to use it conveniently. Yeah. I personally don't have a list, but I believe there are several; I mean, there are several companies using it for sure.
I know that in the U.S., several government agencies that do research, like the NRAO, are using it. And, what's the one in Boulder? They do the timekeeping. NIST, yeah, I believe NIST is using it. But yeah, I don't have a public list; I don't know if there is a public list of claimed users. Yeah, I'll ask my boss. He loves SigMF. Thanks for the questions.