As mentioned before, the Internet of Things: it would be great if it would work, and one big part of the Internet of Things is the Internet part. Stuff has to talk, and cables are shit, so we use Wi-Fi and other wireless protocols. So our next speaker is going to take a very close look at the physical layer of LoRa, a low-power wide-area network technology, and he built some stuff to actually sniff what's happening and inject stuff, and apparently he offered his sacrifices to the demo gods, so we'll see something. Please give a warm round of applause to Matt Knight. Thank you for that warm introduction, and thank you all for coming. I'm really excited to be here. So for the next hour, we're going to be talking about the LoRa PHY layer; LoRa is a low-power wide-area network wireless technology that is designed for the Internet of Things. So first, a little bit of background on myself. I'm a software engineer and a security researcher with Bastille Networks. I have a bachelor's degree in electrical engineering and embedded systems from Dartmouth, but really my interests are in applied RF security research. That means everything from reverse engineering wireless protocols to developing functional basebands in software and HDL, and all the way up to software networking stacks. So all these things are interesting to me, but I'm really excited about the material that we're going to talk about today. Before we get started: there aren't going to be any zero-days or traditional security-related exploits here, but we are going to take apart a cutting-edge wireless protocol. We'll talk about why that's important in a minute, but first I'd just like to survey the room and get a sense of who's here, so I can figure out where to spend more of my time. So if you'd be so kind, raise your hand if you've heard of software-defined radio. That's a lot of hands. That's great. Okay, how about raise your hand if you know what a fast Fourier transform is, an FFT? Awesome.
And how about a symbol in the context of wireless systems? Okay, cool. We're going to do well. This is going to be fun. So why is this sort of network forensics interesting? Why is it relevant? Why is this important? The Cisco Internet Business Solutions Group has a figure that I really like, which states that by 2020 there are going to be 50 billion devices connected to the Internet in some way. As we know, with the growth of mobile and the Internet of Things, fewer and fewer of those devices are connected with wires every year. And as we know, tools like Wireshark and monitor mode weren't always a thing. Even for common interfaces like Wi-Fi and 802.11, the tools that we have come to rely on every day exist because somebody thought to look below the layer they had and build them. And I believe that low-level access to interfaces is essential for enabling comprehensive security on those interfaces. So we're going to begin by discussing LPWANs at a high level, and then we're going to do a little bit of background on some technical radio concepts, just so we can level out our domain knowledge and inform the rest of the conversation. Then I'm going to take you through my recent reverse engineering of the LoRa PHY layer, which was powered by software-defined radio. And finally, I'm going to give you a demo of a tool I've made called gr-lora, which is an open-source implementation of this PHY that will enable you to begin doing your own security research with it. So to begin: what is LoRa? What is this thing? It is a wireless IoT protocol, and IoT is in red because none of us are marketers; we're all engineers, and we know that this is a dirty term, right? IoT is really code for connected embedded devices. And there are tons of common standards for embedded systems already. Everything like 802.15.4 and all of its friends like ZigBee and 6LoWPAN, 802.11, Wi-Fi, and then also more common things like Bluetooth and Bluetooth Low Energy, and the list goes on, right?
We've got all these standards, so what is wrong with them? Why don't we just use one of the existing ones? Well, the ones that we just mentioned all require some degree of local provisioning. You need, say, to connect your device to an SSID, or hook your ZigBee device up to a coordinator, in order to get it communicating. Some of them require gateways to talk out to the internet. And in the case of 802.11, it's very power-intensive, so you can't run a device for a long time on a battery. So what's ideal? What about cellular? Cellular works everywhere. It's easy to install. You don't have to worry about any hardware on-premises; as long as you can talk to a tower that could be miles away, you're good to go. Well, it's power-intensive. And certain parts of the standard are going away; I'm talking about 2G, GPRS, and EDGE service. In the United States, AT&T, one of the largest carriers, is saying they're going to sunset their 2G network in about three days. In Australia, this has already happened: Telstra, one of the largest telecom companies in Australia, sunset their GPRS service earlier this month. And all the other major carriers are soon to follow. Now, 2G works everywhere, it's very battery-conscious, and it's fairly cheap, so it's exactly what the Internet of Things needs to power its communication standards. So say you're a developer and you want to move to a new wireless standard that won't deprecate in three days. You can either go to 3G or a more modern cell stack, which comes with a more expensive radio and harder power requirements, or you can wait for the 3GPP, the standards body that makes and maintains the cellular standards, to come out with their IoT-focused standards that are currently in development. And the indications that I've gotten state that those won't be ready until the end of next year at the earliest.
So it's going to be the end of 2017, the beginning of 2018, before we start to see those things in the wild, which means that until then, there's a massive hole in the market. So if you want to develop an embedded system that requires this type of connectivity, you're going to have to look elsewhere. And that brings us to the topic of low-power wide-area networks. You can think of these networks as being just like cellular, but optimized for IoT and M2M communications. The architecture is almost exactly the same, in that you have a network of base stations or gateways worldwide, and then end nodes uplink directly to those base stations without any meshing or routing among themselves. It's basically a star network: you have these nodes that connect directly to the base station, and they have a range on the order of miles. So it's a very similar topology to cellular. There are tons of standards popping up, more and more every day, but the two that have the most momentum are LoRa and Sigfox. There's been a ton of investment in both of these technologies. Actually, just last month, Sigfox closed a €150 million Series E, a late-stage funding round, and the Wall Street Journal wrote an article recently stating that they are investigating a U.S. IPO soon. Additionally, Senet and Actility, two of the biggest backers of LoRa, have raised a combined $51 million in the last year or two. So with one firm raising €150 million, they are absolutely going for it. They're investing like crazy in these technologies. So when we say that these networks are optimized for the Internet of Things, we're really talking about two things. They're battery-conscious: Sigfox advertises that they can get up to 10 years of operation on the amount of energy in a single AA battery. And they're long-range.
And if you turn all the knobs on LoRa just right and have a perfect noiseless channel, they advertise that you can get 13.6 miles out of one of these very long-range devices. If you compare that with some of the standards we talked about earlier, that's pretty competitive. So how do they do that? How does that work? Well, they've designed the entire system around the fact that they're willing to accept compromises in the protocol and the functionality of these devices. Now, when I talk about compromises, I'm talking about aggressive duty cycling of both transmitting and listening, very sparse datagrams, so tiny packet sizes, and heavy rate limiting, meaning they can't send many packets very often. For example, Sigfox limits devices, and this is built into the PHY, to 140 12-byte datagrams per day. That's like nothing; I think that's less than a UDP MTU. It's tiny. Now, Weightless-N, another LPWAN standard, is uplink-only, so it can send messages up to a gateway but can't receive any downlink. So for example, if you had a device deployed, you could never deliver firmware to it later, unless you rolled a truck to it or climbed up the telephone pole where it's mounted. And finally, LoRaWAN Class A devices can only receive downlink for a brief window after they uplink. So if you're an application operator and you want to send a message to a device you have in the field, you have to wait for that device to call home before you have your brief window to tell it what you want. So these systems are built around compromises, but that's what enables them to get some pretty incredible performance. All right, let's get into the details with LoRa. LoRa is an LPWAN developed by Semtech, a semiconductor company; the modulation originated with Cycleo, a French company that Semtech acquired. The PHY was patented in June of 2014, and LoRaWAN, the MAC and network stack, was published in January of 2015. So this entire standard is less than two and a half years old. It's brand new.
And it's supported by an industry trade group called the LoRa Alliance, which has tripled in size every year since its founding, so it's growing quite a bit. Before we move on, I just want to clear up some nomenclature that will help us focus in on what this talk is going to center on, and that is disambiguating LoRa and LoRaWAN. LoRa refers strictly to the PHY, the physical layer of the standard. LoRaWAN defines a MAC and networking stack, some upper layers that ride on top of LoRa. The LoRaWAN standard, the upper layer, has been published and is public, but the PHY layer itself is totally closed. So the LoRaWAN upper-layer stack gives some information about its topology. It's kind of interesting; it suggests that they were really thinking about security when they designed it. There are four stages in the network. All the way out in the field, on your sensor, you have the end node, and that connects to a gateway over a wireless link; that's the LoRa link. And then once you get into the gateway, everything from there up is on standard commercial IP networks. And then they have roaming that works among different networks, so you'll be able to take your device to different areas of coverage and have it all play nicely, and then you can hook your application server up to that as well, to send and receive packets to and from the network servers, all over IP. And they actually went as far as to define two different mechanisms for encrypting it. There are two different keys: the network key, which covers from the end node up to the network server, and the application key, which is fully end-to-end; it goes from the end device all the way up to the application server. So if you design that right, the network should never see your traffic unencrypted. And they also provide a mechanism for having unique keys per device. It's built into the standard, but it's not required.
So it's still up to the implementer to do that and get it right. So there are some good thoughts that went into security with LoRaWAN. However, that's not what we're talking about today; that's all we're going to say about LoRaWAN. We're just going to note that it exists and that it rides above LoRa, but we're not going to go into any more detail than that. So from here on out, it's all LoRa, all the time. We're just talking about the PHY layer. So let's get into what makes that really interesting. One of the big defining features of LoRa and Sigfox, the two biggest LPWANs, is that they're designed to use what are called ISM bands, at least that's what they're called in the United States; it stands for industrial, scientific, and medical. And what's cool about these bands is that they're unlicensed, which means that you don't need a specific license from the FCC or your telecom regulatory authority to operate on them. So if you go and buy a new Wi-Fi router on Amazon, you take it home and you plug it in, you don't need to then go and apply for a specific license to be able to communicate with it. Because it was built to a certain standard, it is compliant with those unlicensed band rules and therefore can just work. So these devices use that same spectrum, but to much greater effect, at much longer ranges, and in a much different use case. So that's quite novel. And some other things that use these bands are Wi-Fi, Bluetooth, cordless phones, baby monitors, things like that. So you can think of this as occupying the same space in the spectrum as those. Now, why is this noteworthy? Well, contrast it with the cellular model, where cellular technologies use protected spectrum: you have to have specific rights to transmit on it in order to legally use it. And regulatory authorities sell the spectrum for fortunes. Spectrum sells for billions of dollars in the US, and I'm sure it's the same over here.
And I just want to call your attention to how expensive this is. On the left here we have a picture, an excerpt from a document that I found related to the FCC's TV white space reverse auction. They're trying to repurpose a lot of spectrum that used to be used for digital TV, and they're selling it off. And if you want to come in and buy some really prime low-UHF spectrum to use for whatever purposes you have, and mind you, this is just one TV station in the New York area, you can get out your checkbook, write a $900 million check, and take over WCBS-TV in New York. So getting into the cellular game is crazy expensive. It costs a fortune. But there are a lot of us in here; maybe we can pass the hat and buy some spectrum at the end of this. So as a result of this unlicensed nature, there are a number of different models of commercialization starting to emerge. We have the traditional telecom-like model, which we're seeing through companies like Senet, a company that deploys heating oil tank monitoring solutions in the United States. They're also opening the network up for IoT applications to ride on top of that traffic as well. And you operate with them just like you would operate with Verizon or AT&T, or I guess Deutsche Telekom or whoever you work with here. Also interesting: I believe it's KPN that has rolled out a commercial LoRa network, a LoRaWAN network, throughout the entire Netherlands. So the country is entirely covered with LoRa. So that's the commercial side. In the middle, we also have crowdsourced networks. The one that I like to talk about is a group called The Things Network, where basically they have defined, in the cloud, the network server architecture for operating a worldwide LoRaWAN network.
So if you want to provide LoRaWAN service on The Things Network in your area, you can get your hands on a LoRa gateway, point it at their network servers, and basically become a base station in their network from your living room, which is kind of cool. So it can spread and grow organically based on the needs of people like me and you who want, you know, this sort of service. Then finally, all the way at the independent amateur side, we have people like Travis Goodspeed and some of his friends who are working on a technology called LoRaHam. And that's leveraging the fact that you can actually get LoRa radios that work around 433 MHz, which is in, I think, the 70-centimeter ham band in the United States. So you can actually put a reasonable amount of power behind LoRa and do text-based communications in the clear. So they're developing a LoRa-based mesh networking system for doing basic ASCII packet radio and communicating. It's not public yet, but he's blessed me to come and tell you that he's working on this and it should be out soon. So there are all sorts of different ways to use these technologies. This is a very different paradigm from what we're used to, and it's opening up lots of different opportunities for how this technology might be used and grown. Okay, so that wraps up our background on LoRa. We're about to get into some really technical stuff, but before we do, I want to go through a very short crash course on some basic radio fundamentals, to try to even the playing field so that we can all understand this. And I call it the obscenely short radio crash course, but with apologies to any PhDs or real telecom whizzes in the room, I think this is probably more appropriate. We're going to blow through this material, and I'm just going to try to pick out a few points that are really essential to understanding the rest of this talk.
I'll tell you what's important; just try to grab those concepts, and we'll reiterate them later as we go through. So again, we're going to be talking about the physical layer. If you think about the OSI data model that we've all seen, the physical layer refers to how your bits, your data, get mapped into physical phenomena that represent them in reality. And when you're dealing with wireless systems, that mapping turns the bits into patterns of energy in an RF medium. RF stands for radio frequency, and it's basically electromagnetic waves, energy that is just everywhere. And you can manipulate RF by using a device called a radio. Radios can either be hardware-defined, where the RF mechanics and the protocol are baked into the silicon and are inflexible, or you can use a software-defined radio, where you have some very general, flexible silicon up front that basically just grabs raw information and feeds it to some sort of processor, which can either be a traditional CPU or an FPGA, to implement some of the more radio-specific things. And SDRs have come a long way in the last few years; they're now incredibly powerful. So we're going to be talking about both hardware-defined radios and software-defined radios throughout this talk. So if you put together a radio coherently, you can start to develop it into a PHY. And one of the main components of a PHY is this notion of the modulation. The modulation is the algorithm that defines how your digital values, your bits, are mapped into RF energy. And there are a few parameters that we can tweak to do that: amplitude, frequency, and phase. And then we can put them together and use some combination of them as well. Modulators can modulate either analog or digital information, but we're going to be talking about modulating digital information today. And an essential concept with that is this notion of a symbol.
This is something that's very important to remember. A symbol represents a discrete RF energy state that represents some quantity of information. So it's discretely sampled; just think of it as being a state in your RF medium that means something. And we'll illustrate this in just a moment. So here we have pictures of two different modulations, and I just want to put these up here to help you get a grasp on what a symbol looks like. On top we have frequency shift keying, where you can see your signal alternating between two frequencies: when it's on the left, it's dwelling on one frequency, and when it's on the right, it's dwelling on another frequency. Which symbol is present is based on what frequency the signal is on at a discretely sampled moment in time. So you could think of it as being a zero when the signal is dwelling on the first frequency, the one on the left, and a one when the signal is dwelling on the right frequency, frequency two. And you can see the analog with the bottom modulation, on-off keying, where the signal being present represents a one and the signal being off represents a zero. So hopefully that helps you get a grasp of what it is that we're talking about. There are of course more complicated IoT PHYs. We have spread spectrum, where data can basically be chipped at a higher rate; it'll occupy more spectrum, but it makes it more resilient to noise. 802.15.4 is one technology that uses a spread spectrum mechanism. So we talked a bit about radios just a moment ago. We're going to use two different kinds of radios when going through this talk. First we have a hardware-defined radio, a Microchip RN2903 LoRa module. This is basically a dev board that has a hardware-defined LoRa radio built onto it. So this is going to be the transmitter that we're going to be targeting.
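The FSK mapping just described can be sketched in a few lines of Python with NumPy. The frequencies, sample rate, and symbol length here are made-up illustrative values, not parameters of any real system; the point is just that each bit picks which frequency the signal dwells on for one symbol period:

```python
import numpy as np

def fsk_modulate(bits, f0=1000.0, f1=2000.0, fs=48000.0, sym_len=480):
    """Toy 2-FSK modulator: each bit selects the dwell frequency
    for one symbol period (all parameter values are illustrative)."""
    t = np.arange(sym_len) / fs
    out = []
    for b in bits:
        f = f1 if b else f0          # bit value -> which frequency we dwell on
        out.append(np.cos(2 * np.pi * f * t))
    return np.concatenate(out)

samples = fsk_modulate([0, 1, 1, 0])
```

Feeding `samples` into the spectrogram view described below would show exactly the two-level dwell pattern from the slide.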
And then finally, our receiver is this software-defined radio right here, an Ettus USRP B210. It's just a commodity software-defined radio board, and basically what this thing does is get raw RF information from the air and serve it to my computer, so that I can start to work with it using commodity tools like Python, NumPy, GNU Radio, things like that, to process it. One last thing to cover is the fast Fourier transform. The fast Fourier transform basically takes a signal and decomposes it into all of the smaller signals, or subcarriers, that compose it. Any periodic signal can be modeled as a sum of harmonic sine waves, so basically the FFT takes any signal and unravels it into its components. And why we care about this is that it's a very easy way to analyze and visualize signals in the frequency domain. So when we take a bunch of FFTs and stack them together, we get this picture called a spectrogram. In the ones that we're going to be looking at, time is on the y-axis, frequency is on the x-axis, and power is on the z-axis: the intensity of the color is how powerful that frequency component is at that instant in time. So here you can start to visualize all the different signals that are present. Okay, raise your hand if you're an expert. I see a few hands. Hopefully this is all that we're going to need. I'm going to reiterate some of these concepts as we go through, so I really hope this doesn't alarm you or send you running for the door. It's going to be very visual as we go through it, and hopefully the graphics will help keep this all grounded. So let's get into the meat of how this LoRa PHY works. LoRa uses a really neat proprietary PHY that's built on a modulation called chirp spread spectrum, CSS for short. Now, what is a chirp? A chirp is a signal whose frequency continuously increases or decreases. You can think of it as being like a sweep tone.
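The stacking-FFTs idea can be sketched directly; this is a minimal spectrogram in Python with NumPy, where the window size and the test tone are arbitrary choices for illustration:

```python
import numpy as np

def spectrogram(x, nfft=256):
    """Stack FFTs of consecutive windows: rows are time, columns are
    frequency bins, and the values are power at that time/frequency."""
    nrows = len(x) // nfft
    frames = x[:nrows * nfft].reshape(nrows, nfft)
    # fftshift puts negative frequencies on the left, like a baseband view
    spectra = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spectra) ** 2

# A pure complex tone should concentrate its power in a single column.
fs, f = 8000.0, 1000.0
t = np.arange(4096) / fs
power = spectrogram(np.exp(2j * np.pi * f * t))
```

For a tone, the strongest bin is the same in every row; for a chirp, that peak would march across the columns over time, which is exactly the diagonal stripe seen on the slides.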
And if we visualize it using a spectrogram, as before, it looks kind of like this. In this case, we have a finite amount of bandwidth, and the frequency either increases or decreases; you can have up-chirps or down-chirps. When the chirp reaches the end of its band, it wraps around back to the bottom, back to the beginning, and continues. So here you can see that the first derivative of the frequency is constant: the frequency is always increasing or decreasing at the same rate, and when it hits the end of the band, it just wraps and keeps going. So why use something like CSS? It has properties that make it really resilient to noise and very performant at low power. So all those IoT-focused radio goals, like very long battery life, are properties that lend directly to that sort of efficiency. It's also really resilient to multipath and Doppler, which is great for urban and mobile uses. So this is an interesting set of features here. Where else do we see chirps? Radar. I just heard it; thank you. Radar is a really common usage, and you'll see military and marine radars sometimes refer to chirps as wideband or pulse compression if they're using chirping in the radar scheme. They're also used for scientific over-the-horizon radars as well, and there's an open-source project called GNU Chirp Sounder that has features for visualizing these over-the-horizon scientific radars. And also, in a past life, I worked on a scientific radar called SuperDARN, which is a similar over-the-horizon radar for visualizing ionospheric activity. Cool. So that's a little bit of background on the technology here. So this is kind of my journey into starting to work with LoRa. In December of 2015, I joined this company, Bastille, where I am currently. And on the threat research team, we have these weekly meetings where we get together and we look at new RF techniques or protocols, things that are interesting.
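A baseband chirp like the one just described can be synthesized directly. This is a sketch, assuming the common case where the sample rate equals the chirp bandwidth and a symbol spans 2**SF samples; it is a standard linear-sweep construction, not Semtech's actual implementation:

```python
import numpy as np

def upchirp(bw=125e3, sf=7, fs=125e3):
    """One baseband up-chirp: frequency sweeps linearly from -bw/2 to
    +bw/2 over a symbol of 2**sf samples (assumes fs == bw)."""
    n = 2 ** sf
    t = np.arange(n) / fs
    k = bw / (n / fs)                       # chirp rate in Hz per second
    # instantaneous phase of a linear sweep starting at -bw/2
    phase = 2 * np.pi * (-bw / 2 * t + 0.5 * k * t ** 2)
    return np.exp(1j * phase)

chirp = upchirp()
```

Running this through the spectrogram described earlier would show the diagonal stripe; conjugating the result gives the corresponding down-chirp.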
And we basically just have a deep brainstorm on how they work and what's interesting. In the first meeting that I participated in, the first week that I joined, they were talking about these LPWAN technologies. They sounded pretty cool. So when we broke for Christmas, I went back to New York, where I'm from, brought my radio, and started poking around to see what I could find. My colleagues looked in San Francisco and Atlanta, and I also looked in Boston; I was there, too. And we didn't see LoRa anywhere in December. Fortunately, a few weeks later, I was at a meetup and I encountered this company, Senet. I was living in Cambridge, Massachusetts at the time, and they were talking about their home heating oil monitoring network. It sounded pretty cool. So I looked them up later and was watching one of their marketing videos, and there was a two- or three-second bit where you could see one of their technicians operating a computer, right? And they put up this picture. This looks just like a coverage map, right? So, you know, this could be fake data or it could be live. And I take a bit of a closer look and I realize where that is. That's Portsmouth, New Hampshire. That's like an hour away from Boston. So there's really only one thing to do. I hop in my car and I drive up to the New Hampshire-Maine border. And there's, you know, me behind the wheel of my Saab with the USRP on the dash. And after about 10 minutes in the Marriott parking lot across the street from their headquarters, we have our first sighting of LoRa in the wild. There it is. It's the first signal I recorded. So let's take a closer look at what we have here. If we look at the top third of the picture, we have a series of repeated up-chirps. You can see the signal just continuously increasing until it hits the edge of the band, and then it wraps and continues.
And knowing what we know about digital communication systems, most of them have some notion of a preamble or a training sequence to tell a receiver, hey, heads up, you're about to get a packet. So that's probably what that is. Following that, you can see the chirp direction changes right in the middle, and you have two and a quarter down-chirps. This looks like a start-of-frame delimiter or a synchronization element. So this tells the receiver, hey, heads up, the preamble's over; you're about to get the data, the payload. And finally, you can see the chirp direction changes again, back to up-chirps, but this time the chirps are kind of choppy. You see, they jump around throughout the band, seemingly arbitrarily. It's not arbitrary, though: that's actually the data being encoded into the PHY. So here we can see that the chirp rate, that is, the first derivative of the frequency, the rate at which the frequency changes, remains constant, right? However, the instantaneous frequency may change within the band. So you may have these jumps, but remember that the rate at which the frequency is changing is always constant; you can just have those discontinuities. And those instantaneous frequency changes represent data being modulated onto the chirps. You can kind of think of this as being like a frequency-modulated chirp: with an FM signal, you have a static carrier, a carrier at a fixed frequency, that you modulate to produce the modulated signal. Here we're modulating a chirp signal to produce it. So rather than having a fixed-frequency carrier that you're modulating, you're modulating this continuous chirp. Cool. So let's get our hands dirty. Let's figure out how this thing works and start to pull some data out of it. Before we dive into demodulating it, let's take a look at what we know through some open-source intelligence.
And using open-source intelligence is a great way to shortcut the reverse engineering process, because otherwise you can wind up doing a lot more work than you have to. So there were a few things that were really useful, and we'll talk about these as we go through this material. The first thing I found was the Semtech European patent application, filed in the EU, that basically defined a modulation that looked a lot like what LoRa could be. And that's the number if you want to look it up later. That had some pretty good information in there. Secondly, we have the LoRaWAN spec. Again, that's the layer-two-and-up spec that's open, not the PHY, but it still references and defines some terms that are likely analogous to terms in the PHY, so it's still pretty useful. And finally, we have two application notes from Semtech that were pretty juicy. The first one, AN1200.18, contained a number of reference algorithms for implementing a whitening sequence, which is like a scrambler; we'll talk about that momentarily. And AN1200.22 had a general overview of the PHY and defined some terms. Also, there was some prior art online. There was a partial implementation in RTL-SDRangelove that didn't really seem to be maintained. It seemed pretty neglected, and I never really got it to do anything at all, but it was still good to look at and had some really good hints in there. And there were also some very high-level observations on the PHY in a wiki page, revspace.nl/DecodingLora. That was mostly looking at the spectrum and seeing that it's a chirp modulation, with a few example recordings and things like that. So from this documentation, we can start to pull out some definitions. We have the bandwidth, which is how much spectrum the chirp can occupy, and the spreading factor, which is the number of bits encoded per symbol. And remember, the symbol is just an RF state, right?
It's the number of bits in each RF state within the modulation. And then finally, we have this thing called the chirp rate, which we've kind of hinted at: it's the first derivative of the chirp's frequency, the rate at which that chirp signal is constantly changing. And we can pull some numbers out of this documentation to define those. We actually have some common constants for the first two, and then we find a formula in one of those documents stating that the chirp rate is a function of those first two. And since there's a finite number of values, we can iterate, just try all the different chirp rates, and find one that works. So in this case, what is a symbol? Well, we've talked about how this modulation is basically frequency-modulated chirps, right? So what we're going to try to do with this demodulator is quantify exactly where the chirp jumps to whenever we have one of those discontinuities. So let's start working through it here. There are really three steps we're going to achieve. Step one, we're going to identify the preamble, which is the beginning of the frame. Step two, we're going to find the start of the PHY data unit by looking for and synchronizing against the sync word, those down-chirps that are there. And then step three, we're going to figure out how to extract the data from these instantaneous frequency transitions, and to do that, we need to quantify them. Now, there's a technique that I found pretty early on that was enormously helpful for doing this, and that is to transform the signal by de-chirping it. We'll show you what the result is in just a moment, but first, we're going to have to do some math. And math is in red because it's scary, but it's not really. It's actually pretty easy.
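The brute-force parameter search just mentioned is small enough to enumerate directly. The formula below is my reading of the relationship implied by the app notes, that a symbol spans 2**SF chips at one chip per 1/BW seconds, rather than a quote from the closed spec:

```python
# Symbol duration T_sym = 2**SF / BW seconds, and the chirp sweeps the
# full bandwidth once per symbol, so chirp rate = BW / T_sym = BW**2 / 2**SF.
bandwidths = [125e3, 250e3, 500e3]     # common LoRa channel bandwidths (Hz)
spreading_factors = range(7, 13)       # SF 7..12, per the documentation

candidates = {(bw, sf): bw ** 2 / 2 ** sf
              for bw in bandwidths for sf in spreading_factors}
```

With only eighteen (bandwidth, spreading factor) pairs, trying each candidate chirp rate against a recording until one de-chirps cleanly is entirely practical.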
So there's a basic property of complex signals that states that if you multiply two signals together, the resulting signal has the frequencies of the components added together. And from that, if we multiply a signal at one frequency against a signal at the negative of that frequency, the result is zero hertz: we get a constant signal. And we're working at baseband here, which means the center of the band is zero hertz, so we can see negative frequencies and things like that. So if you multiply an up-chirp and a down-chirp together, what do you get? You get a constant frequency. Now, why do I say constant frequency rather than DC? Because if the chirps are out of phase with one another, there might be an offset from zero hertz. It might not be perfectly aligned with zero hertz, so we might expect some offset there. So what happens if you multiply a chirp signal like this separately against an up-chirp and a down-chirp, two different operations producing two different products? What do we think is going to happen? Well, if you do that, you get these pretty pictures right here. Those really tricky diagonal chirp signals that are cutting all over your spectrum and are hard to measure get translated into these nice signals that are aligned in time. And that looks like something we can really start to work with. So we need to quantify those. Again, remember symbols; we're going to keep coming back to this. A symbol is an RF state that represents some number of bits, and LoRa has this value called the spreading factor, which we found in some of the documentation, that defines the number of bits encoded per symbol. From the picture we saw a little bit earlier, the common values are seven through twelve, or six through twelve; you see both in different markets. So from that, how many possible symbols do we expect there can be?
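The up-chirp-times-down-chirp identity above can be checked numerically in a couple of lines of numpy. This is just an illustration with an assumed symbol length; a baseband up-chirp multiplied by its conjugate (the matching down-chirp) collapses to a constant signal:

```python
import numpy as np

n = 256                       # samples per symbol (2**SF with SF=8 assumed)
t = np.arange(n) / n          # normalized time across one symbol
# Baseband up-chirp: instantaneous frequency ramps linearly over the symbol.
up = np.exp(1j * np.pi * n * t * t)
down = np.conj(up)            # the matching down-chirp (negated frequency)
tone = up * down              # frequencies add: f + (-f) = 0
# The product is identically one: a constant, 0 Hz signal.
assert np.allclose(tone, np.ones(n))
```

If the received chirp were out of phase or offset relative to the local chirp, the product would still be a constant-frequency tone, just not at exactly zero hertz, which is the DC-versus-constant-frequency distinction made above.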
Well, each bit can have two states, a zero or a one, and there are spreading-factor-many bits, so the number of symbols is two to the spreading factor. So how can we start to quantify these symbols and pull them out of the PHY? The steps I found that were the trick to this were: channelize and resample the signal to the chirp bandwidth; de-chirp the signal with the locally generated chirp we just talked about; then take a fast Fourier transform of that signal, where the number of bins of the FFT we compute is equal to the number of possible symbols. We'll illustrate this momentarily. And if we do that correctly, then the most powerful component in that fast Fourier transform, the strongest frequency component we get back from that operation, is the symbol we're looking for. By de-chirping, we get the signal into a form where we really expect there to be only one strong component per FFT. Whereas if we didn't de-chirp it, when we took the FFT of a symbol's worth of samples, we would see the energy spread throughout all the different bins. But by de-chirping correctly, all that energy gets pushed into one bin, and we get a single, clear value out of it. So if we do that, we get a picture that looks like this. Here the z-axis is again the intensity, the power present, and we expect that to be the symbol we're looking for. And here it's aligned in time with the base chirp on the left there. So here are the steps again, as we mentioned earlier. Let's look for the preamble. What's a stupid-simple algorithm for finding this? De-chirp it, do an FFT, and look for the most powerful component landing in the same bin for some number of consecutive FFTs. Easy. Finding the SFD is the same thing, but this time we do it on the opposite de-chirp product. When we de-chirp, we get back two different streams.
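The de-chirp-then-FFT pipeline just described can be sketched end to end in numpy. This is an illustrative toy, not the gr-lora implementation: the symbol is modeled as a cyclic shift of a base up-chirp (an assumption consistent with the frequency-jump picture above), SF=8 is assumed, and the FFT is computed with exactly 2^SF bins so the strongest bin is the symbol value:

```python
import numpy as np

sf = 8
n = 2 ** sf                              # 2**SF possible symbols = FFT bins
k = np.arange(n)
base = np.exp(1j * np.pi * k * k / n)    # locally generated reference up-chirp

def modulate(symbol):
    # Model: a symbol is the base chirp cyclically shifted by its value.
    return np.roll(base, -symbol)

def demodulate(rx):
    dechirped = rx * np.conj(base)       # collapse the chirp into a tone
    spectrum = np.fft.fft(dechirped, n)  # one bin per possible symbol
    return int(np.argmax(np.abs(spectrum)))  # strongest bin = symbol value

assert demodulate(modulate(42)) == 42
```

All of the symbol's energy lands in a single bin only because of the de-chirp step; FFT-ing the raw chirp would smear that energy across every bin, as noted above.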
We get one stream of de-chirped up-chirps and one of de-chirped down-chirps, so we can look at the opposite stream and run the same algorithm to find the SFD. Important caveat: accurately synchronizing on the SFD is essential for getting good data out of this demodulation. Because if you have a bad sync, the samples that comprise a symbol can end up spread between multiple adjacent FFTs, and if that happens, you get incorrect data. Let's illustrate what that looks like. If you look at rows 39 and 50, you can see that visually it's almost impossible to tell which of those two readings represents the symbol; there are two different values that are both really powerful. That's the result of roughly half of the samples from chirp n and half of the samples from chirp n plus one winding up in the same FFT. When that happens, we get those two components in there, and it's really ugly and hard to work with. We can solve this by using a technique called overlapping FFTs when looking for our SFD synchronization. Basically, that means we process each sample multiple times, with the effect of getting better time resolution in our resulting FFTs. It's more computationally intensive, but it gets us much better fidelity. If we do that, this is what the result looks like. It's a little bit hard to see right now, and I'll get you a better picture in a moment, but it's much less ambiguous which symbol is present. So if we use those overlapping FFTs, we can synchronize on that SFD, and then once we know exactly where the first symbol of the PHY data unit is in our buffer, we can go back to using non-overlapping FFTs, which are more computationally efficient, and get this nice read on the right. Here you can see that if we look at lines 38 and 39, that ambiguity is gone.
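The overlapping-FFT idea above can be illustrated with a small numpy experiment. This is a toy sketch under the same cyclic-shift symbol model as before (not the gr-lora code): a stream of chirps is captured at an unknown sample offset, and sliding the analysis window sample by sample, instead of hopping a whole symbol at a time, finds the offset where one FFT bin cleanly dominates:

```python
import numpy as np

sf = 8
n = 2 ** sf
k = np.arange(n)
base = np.exp(1j * np.pi * k * k / n)

# Three back-to-back symbols, captured with an unknown sample offset, so a
# naive symbol-length window straddles two chirps and shows two strong bins.
full = np.concatenate([np.roll(base, -10), np.roll(base, -200),
                       np.roll(base, -90)])
true_offset = 37                       # where the second symbol aligns
stream = full[n - true_offset:n - true_offset + n + n // 2]

def peak_sharpness(offset):
    # De-chirp one symbol-length window at `offset` and measure how
    # strongly a single FFT bin dominates the spectrum.
    win = stream[offset:offset + n] * np.conj(base)
    mag = np.sort(np.abs(np.fft.fft(win)))
    return mag[-1] / mag[-2]           # strongest bin vs. runner-up

# "Overlapping FFTs": evaluate every candidate offset and keep the sharpest.
best = max(range(n // 2), key=peak_sharpness)
assert best == true_offset
```

Misaligned windows split the energy between two bins (the ambiguous picture described above), while the aligned window pushes essentially everything into one bin, so the sharpness metric peaks exactly at the true offset.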
You can see exactly where the most intense bin is and therefore which symbol is present. And here's the whole frame synchronized. On the left we've got the collisions and it doesn't look that great; on the right it's much clearer. Cool. So again, we recompute, it's more computationally intensive, and then we get our data out. Now, there's one last thing we have to do to wrap up the demodulation. Remember when we were talking about the chirp math: if our chirps aren't perfectly phase aligned, then the resulting de-chirped signal might not necessarily be off of the same reference, right? And of course, we don't know what chirp was used to generate the signal at the transmitter. So we have to find some way of normalizing this data to account for that phase discrepancy, and we can do that by referencing the preamble. It just so happens that the preamble, when you de-chirp it, always represents symbol value zero. So you can basically just do a modulo operation on your received symbols to rotate them back, so all the symbols are referenced off of the preamble, and you're good to go. And that's it, right? Not even close. We're just getting started, people. Why is that? Because the data here is encoded. What is encoding? Basically, encoding is a transformation applied to the data before it's transmitted. Why would you do something like that? Because encoding increases over-the-air resiliency. And why is that necessary? Remember that we're dealing with unlicensed spectrum. This is what the 900 megahertz band, which is what LoRa uses in the United States, looks like. Look at all that stuff that's not LoRa. That stuff is there to ruin your day, to create all sorts of interference and make your receiver not work the way you expect. RF is a really brutal environment, full of interference.
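The modulo normalization against the preamble described above is a one-liner. A small sketch, with an assumed SF of 8: whatever value the de-chirped preamble actually lands on is the combined timing/phase offset, and every received symbol is rotated back by that amount:

```python
# Sketch: the de-chirped preamble should read as symbol 0, so its actual
# reading is the offset, and a modulo subtraction re-references everything.
def normalize(raw_symbols, preamble_reading, n_symbols=2 ** 8):
    return [(s - preamble_reading) % n_symbols for s in raw_symbols]

# If the preamble reads as 17 instead of 0, every symbol is off by 17;
# the rotation also wraps values that fell below zero back around.
assert normalize([17, 59, 16], 17) == [0, 42, 255]
```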
And basically, the encoding is a way of treating your data so that even with non-ideal reception, you can still get the data out of the frame. So what do we have here? Remember that LoRa is closed source. We have some material that's available through datasheets, but we really don't know definitively what's in this PHY. So again, we're going to go back to open-source intelligence, figure out what we know, and then narrow in on how we're going to iterate through this and figure out how it works. From the patent, we have a number of very good clues. First of all, it refers to a stage called Gray indexing, which, as defined there, should add error tolerance in the event that you read a symbol as off by one, basically if you read a symbol in the incorrect bin. Secondly, you have data whitening, which induces randomness into the frame. We'll talk about that momentarily. You have interleaving, which scrambles the bits within the frame. Then you have forward error correction, which adds correcting parity bits. You can think of it as parity bits on steroids: rather than just telling you that an error occurred, it can actually help you correct the error without needing to retransmit. So we have four different stages that comprise the encoding, and they're in the patent, right? So that's awesome. It's easy, right? Well, no, because documentation lies to us, and even the clearest signals can lead us into dead ends. So let me show you how. The Gray indexing we read to mean Gray coding, which is just a basic binary transformation you can use to treat data. For whitening, we actually have, in one of the application notes, reference designs for the pseudo-random number generators used for the whitening. It's C code that you can copy and paste, so this should be rock solid. Step three, we have an actual algorithm for the interleaver that is defined in the patent.
I'll show you what it is momentarily. And finally, step four suggests that a Hamming code is used, which is just a standard forward error correction mechanism. So the first thing we're going to focus on figuring out is the data whitening. That's a critical step, because the way whitening works is you XOR your message against a pseudo-random string, and unless you know what that string is, you're not going to be able to make any sense of anything that follows it. So figuring out that pseudo-random string is essential. Again, with whitening, you take your buffer that's going out to the radio and XOR it against a precomputed pseudo-random string that is known to both the transmitter and the receiver. When the receiver gets the frame, it XORs the received buffer against the same sequence the transmitter used, and you get back the original data, because, if you remember, XOR is its own inverse, so it nicely undoes itself. Now, why would we bother with whitening? Because having random-looking data is really good for receivers. It's similar to Manchester encoding: by encoding the data such that you don't have long runs of consecutive symbols of the same value, you get this nice random-looking data source, and that creates lots of edges for your receiver to do clock recovery against. So you get better reception of longer messages, or when your clocks are bad. Manchester, of course, comes with a penalty: it cuts the effective bit rate to half of the baud rate, whereas whitening does not. The caveat is that you have to know what the string is for it to work. So let's find the whitening sequence. We've got those algorithms in the application note, and some examples in rtl-sdrangelove. None of them worked. So we had to figure this out empirically.
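The whitening mechanics just described come down to one XOR. A minimal sketch, with a made-up four-byte sequence standing in for the real pseudo-random string (which is exactly the unknown being solved for here):

```python
# Toy illustration of data whitening: XOR against a pseudo-random sequence
# known to both ends. Applying the same operation twice is the identity.
whitening = bytes([0xFF, 0x3A, 0x91, 0x5C])   # made-up stand-in sequence

def whiten(payload):
    return bytes(b ^ w for b, w in zip(payload, whitening))

msg = b"\xde\xad\xbe\xef"
assert whiten(whiten(msg)) == msg             # XOR is its own inverse
```

The receiver runs the identical function on the received buffer, which is why transmitter and receiver must agree on the sequence ahead of time.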
How can we do that when there's interleaving and forward error correction in the pipeline, right? We can send something that might put the whitening in a certain state that we could leverage, but we still have these unknown transformations that follow it. How are we going to figure out the whitening when those operations are in the loop, too? Well, we need to bound the problem and make some assumptions so that we can start to iterate through this black-box problem. So we're going to assume that the forward error correction is what the documentation tells us it is: Hamming(8,4). And we're going to make another assumption: we set the spreading factor to 8 bits per symbol. If you do that, you get exactly one Hamming(8,4) code word per symbol, because if the total number of bits in our Hamming error correcting code is 8, then 8 bits per symbol fits very nicely and should work out well. Now, there's another very useful property of the Hamming(8,4) forward error correcting scheme that we're also going to exploit. Each code word contains four data bits and four parity bits, and in 14 of the 16 possible code words (again, two possible states per bit, to the power of four data bits per code word) there are four ones and four zeros each. However, for data nibble 0, that's four zeros, the code word is eight zeros. So it's totally non-additive: if we feed our forward error correcting scheme a string of zeros, we just get back twice as many zeros. We can leverage that to cancel out the forward error correcting stage. So let's go ahead and transmit a string of all zeros. Again, if it's Hamming(8,4) as we assume, we expect stage four, the forward error correcting code, to cancel out, right? What about the interleaver?
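The code-word weight property being exploited above is easy to verify with a textbook extended Hamming(8,4) encoder. Note the caveat: the exact parity layout LoRa uses may differ from this standard construction, but the weight distribution (one all-zeros word, one all-ones word, fourteen words with four ones each) is a property of the code itself:

```python
# Textbook extended Hamming(8,4): four data bits, three parity bits,
# plus one overall-parity bit. Illustrative; LoRa's bit ordering differs.
def hamming84(nibble):
    d = [(nibble >> i) & 1 for i in range(4)]    # data bits d0..d3
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = d + [p1, p2, p3]
    bits.append(sum(bits) % 2)                   # overall parity bit
    return bits

weights = [sum(hamming84(n)) for n in range(16)]
assert weights[0x0] == 0                   # all zeros in -> all zeros out
assert weights[0xF] == 8                   # all ones in -> all ones out
assert sorted(weights)[1:15] == [4] * 14   # every other word has weight 4
```

The all-zeros row is the "totally non-additive" case used to cancel the FEC stage; the all-ones row is the one used later against the interleaver.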
Let's take a look at the algorithm that's suggested in the patent. There it is. The key takeaway is that if it's implemented in a way similar to this, it should be totally non-additive. It should just move bits around, not add any bits, right? So if it is in fact non-additive and all we pass through is a bunch of zeros, what happens when you shuffle around a bunch of zeros? You get the same thing out. So that falls away too, right? That leaves us with two stages: the symbol Gray indexing stage and the data whitening stage. Whitening is what we're solving for; that's our variable. And the quote-unquote Gray indexing is a bit of an ambiguous term, but it likely refers to some variant of Gray coding, which we mentioned earlier. And whether it's Gray coding applied one way or the other, or nothing at all, something they just didn't implement, that leaves only three permutations. So we've reduced all the ambiguity of figuring out this decoder down to figuring out which of those three operations the Gray indexing stage is. If we trial all three, that's only three things to attempt in order to derive the whitening sequence from the transmitter. Because again, if we send through a string of zeros, what does the whitening do? It XORs the zeros against the pseudo-random string. And what is anything XORed with zero? It's the input. So we can do this and get the transmitter to tell us what its whitening sequence is. We can implement the receiver, read that out, plug it back in, and then start to solve for the rest. Cool. The next stage is the interleaver. Again, we had that formula from the patent, surprise, surprise, implemented it, and it was no good. So let's figure out how this really works. Now, we're going to move very quickly through this, because this was the hardest part of all of it.
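The whole all-zeros probe argument above can be captured in a few lines. This is a schematic model, not the real pipeline: the FEC stage is reduced to the one fact that matters (nibble 0 encodes to 0x00), the interleaver to the one property that matters (a permutation of all-zeros is all-zeros), and the whitening sequence is a made-up placeholder playing the role of the unknown:

```python
# Why transmitting all zeros exposes the whitening sequence: with
# Hamming(8,4), nibble 0 encodes to 0x00; shuffling zeros yields zeros;
# so the only stage that changes any bits is the whitening XOR.
def fec_encode_zero_nibble():
    return 0x00              # data 0000 -> codeword 00000000

def interleave(codewords):
    return codewords         # any permutation of all-zeros is all-zeros

whitening = [0xFF, 0x3A, 0x91, 0x5C]   # the transmitter's secret, modeled
def whiten(buf):
    return [b ^ w for b, w in zip(buf, whitening)]

tx = whiten(interleave([fec_encode_zero_nibble()] * 4))
assert tx == whitening   # the radio just read its whitening sequence back
```

Capturing what the real radio emits for an all-zeros payload therefore yields the whitening sequence directly, modulo the three Gray-indexing possibilities to trial.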
And I'm going to show you the process without making you waste time staring at all the graph paper and trial-and-error that went into this. But again, just like with the whitening sequence, we're going to exploit properties of the Hamming FEC to reveal patterns in the interleaver. So if we look at our Hamming(8,4) code words that we know and love, this time we're going to use the code word for four ones, the code word for hex F. And in that case, the code word is eight ones. So if we construct a bunch of packets, taking four bytes, which is eight symbols at SF8, and we walk the position of those ones through the frame, we can start to look for patterns. Who sees it? I'll save you the trouble. Who sees it now? Look at the bottom row, second from the right, and you'll see the pattern. Basically, it's a diagonal interleaver, but the two most significant bits are flipped. So if we take this and read it out, we can start to map those diagonal positions into positions within a de-interleaving matrix. If we walk through all the different states and map those positions out with data that we know, we get this nice table. Now let's put this table next to the data we're looking for. Here we've decomposed the Hamming code words for the data we passed in, which is, of course, our beloved dead beef. On the left, the middle column has the data values, the four data bits we're looking for, and the right column has the parity bits we're looking for. Again, I'm going to make this easy for you. If you stare at this for long enough, you become compelled to reverse the bit order, and then if you continue staring at it, you start to see some patterns. That looks like our data, right?
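A generic diagonal interleaver can be sketched to show the shape of the pattern being described. This is illustrative only: the mapping actually reversed from LoRa also flips the two most significant bits, which is omitted here, and the real matrix dimensions follow from SF and the code rate:

```python
# Sketch of a plain diagonal interleaver: bit (row, col) of the input
# matrix moves along a diagonal, each row shifted by its row index.
def interleave(matrix):
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][(c + r) % cols] = matrix[r][c]
    return out

def deinterleave(matrix):
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = matrix[r][(c + r) % cols]
    return out

# Walking a column of ones through the matrix turns it into a diagonal,
# which is exactly the visual pattern in the graph-paper exercise above.
m = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
assert interleave(m)[2] == [0, 0, 1, 0]
assert deinterleave(interleave(m)) == m
```

Probing with all-ones code words makes the moved bits visible because, as noted earlier, the 0xF nibble encodes to eight ones, so every set bit in the output traces back to a known input position.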
So if we go a step further, we can start to map some of these Hamming fields into this matrix. Here we see the four data bits, the rightmost bits, and we can see that parity bits one and two correlate very nicely. And going a step further, bits five and four map very closely as well, although they're flipped: you'll see that parity bit four is actually more significant than parity bit three. So we're almost there, right? All we have left to do is apply our FEC and we're done, and that's the demodulation and decoding. That's the whole thing. So again, let's... Thank you. So again, let's talk briefly about these red herrings and try to wrap this up. I want to do a demo before our Q&A. We had these four different decoding stages, and we had great documentation for all of them. But empirically, after implementing them, we were able to establish that three of the four just weren't what the documentation said. One of them actually was what it said it was, right? So, anyway, how were we able to work through this? I think it's important to reflect and pull some takeaways from this. Hopefully, they're useful as you approach your own reverse engineering challenges. What was essential here was being able to bound the problem and hold certain things constant so that we could solve for unknowns. And if you remember, we did this in two stages. We were able to cancel out the interleaving and the forward error correction, hold those static, in order to figure out the whitening sequence and the Gray indexing in kind of one go. And then, once we controlled the Gray indexing and the whitening sequence and were pretty confident about what the forward error correction was, there was really only one variable left to solve for, only one thing we actually had to go into the bits and dig out of this thing, right?
So by making these assumptions, using open source information, really bounding the problem and working through it coherently, we were able to reduce these four stages down to really one experimental variable and just solve for it. That's really the trick here. Okay, I'm going to blow through this next part and talk very briefly about the structure of the LoRa PHY packet. This is a picture pulled out of one of the datasheets. We already talked about the preamble, those repeated chirps. One thing that's not pictured here is the sync word and the start-of-frame delimiter, which is right there. And then we have this thing called the header, and it says here that the header is only present in explicit mode. So there's this notion of implicit versus explicit headers in LoRa. The explicit header is a PHY header that carries some information such as the length of the payload and the FEC scheme applied to the remainder of the payload (not the header itself, but the rest of it), and there's also an optional CRC that can be included. Implicit mode assumes the receiver already knows the modulation parameters and skips all of that. So, no problem, right? We can use implicit mode, figure out what the whitening sequence is, then switch back to explicit mode, apply the whitening sequence we learned in implicit mode, and figure out what the header is by just watching the values change as we change the modulation. Yeah, right. None of this is easy. Nothing helps us here. As it turns out, implicit and explicit header modes use different whitening sequences. So the PHY header remains obfuscated even if we know the implicit-mode whitening sequence. So let's see what we know. Again, we've got this header, and in this picture it tells us the code rate is always 4/8 for the header.
So no matter what code rate, that is, the number of bits in the Hamming forward error correcting code words, is used for the rest of the packet, the header's code rate is always 4/8. Well, what about the spreading factor? As it turns out, the header is always sent at a spreading factor two less than the rest of your modulation. The code rate is still 4/8, but the spreading factor for the header is SF minus 2, so two fewer bits per symbol, even if the header is implicit. And I have to credit Thomas Telkamp for the tip that actually led to putting this all together. Thanks to him. So again, the first symbols of the frame, whether you're in implicit or explicit mode, are always sent at SF minus 2 and code rate 4/8. That's always the case. Also, there's this mode called low data rate, where, if it's switched on, all of the symbols in the remainder of the PHY packet are also sent at spreading factor SF minus 2. It basically gets you some extra margin in case you're dealing with a noisy channel and need to get data through. So that's the PHY. Who wants some tools to go with it? Who's curious about this and wants to start playing with it? Does LoRa seem cool? So with that, that brings us to gr-lora, which is an out-of-tree GNU Radio module that I've been working on for the last couple of months. It's an open source implementation of the PHY that works very nicely with GNU Radio, the software-defined radio digital signal processing toolkit. It's open source software, it's free software, it's got a great community built up around it, and it's really cool. If you're curious about SDR, there are loads of good tutorials. And if you're a wizard, well, you already know what this is. It's a really great piece of software and ecosystem. And why is having an open source version of this interesting?
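The header rules just reversed can be summarized as a small parameter table. A sketch only, encoding the behavior described above (header always at SF minus 2 with code rate 4/8; low-data-rate mode extending SF minus 2 to the whole packet); the function name and dictionary layout are illustrative, not anything from gr-lora:

```python
# Sketch of the reversed header behavior: the first symbols are always
# SF-2 at code rate 4/8, regardless of the payload's own settings.
def frame_params(spreading_factor, low_data_rate=False):
    return {
        "header_sf": spreading_factor - 2,   # header always two SF lower
        "header_cr": "4/8",                  # header always maximally coded
        "payload_sf": spreading_factor - 2 if low_data_rate
                      else spreading_factor,
    }

p = frame_params(10)
assert p["header_sf"] == 8 and p["payload_sf"] == 10
assert frame_params(10, low_data_rate=True)["payload_sf"] == 8
```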
Well, existing interfaces to LoRa are at layer two and above. Both the datasheets that go with each of the different LoRa radios and the standards that are available and open are all layer two and up; we don't have any insight into what the PHY state machine actually does. And PHY-layer security really can't be taken for granted. To back this up, I'm going to point to some 802.15.4 exploits that reinforce this. From a couple of years ago, we have Travis Goodspeed's Packet-in-Packet, which showed that he was able to do a full seven-layer compromise by encoding the data that would induce the preamble and start-of-frame symbols for 802.15.4 within the payload of another message. He was able to get some really wonky things to happen to radio state machines in doing so. Related to that, we have a wireless intrusion detection system evasion that was done by Travis Goodspeed and some friends of mine from Dartmouth, where they were able to fingerprint how different 802.15.4 radio state machines work and construct packets that would be heard by some receivers but not others. So from that, you could basically bend the PHY: generate versions of packets that weren't totally compliant with the standard but would still be heard by certain receivers and not others. Some really tricky stuff. PHYs really matter; you can't take them for granted in the security picture. So my hope is that by getting this tool out there, we can really start to look at this interface, figure out how it works and how it can be made better, and really start to get involved with improving the security of this new protocol. So, there's some prior art to cite. Josh Blum has a module for Pothos, which is kind of a competitor to GNU Radio, another framework. It gets the modulation right, but the decoding is basically straight off the documentation.
So it can talk to itself, but it can't talk to actual hardware, because it doesn't implement the real decoding stages that we had to reverse engineer. And there's another gr-lora out there made by rpp0 on GitHub. When I first looked at it, it was a Python thing that I couldn't quite get to work. I went and looked at it again last night, and it actually looks pretty cool, so you might check that out too if you're interested in this. It looks like it's pretty solid. My gr-lora implements the modulation and the encoding as separate blocks so that you can be modular and experiment. So if you want to add, say, a second layer of forward error correction for better resiliency, you can write that in without having to touch the demodulator. It's all decoupled for you. Also, there's a very simple asynchronous PDU interface for passing data between the blocks, and you basically write to it just using network sockets, which is really easy. I'll demonstrate in a minute. It's just like gr-ieee802-15-4, a really great module made by Bastian Bloessl, who I think is here. A really cool tool; I use it all the time. The demodulator and the decoder implement the process that we just reverse engineered, using the stacked FFTs and all that. The modulator and the encoder use a more efficient method that does direct synthesis of the chirps. Rather than computing FFT results and then taking an IFFT of that, we can index into a precomputed chirp to make the generation a lot more computationally efficient. Do you want the source? It's right there. I just pushed a giant update to it about two hours ago, so if you're interested in playing with it, there it is. Let's run through a quick demo before we're out of time here. Here's the scenario: I've written you guys a poem, or I'm going to play you guys a poem, and I want to be able to sniff it and show you what it is, right?
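The direct-synthesis trick mentioned a moment ago can be sketched in numpy. This is an illustration of the idea rather than the gr-lora modulator itself, using the same cyclic-shift symbol model as the earlier demodulation sketch: one base chirp is precomputed once, and each symbol just selects a rotation of it, so no per-symbol IFFT is needed:

```python
import numpy as np

sf = 8
n = 2 ** sf
k = np.arange(n)
base = np.exp(1j * np.pi * k * k / n)    # precomputed base up-chirp

def synthesize(symbols):
    # Each symbol selects a starting offset into the precomputed chirp,
    # i.e. a cyclic rotation, instead of synthesizing a fresh waveform.
    return np.concatenate([np.roll(base, -s) for s in symbols])

frame = synthesize([0, 42, 255])
assert len(frame) == 3 * n
assert np.allclose(frame[:n], base)      # symbol 0 is the unshifted chirp
```

Rotating a lookup table is just an index computation per sample, which is why this is so much cheaper than the FFT/IFFT route described above.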
So to transmit, we have our Adafruit board, basically an Arduino with a LoRa radio on it. And to receive it, we're going to use our USRP right down here. And of course, it's all being received by gr-lora. So I'm going to jump over to my VM if I can. Let me see if I can get this up on the other screen. Bear with me one moment. There we go. I'll show you all the entropy of my password. We're going to start our receiver here. And now I'm going to open a socket, and I'm going to start my transmitter. And let's see what we have for you, in case you're unsure of what you're looking at. So, that's all over LoRa. There are a few to-dos; if you want to contribute, I'd be happy to have you do so. Some additional resources if you want to know more: I've written this all up in detail in Travis Goodspeed's PoC||GTFO. The most recent issue has it in there. Also, if you want to learn more about radios and SDR, my colleague Marc and I are giving a talk at ShmooCon and Troopers called So You Want to Hack Radios, which is going to go through how to reverse engineer really basic IoT modulations. It will spend a lot more time on the basics and show you how to actually apply this stuff yourself. To wrap up: LPWANs are exploding. They have tons of momentum and are popping up everywhere, and RF stacks are becoming more diverse. So when you're talking about securing your wireless airspace, you're not just worried about Wi-Fi anymore. If you're a corporate security administrator working in corporate IT, you also have to worry about all these other IoT appliances that are coming into your enterprise and starting to take root. On a technical note, we've shown how to go from some obscure modulation to bits, and we've also added a new tool to the researcher's arsenal. I want to thank Balint Seeber at Bastille. He's an incredible resource, and this wouldn't have been possible without him.
Also, thanks to the open source contributors who helped us all get here, and finally to the Chaos Computer Club for organizing 33C3 and having me. So thank you very much. Thank you for your attention. I'd be happy to take your questions. We are almost out of time. Thank you very much, Matt. We're able to take only a few brief questions. So microphone in the front right, please. I remember you; we met at the GNU Radio Conference. Good to see you. Yes. Are there ways to quantify the reliability of a dense LoRa network? Could you repeat that, please? Is there a way to quantify the reliability of a dense LoRa network? I'm sure there are. I haven't really looked at benchmarking or figuring out what the limits of the PHY are. My interest has really been in getting the decoding and information extraction done. I know that there's a group in San Francisco called Beep Networks that's building a LoRa product or network of some sort. They've done some benchmarking of how well LoRa works in cities, and they have a blog post that's pretty good. You might check that out. We have one question from the internet via our signal angel. Someone on IRC is asking: how long did it take to figure all of this out? I first saw LoRa in the wild in January and let the capture sit on my hard drive for a while. It probably took about four or five weeks of working on this more or less full-time. I had some other things I was working on, too. I'd say probably four weeks from when I actually said, all right, let's figure this thing out, to having the initial results. Another question from the rear-right microphone. So in decoding those two unknown layers, you had your proprietary hardware, and you could send it data and it won't do the AES and encryption stuff; it just sends that with its encoding? That's a great question. I skipped over that. The Microchip LoRa radio that I had, this guy right here; I also looked at another one that was a LoRaWAN radio.
This one is a LoRaWAN radio, but it actually exposes an API to pause the MAC state machine. So you can turn off all of the layer 2 stuff that would add a header and encryption and things like that, and send what are close to arbitrary frames. And I say close to arbitrary because you can't turn off the explicit header, so it's always in explicit header mode. But this more or less exposed raw payload injection. Okay, thanks. We're already in overtime. We're taking one last question from our signal angel on IRC, and then we'll have to wrap up. I'll be happy to hang out and answer questions afterwards, too. Many people are wondering what implications it has that the patent is basically not used at all. Could you say that this technology is, in a way, patent-free? I am not a lawyer, but I have known lawyers, and I know that they're clever enough not to fall for that. So I'm sure that the patent was defined as generally as possible. And again, it describes a modulation similar to LoRa. Again, not a lawyer, but I'm almost certain that it would be covered. It's a clever thought, though. Thank you, Matt Knight. Please give him a warm round of applause. Thank you again.