Welcome to the Bluetooth Device Database. Ryan Holeman, take it away.
Thank you. Thanks for coming. Last talk of the day. Everyone having fun so far?
So this project is a little more high level than some of my previous projects. If you're getting into Bluetooth, it's really fun to see it from this type of view, and I don't believe it's ever really been looked at in this way. I believe you really need to look at devices in the wild in order to understand the technology. So, in short, this project was to be the largest-scale Bluetooth device survey that I've seen out there. And with it, I built in a lot of support for the community to participate. I have a lot of clients, and I'll tell you how to install them later on. If you want, go out and install some of this stuff and send me your data. It's for everyone, not just for me.
So a little bit about myself before I get started. It's a great picture of me. I used to be a server developer. I'm a security researcher at Ziften Technologies, where we do a lot of endpoint security analytics and management solutions. If you haven't heard about us, check us out. We do some cool stuff. In my spare time, I spend a lot of time tinkering with Bluetooth. I have some Python Ubertooth projects up on my GitHub repo. Fun stuff. Follow me on Twitter to keep up with any other projects I'm doing. And all the research today is my own. It has nothing to do with Ziften. They just allow me to come out and present to you guys.
So as I mentioned before, at its core, the Bluetooth database project is one of the widest-scale Bluetooth surveys that I'm aware of. I did it because I had a lot of questions about devices in the wild, and there weren't any data sets big enough and broad enough to answer them. So I just took this on.
And as I went through this and started doing a lot of analysis on the data, I became aware of a lot of things that I really wasn't aware of before. A lot of questions got raised, and I was able to come up with answers for many of them and learn some new things.
All of this project was done only on discoverable devices. A lot of people say, hey, why did you only do discoverable devices? You do a lot of passive stuff in your spare time. Well, the number one reason was convenience, right? I could write a simple iPhone app, put it in my pocket, and walk around and scan things. People don't really like it when you walk around in crowds with big antennas and computers. Secondly, there just wasn't anything done at this wide a scale before. And when we look at stuff with passive monitoring, we miss a lot of the information that I was interested in, such as the upper half of the Bluetooth address.
So what was I interested in? Mostly Bluetooth addresses. But along with that comes device information, such as the device name. Since I was doing all this from mobile devices, geolocation was really easy to add in there. And then any other metadata that I get with a simple scan, so device class information, et cetera. I did nothing that actually probed Bluetooth ports or anything like that, because that typically takes time, and by the time a Bluetooth object moves on, you wouldn't be able to complete the scan.
So what were the tools that I used for this project? Everything about this project is open source: the clients, and I have a simple server too that you can run if you don't want to send all of your data up to my managed server. The main client that I created was an iOS client. It will not be in the App Store. To make the application run in the background 24/7, I had to leverage a lot of private libraries and APIs that Apple does not let through.
So they don't allow you to put that stuff in the store. But it's up on my GitHub repo. You can compile it and throw it on your phone really easily. For cross-platform support, if you saw Joseph Cohen's talk about an hour ago, I hacked up one of his Blucat clients. So I have a fork of it on my repo, and the data that it collects will ship off to the remote server too.
As for the server, if you want to participate in this project but you don't want your data to go out to the public, I have a simple server implementation on my GitHub repo too. You can just change the URL in the client and have it report to your own server. Any data that goes to my server, I open up completely to people. I do a database dump only about once a week, so you can't use the information to track someone in real time. I'm kind of giving them a week's head start, right?
So there are similar projects, but nothing that hit this broad a scale. BNAP BNAP is something similar. This was a project run by Josh Wright where he was trying to correlate the upper half of an address, which is vendor specific, to the actual device that it belongs to. So this was a little bit different. It didn't really follow what I wanted to look at.
J.P. Dunning did the Bluetooth profiling project a couple of years ago, and it was similar to mine. It didn't have the capability for shipping to a remote server like these clients do. It didn't have geolocation information. It was mostly looking at unique devices, whereas repeated sightings are really important in my data. And Bluetooth changes from year to year. His last scan was from about two years ago, so you wouldn't see things that came out in the last year or two. This stuff changes really quickly.
There are also some closed-source projects that do this kind of detection.
Wireless Works is a company. If you've been paying attention to the media lately, there's been a big stink about malls and department stores tracking you based on your cell phone's wireless probing, right? Wireless Works is kind of the Bluetooth version of this, where they track you going into a store based on Bluetooth information. There's not a lot of information about them online, so if anyone is familiar with them, I wouldn't mind talking to you after the talk. Then there's Houston's TranStar system. Even though I'm from Texas, I have nothing to do with this project, but they use Bluetooth to detect traffic patterns on Texas highways. Obviously these projects don't open their data sets up to the public, so they weren't much use to me. But it's interesting just to know that they're out there.
So the database currently has over 12,000 sightings and around 5,000 actual unique devices. This is my collection over time for the last couple of months. As you can see, Vegas was pretty lucrative. That would be the last couple of days at the end of the time series here.
One of the first questions I had was, what is the most popular discoverable Bluetooth device out there? If you saw Joseph Cohen's talk an hour or so ago, you might have a hint at what it is. Anyone here want to take a guess? No, no. It was BlackBerrys, actually. BlackBerrys win by a landslide.
This is broken out based on device name, so there is a bit of error here, and I had to munge some of the data. A lot of the devices come back with generic names. You don't actually get the name on the first scan, so they would just be bucketed into something like "mobile phone." For this data set, I just truncated that information off. But for the most part, if you take the top ten in this list, it's pretty accurate as far as what's really out there.
Apple products, MacBooks, iPhones, Roku, just going down the list of the top five. Bluetooth TV sets, Streck TV, iMacs, iPads. I'll get into some of this information in a little bit.
I truncated a lot of this material. I thought this was a 20-minute talk, but they gave me more, so I get to take my time on this.
So, some of the cool things we can do with geolocation. The way that I was collecting data was a lot different from other techniques. With most techniques, like Wireless Works or the Texas Department of Transportation, you basically have a stationary cell, and you can assume that anything passing it is a mobile device, right? It's moving. With my survey, it was a little different. I was the moving device. So it's really hard for me to determine whether I'm seeing a moving device on the other side. In order to do this, I would have to see a device more than once. I need two or more sightings, and I would take the two farthest geolocation points for that device in order to bucket it by how much the device actually moves.
With this, over 70% of the devices that I saw were moving. And you can bucket them out by how much distance I actually saw them move. Because I'm one person, it was rare for me to see things at the higher end of the range, right? So you can see that only about 5% of them moved more than five kilometers. Cool way to look at the data, and I liked it.
Another thing you can do that's pretty cool with geolocation information is look at recurrence in your data set. In this top picture here, this is a local Costco that I go to time after time, and I always have my phone on, scanning the devices in there.
So on this map, you can see the blue dots, and these are devices that are stationary and local to this particular Costco. All of the red dots are devices which are not local to it, so these would most likely be people traversing the store at the time. That's one way you can look at geolocation information with this stuff, which was pretty neat.
The other way I call solving the small-world phenomenon. Assume that you live in Cleveland and your friend lives in LA, and somehow you meet in Denver and you say, oh, wow, what a small world. You can do this with Bluetooth information too. This bottom picture here is the route that I skateboard to work every day. It's an access road, so there's not a lot of traffic that goes by. The blue dots in this image are cars that I pass multiple times, whereas the red dots are cars that I would never see again. This is something you would never really realize on your own. You're not going to memorize every car that you pass every day. So it's a cool way of looking at the data set.
So geolocation was cool, but it wasn't really why I got into this project. It was just something I could tack on in order to see the data in a different way. I was mostly interested in the Bluetooth address space. I threw this slide in here because I understand that not everyone is familiar with the Bluetooth address space, so this is my quick primer on it.
In Bluetooth, addresses are laid out a lot like a MAC address. You have the upper half being vendor specific and the lower half being device specific. The device-specific half, the LAP here, is supposed to be unique across devices. And once you get into the vendor-specific part, we split it up into the NAP and the UAP.
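For readers who want to see that layout concretely, here is a minimal sketch in Python. It is not code from the project. It assumes the address arrives as the usual six colon-separated hex octets, and `BdAddr` and `split_bd_addr` are names made up for this illustration. The NAP is the upper 16 bits, the UAP the next 8 bits, and the LAP the lower 24 bits.

```python
from typing import NamedTuple


class BdAddr(NamedTuple):
    """Hypothetical container for the three parts of a Bluetooth address."""
    nap: int  # non-significant address part, upper 16 bits
    uap: int  # upper address part, next 8 bits
    lap: int  # lower address part, lower 24 bits


def split_bd_addr(addr: str) -> BdAddr:
    """Split an address such as '00:11:22:33:44:55' into NAP, UAP and LAP."""
    octets = addr.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {addr!r}")
    value = int("".join(octets), 16)  # the full 48-bit address as an integer
    return BdAddr(
        nap=value >> 32,
        uap=(value >> 24) & 0xFF,
        lap=value & 0xFFFFFF,
    )


parts = split_bd_addr("00:11:22:33:44:55")
print(f"NAP={parts.nap:04X} UAP={parts.uap:02X} LAP={parts.lap:06X}")
# prints: NAP=0011 UAP=22 LAP=334455
```

The NAP and UAP together make up the vendor-specific half of the address, and the LAP is the device-specific half.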
The UAP is something that we don't always get when we do passive monitoring in Bluetooth. And the NAP we never get. Even though all of the data I did this research on was from discoverable devices, I wanted to take that data set and use it in my techniques for passive Bluetooth monitoring. I don't know if I mentioned it, but when we do passive Bluetooth monitoring, the LAP is the only thing we're ever guaranteed.
So if I were to go back and do this whole survey again with only passive Bluetooth monitoring, I would most likely only get LAPs to go with the geolocation. And what I really wanted to know is, are LAPs unique, or are vendors just printing and pressing them out and reusing the same LAP over and over? Lo and behold, it turns out that they're not reusing them. LAPs are actually pretty evenly distributed. Of all of the devices I saw, which is around 5,000, I only had one collision, and that happened at around 3,000 devices. Which isn't too bad. That's an acceptable loss for me if I'm out scanning a whole bunch of devices and I get a collision every 3,000 or 4,000.
This graph here really doesn't mean anything. It just looked cool. It runs from 00 to FF, the whole 256 values that each byte of the LAP can take, and shows how evenly they're distributed. You can see there are no hotspots, and that leads me to believe too that the LAP is pretty unique.
UAP. The UAP is something that we do get in passive monitoring sometimes. It depends on whether or not the traffic that goes over the wire has a payload. So by looking at it in active devices across the board, we can derive some cool information about it. It looks as if pretty much the whole address space is used for UAPs, and there is a hotspot of popular UAPs.
The UAP hotspot can be used mostly if you only had an LAP and you wanted to derive the most probable UAP. You could use the top one. I guess if you really wanted to, you could use it for brute forcing UAPs, although it's probably not the most effective thing. But it was just interesting to see that there was a hotspot. Some UAPs are used a lot more than others. And this is only the top 35 UAPs. For the last 200-some of them, you're getting down to one device sighting per UAP. So it tails off pretty nicely.
And this was the coolest thing. I don't know, I'm a nerd. This is my favorite part of all the research. In passive monitoring, we do not get NAPs. So what I really wanted to do was see if I could derive an NAP based on probability from the rest of the address. You need an NAP if you want to correlate a device to a vendor. And since we don't get this in passive monitoring, what I was looking for here is, are there higher probabilities of getting particular NAPs based on a UAP?
And this is pretty interesting. This graph shows just the first eight UAPs out of the 256 possibilities, 00 to 07. But you can see that for every UAP, I have correlating NAPs. So what we can do here is say, if you have a UAP, what is your most probable NAP based on devices seen out in the wild? As it turns out, this is actually pretty good for coming up with a high-probability guess. Worst case, I think there are eight NAPs associated with one or two of the UAPs on the list. But a one-in-eight probability isn't too bad. And then you can see which ones were actually used the most, so you can narrow that down to the highest probability. So that was pretty interesting. And like I said, that's your worst-case scenario.
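The UAP-to-NAP lookup described here can be sketched roughly as follows. This is an illustration, not the code behind the slides: the function names `build_nap_table` and `rank_naps` are hypothetical, the sample addresses are invented, and counting each unique address once (rather than every sighting) is an assumption.

```python
from collections import Counter, defaultdict


def build_nap_table(addresses):
    """Map each UAP to a Counter of the NAPs seen alongside it."""
    table = defaultdict(Counter)
    # Assumption: every unique device address contributes exactly one count.
    for addr in {a.upper() for a in addresses}:
        octets = addr.split(":")
        nap = "".join(octets[0:2])  # first two octets of the address
        uap = octets[2]             # third octet of the address
        table[uap][nap] += 1
    return table


def rank_naps(table, uap):
    """Return (nap, probability) pairs for one UAP, most likely first."""
    counts = table.get(uap.upper())
    if not counts:
        return []  # this UAP was never observed
    total = sum(counts.values())
    return [(nap, seen / total) for nap, seen in counts.most_common()]


# Invented sample data: four devices that all share UAP 22.
observed = [
    "00:11:22:00:00:01",
    "00:11:22:00:00:02",
    "00:11:22:00:00:03",
    "AA:BB:22:00:00:04",
]
table = build_nap_table(observed)
print(rank_naps(table, "22"))
# prints: [('0011', 0.75), ('AABB', 0.25)]
```

Given only a UAP from passive monitoring, the first entry in the ranked list is the best guess for the NAP, and the length of the list shows how many candidates remain.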
Best-case scenario, which happens to hold for the majority of all UAPs, there are only one to three NAPs actually associated with a UAP. So most of the time, you can narrow it down to one or a few NAPs, and then increase your probability based on how many times you saw each one. So this was interesting. I think last year I touched on this subject, but I was just looking at vendor lists then, so I would end up with 40 to 60 possible NAPs, which was not nearly as useful, right? If I can give you three possible vendors, that's pretty good.
On to vendor statistics. This stuff can be used for two different purposes, I think. One is increasing the probabilities even more from the last slides that I just talked about. If you had two NAPs that were tied, you could weight them based on which vendor is more popular. The other cool thing is just to look at what the most popular vendors are out there for Bluetooth. Apple takes the cake. I said BlackBerry was the number one actual device, but Apple has more products, so they take the cake here, with BlackBerry in second. Samsung does a lot of embedded devices, so they're pretty far up there.
Roku was pretty interesting. I saw a lot of Rokus during the scan, and before I started doing a lot of the correlations, I thought that Rokus might be the most popular device that I saw out there. They weren't, but they're still pretty high up on the list. Kind of interesting to know. And Rokus transmit very far. I was doing all this with my iPhone, and I was getting Roku boxes just driving down the road, which is crazy. That means I'm probably getting 50 to 70 feet with an iPhone. So they're really loud.
Security. So what does this mean for security? You can be tracked with Bluetooth.
It's something that not everyone knows. I think that if it is important to you, you're probably best off turning Bluetooth off. On top of that, Bluetooth is a secure protocol in itself, but there are vulnerabilities out there, right? They're mostly in software implementations: vendors who create services that accept connections without actual PIN authentication, or with easy PINs. Typically you'll see that where you can connect with just 0000. So it is out there. It is something to be aware of.
And I think too, if you ever wanted to do research in this realm, this was the list that I never had, right? If you wanted to get the most bang for your buck by finding the most widespread devices, this is the list that I wish I could have just gone down. So it would be interesting to start from the top, work down, and do Bluetooth audits on a lot of these devices.
Awareness. A lot of Bluetooth devices don't really act how you think. Or you might have Bluetooth in places where you don't know about it, right? A lot of people are not aware that Bluetooth is on in their car and discoverable all the time. A lot of people aren't aware that sometimes when you start up your car, your Bluetooth goes into discoverable mode for 60 seconds or longer. This happens in other devices besides car audio systems, but it's something that some people might not be aware of. And it's something that you can't turn off most of the time.
And I did notice a lot of bugs. Take a look at my device list broken out by actual devices. If you're aware of how iOS devices work, your Bluetooth actually only goes into discoverable mode when you go to your Bluetooth settings menu.
Yet I'm seeing so many iPads and iPhones in my scans. Is it just chance that I'm walking by somebody when they're actually configuring Bluetooth? Most likely not. What happens actually happened to me multiple times while I was scanning. I had never really scanned 24/7 before, so it wasn't apparent to me that my phone would sometimes get stuck in discoverable mode. It depends on how you leave the Bluetooth settings page on your iOS device. If you leave too fast, sometimes it just gets perma-stuck in discoverable mode. So that's why you see iPhones and iPads in discoverable mode so much.
I believe the other reason why there are so many discoverable Bluetooth devices out there is bad human-computer interface design, right? Vendors give you that permanently discoverable button when you really don't need it. I'm not picking on Apple here, because they did it right with iOS, despite the bug that I mentioned. But in OS X, they don't, right? They could have done the same thing, where when you go into the configuration page you're discoverable, and when you leave, you're not. But they have that perma button. I believe that happens with BlackBerry too.
Legal issues. It's kind of the same as just detecting Wi-Fi devices. It does seem to be a little more personal, because it's something that belongs to you more than, say, your home Wi-Fi router. But as far as legal issues go, there's really nothing out there, and the closest thing would be detecting Wi-Fi devices. TranStar and Wireless Works say that if you don't want to be tracked, then it is your responsibility as a consumer to turn off your Bluetooth device. I don't know. Not everything can be turned off, so I don't know if that's the right answer.
So all my data sets for this stuff can be downloaded from Bluetoothdatabase.com.
As I said, I do a dump about once a week. All the client code and server code that I mentioned in this talk is linked from Bluetoothdatabase.com, or you can go to my GitHub repo directly. And if you need to contact me, hate mail, whatever, it's ryan at hackgnar.com.
Future work. Currently on Bluetoothdatabase.com, there are no real-time statistics except for device sightings over time. So I'd like to take a lot of the slides that I showed you today and turn them into things like the week's most popular Bluetooth devices, which would be pretty easy for me to add in there.
Community participation. Obviously that's the reason why I'm here today. If you guys want to participate in this, feel free to install my clients. Even if you don't want to submit to this database, fire up your own server. I supply the code for it. Just look at what's out and around you, right? It's fun to see, and it'll give you a wider idea of what's really going on out there in the Bluetooth space.
Service enumeration. I didn't add this in originally, and I might play around with it. I wanted to be as non-invasive as I possibly could. I didn't want to do RFCOMM scanning on a lot of these things, mostly because I don't want to interrupt anyone's daily routine, right? RFCOMM scanning can sometimes bring up pop-ups on your devices and stuff like that, so I really wasn't looking to offend anyone.
And a passive survey. I think that doing a large-scale passive survey and comparing the data sets would lead to a lot of interesting results, like a wider view of what the actual Bluetooth deployment is out there. You can compare the sets. You would see, okay, for X amount of discoverable devices, how many passive ones do you typically see in an area? Things of this nature.
It would be cool to do this with both standard rate and Bluetooth Low Energy, right? I don't believe anyone's really done a large-scale Bluetooth Low Energy scan, which would be cool to see. So that is it. I almost clapped for myself.