So we're here today to talk about a mechanism that identifies the type of device connecting to a Wi-Fi network. It can be quite specific. It can tell the difference between an iPhone 5s and an iPhone 5, between a Samsung Galaxy S7 and an S8, between a Withings scale and a Nest thermostat. Classically, this kind of client detection would be called fingerprinting, like the OS fingerprinting mechanisms in Nmap. However, in current usage the term fingerprinting has come to mean identification of specific users, as in browser fingerprinting. And, well, the word fingerprint kind of refers to an individual's fingers. As the mechanism discussed here identifies the species of the device and not the individual user, we refer to it as Wi-Fi taxonomy. We'll get a chance to try it in the last few minutes, during time for questions. The mechanism works by examining Wi-Fi management frames called MLME frames. These are frames used to join, leave, and configure the Wi-Fi network. They're not TCP/IP packets. They're not routable. They don't leave the Wi-Fi network. We'll focus on two specific types of frames. The probe request is where a Wi-Fi client can ask all nearby APs, or one specific AP, to respond. The client includes information about itself and its capabilities in the request, and the AP can respond with its own capabilities in the response. We'll also look at the association request, which is where a client joins a Wi-Fi network. The client includes many of the same capabilities as were in its probe request, plus a few more. There are a bunch more MLME frames, like authentication frames, or action frames to modify various parameters. But for the taxonomy mechanism we're talking about today, we'll just rely on these two. Information elements (IEs) are type-length-value tuples packed one after another in the management frame. They're all optional, though in practice a few are universal because Wi-Fi can't work without them.
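As a sketch of the type-length-value layout just described, here is a minimal Python parser. It assumes `body` starts at the first IE: a probe request body consists only of IEs, while an association request carries 4 bytes of fixed fields (capability information and listen interval) that would have to be skipped first. The name `parse_ies` is our own and is not taken from hostapd.

```python
from typing import List, Tuple


def parse_ies(body: bytes) -> List[Tuple[int, bytes]]:
    """Split a management-frame body into (element ID, payload) pairs.

    Each information element is a 1-byte element ID, a 1-byte length,
    then `length` bytes of payload, packed back to back.
    """
    ies: List[Tuple[int, bytes]] = []
    pos = 0
    while pos + 2 <= len(body):
        eid, length = body[pos], body[pos + 1]
        end = pos + 2 + length
        if end > len(body):
            break  # truncated element: stop instead of reading past the end
        ies.append((eid, body[pos + 2:end]))
        pos = end  # the length field is enough to skip any element, known or not
    return ies
```

Because every element carries its own length, the loop steps over IDs it knows nothing about, which is the same property that lets a client ignore vendor extensions it doesn't implement.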
Each Wi-Fi standard has added more types of information elements. In the 802.11b days, there were very few. 802.11g added a few more, 802.11n and 802.11ac added a bunch more, and so on. In addition to the standard elements, there is a mechanism for vendors to define their own. Vendor extensions are IE type 221, carrying an ID for the vendor called the organizationally unique identifier, or OUI, followed by a subtype so that the vendor can define multiple types of their own. Because the length field provides enough information to skip over the IE, any Wi-Fi client device can interoperate whether it understands a given vendor extension or not. It just skips over the ones it doesn't implement. This is the association request from an iPhone 7 Plus as broken out by Wireshark. The association request includes the SSID that the client wants to join, information about its supported rates and channels, its power levels, and its radio management capabilities, plus three vendor extensions from Microsoft, Broadcom, and Apple. A few of the vendor extensions are very widespread. The Microsoft extension shown here is for prioritization, and it's widely implemented even on devices that are not running any kind of Windows OS. The Broadcom extension is also quite widespread, owing to how common Broadcom chipsets are. The Apple extension shown here was added in iOS 10.2. We don't really know what it is, but it was added on all devices running that version or later. The signature lists the tag numbers of the IEs that are present in the frame, in the order they appear, as a text string of decimal numbers. For vendor extensions, it additionally includes the vendor's OUI and that vendor's subtype. For this first part of the signature, we end up with the text shown in red on the slide. This part of the signature is most strongly influenced by the OS of the client device, where the client Wi-Fi stack is implemented.
It's next most strongly influenced by the Wi-Fi chipset, both in terms of the standards it supports and any vendor extensions that the chipset vendor implements in its driver. In addition to the tag numbers, a few of the information elements contain capability bitmasks or other information that is useful in identifying the device. For example, 802.11n defines 16 bits of optional capabilities and 802.11ac defines 32 bits more. These are most strongly influenced by the chipset and the subset of the standard that's implemented by that ASIC. The transmit power information element depends strongly on the board design and how the antennas are laid out. Two devices built by the same manufacturer using the same software, or even using the same Wi-Fi chipset, will often have different TX power values because their board layouts are different. The number of antennas, reflected in the number of supported spatial streams, is encoded in both the 802.11n and the 802.11ac capabilities, and it's also indicative of the board design. And there's an extended capabilities bitmask which contains even more optional elements. It's most strongly influenced by the driver and the WPA supplicant software. A number of these capability bitmasks are appended to the signature to further differentiate it, also shown in red on this slide. The signature as we've discussed it so far has become more complex over time. This slide shows the association request portion of the signature for three devices. The first is from an original iPhone, which is an 802.11g device. This taxonomy mechanism wouldn't have worked very well in that time frame; there was very little differentiation between devices. The iPhone 4S is an 802.11n device introduced about four years later, and it added a number of options to its management frames. The iPhone 7 is from about five years after that; it's an 802.11ac device, and it added even more.
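Putting the last two slides together, one frame's part of the signature is the ordered list of IE tag numbers (with OUI and subtype for vendor extensions) followed by a few capability bitmasks. The Python sketch below builds such a string from (element ID, payload) pairs. The element IDs (221 vendor, 45 HT capabilities, 191 VHT capabilities, 33 power capability, 127 extended capabilities) come from 802.11, but the text layout, the field names (`htcap`, `vhtcap`, `txpow`, `extcap`, and the hypothetical `htstreams`), and their ordering are our assumptions, not the exact format hostapd emits.

```python
import struct
from typing import List, Tuple

VENDOR_SPECIFIC = 221            # vendor extension IE
HT_CAPS, VHT_CAPS = 45, 191      # 802.11n / 802.11ac capability IEs
POWER_CAP, EXT_CAPS = 33, 127    # power capability / extended capabilities IEs


def frame_signature(ies: List[Tuple[int, bytes]]) -> str:
    """Build one frame's part of a taxonomy-style signature (assumed format).

    `ies` is a list of (element ID, payload) pairs in the order they appear.
    """
    tags: List[str] = []
    caps: List[str] = []
    for eid, p in ies:
        # Tag list: decimal element IDs in frame order; vendor IEs add OUI and subtype.
        if eid == VENDOR_SPECIFIC and len(p) >= 4:
            tags.append(f"{eid}({p[:3].hex()},{p[3]})")
        else:
            tags.append(str(eid))

        # Capability bitmasks, appended after the tag list.
        if eid == HT_CAPS and len(p) >= 7:
            (htcap,) = struct.unpack_from("<H", p, 0)   # 16-bit HT Capability Info
            caps.append(f"htcap:{htcap:04x}")
            # Payload bytes 3..6 are the Rx MCS bitmask for MCS 0-31,
            # one byte per spatial stream; count the streams in use.
            caps.append(f"htstreams:{sum(1 for b in p[3:7] if b)}")
        elif eid == VHT_CAPS and len(p) >= 4:
            (vhtcap,) = struct.unpack_from("<I", p, 0)  # 32-bit VHT Capabilities Info
            caps.append(f"vhtcap:{vhtcap:08x}")
        elif eid == POWER_CAP and len(p) >= 2:
            caps.append(f"txpow:{p[:2].hex()}")         # min and max TX power (dBm)
        elif eid == EXT_CAPS:
            caps.append(f"extcap:{p.hex()}")            # variable-length bitmask
    return ",".join(tags + caps)
```

For example, a frame whose IEs are SSID, supported rates, a power capability of 2 to 20 dBm, two-stream HT capabilities, VHT capabilities, extended capabilities, and a Microsoft (OUI 00:50:F2, subtype 2) vendor extension comes out as `0,1,33,45,191,127,221(0050f2,2),txpow:0214,htcap:01ef,htstreams:2,vhtcap:0f8259b2,extcap:040008`; those payload values are made up for illustration.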
The full signature contains the list of IEs and the various bitmasks from each of the probe request and the association request, separated by a pipe character. The whole thing is prefixed with the string "wifi4" because this is the fourth iteration of the signature format. Prepending that string allowed the wifi1, wifi2, and wifi3 signatures to remain in the database while we were working on updating everything. We shall speak no more of the earlier formats. When you include all of this in the signature, it ends up being quite distinctive, and it allows us to identify what the device is. The taxonomy signature is influenced by the client OS, by its Wi-Fi chipset, and by its board layout. The current database of signatures identifies the most common Wi-Fi devices, which nowadays are overwhelmingly phones. We have signatures for most widely sold phones and tablets of the last few years, and a selection of other types of devices, like media streaming devices from Google, Apple, Roku, Amazon, and so forth, and Internet of Things devices from Nest, Honeywell, Withings, and so on. For larger devices like laptops and desktops, which use a separate Wi-Fi card, this mechanism identifies the card. We had signatures for some laptops and desktops in the database, but it was kind of ridiculous. There was one model of Apple's AirPort Extreme card which could be a MacBook or an iMac or a Mac Pro, basically any machine of that generation. We couldn't tell them apart using this mechanism. Intel Centrino chipsets as used in Windows laptops are even less distinctive; it could be basically anything. So at this point, we don't even try. We don't add signatures from laptops or desktops to the database. It just tends to result in confusion and isn't very useful. Additionally, there are a few classes of device which we choose not to gather signatures for. First, we only want to focus on common devices, devices that lots of people are likely to have.
We use lists of top-selling consumer electronics over the last few years to target devices that we want to gather signatures for. If something isn't very common, or is unique, we don't really want to put it in the database. The other set of things we don't add to the database are things that would make people uncomfortable if they saw them in the list of devices on their router. That includes various medical devices, devices of an adult nature, home incarceration monitoring devices, and so forth. Many devices have been seen to emit more than one signature, so there's more than one entry for them in the database. For devices which support both 2.4 and 5 GHz operation, the signatures are almost always distinct. There are information elements that are only defined for one band or the other, and the whole of 802.11ac is only defined for 5 GHz operation. So if the device supports both bands, we gather signatures from each of the two bands. However, even in the same band, devices often have multiple signatures. They vary what they advertise based on local conditions, like noise. This example shows two signatures from a Google Pixel phone. It varies its handling of beamforming, presumably based on the noise environment it sees. Clients can also behave differently depending on what they see from the AP in response to their probe request. For example, if the AP says that it supports radio resource management, most Apple and some Android devices will include an RM Enabled Capabilities IE in their association request. That's IE number 70, highlighted in red in that list. Another example: although 802.11ac is only really defined for 5 GHz operation, many vendors have a proprietary extension that makes it operate on 2.4 GHz, and we will see the 802.11ac fields in their probe requests. They typically only include those fields in the association request if they see the magic proprietary handshake back from the AP; otherwise they won't be in the association request.
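Since one device can emit several signatures (one per band, plus variations driven by noise or by what the AP advertises), a lookup table naturally maps each full signature to a set of devices. The Python sketch below joins the probe and association parts in the pipe-separated layout described earlier and builds such an index from captured samples. The function names and the capture tuples are hypothetical; the real database is distributed in its own format, which this does not attempt to reproduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def full_signature(probe: str, assoc: str) -> str:
    """Join the per-frame parts in a 'wifi4|probe:...|assoc:...' layout."""
    return f"wifi4|probe:{probe}|assoc:{assoc}"


def build_index(captures: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each full signature to every device it was captured from.

    `captures` holds (device, probe part, assoc part) tuples gathered on both
    bands and against several test APs, so one device may add several keys.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for device, probe, assoc in captures:
        index[full_signature(probe, assoc)].add(device)
    return dict(index)


def lookup(index: Dict[str, Set[str]], signature: str) -> Set[str]:
    """Return a copy of the devices known to emit `signature` (empty if unknown)."""
    return set(index.get(signature, set()))
```

A set per signature, rather than a single device name, also leaves room for the case where unrelated devices happen to share one signature.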
So when capturing signatures for the database, we use three different APs to maximize the chance of capturing different signatures. Sometimes we see the same signature from multiple devices. These examples are all devices using the Broadcom 43362 chipset, running Linux, using the same driver and the same WPA supplicant, and they're all old enough that they don't have a transmit power information element. The signatures are identical. They're an Amazon Dash button, a First Alert thermostat, a 2012 Nexus 7, a Roku HD, and a Withings scale. In most cases like this, we distinguish them using the upper 24 bits of the MAC address, which are the organizationally unique identifier. OUIs are assigned to manufacturers, and adding the OUI as a qualifier can distinguish similar devices from different manufacturers which have the same signature. We sometimes also use information from DHCP. The options present in a DHCP request can identify the OS. This was originally developed by the Fingerbank project, and that mechanism inspired this one for Wi-Fi. However, using DHCP takes us further and further from the Wi-Fi layer, so we try to use it sparingly. In particular, only the access point will be able to see the DHCP request unencrypted; other devices, like sniffers, that might want to use this mechanism would not be able to rely on DHCP. However, a few cases remain troublesome, mainly devices made by the same vendor using the same software, the same chipset, at about the same time. Often the transmit power information will distinguish them due to differing board designs, but not always. For example, the second-generation iPad Air and the iPhone 6s have the same signature. We can try heuristics, like "if the DHCP hostname contains the string iPad, it's probably an iPad," but if nothing else works, we return all of the possibilities: it's one of these.
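The tie-breaking steps above (MAC OUI first, then a weak DHCP hostname hint, otherwise return every candidate) can be sketched in Python as follows. The lookup tables `oui_vendor`, `device_vendor`, and `hostname_hints` are hypothetical inputs invented for this example; a real deployment would populate them from the IEEE OUI registry and its own device list.

```python
from typing import Dict, Optional, Set


def mac_oui(mac: str) -> str:
    """Return the upper 24 bits of a MAC address as six lowercase hex digits."""
    return mac.lower().replace(":", "").replace("-", "")[:6]


def disambiguate(
    candidates: Set[str],
    mac: str,
    oui_vendor: Dict[str, str],          # hypothetical: OUI -> manufacturer
    device_vendor: Dict[str, str],       # hypothetical: device -> manufacturer
    dhcp_hostname: Optional[str] = None,
    hostname_hints: Optional[Dict[str, str]] = None,  # hypothetical: substring -> device
) -> Set[str]:
    """Narrow devices that share one signature; return every survivor if unsure."""
    narrowed = set(candidates)

    # Step 1: keep only candidates built by the manufacturer that owns the OUI.
    vendor = oui_vendor.get(mac_oui(mac))
    if vendor is not None:
        same_vendor = {d for d in narrowed if device_vendor.get(d) == vendor}
        if same_vendor:  # never narrow down to an empty set
            narrowed = same_vendor

    # Step 2: weak heuristic, e.g. "ipad" in the DHCP hostname suggests an iPad.
    if dhcp_hostname and hostname_hints and len(narrowed) > 1:
        host = dhcp_hostname.lower()
        hinted = {dev for hint, dev in hostname_hints.items()
                  if hint in host and dev in narrowed}
        if hinted:
            narrowed = hinted

    # Whatever remains is the answer: possibly several devices.
    return narrowed
```

With a made-up OUI `aa:bb:cc` mapped to Withings, a client at `AA:BB:CC:01:02:03` whose signature matches the Broadcom 43362 group narrows to the Withings scale; an Apple client with no hostname hint stays ambiguous between the iPad Air 2 and the iPhone 6s.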
This mechanism was originally developed as part of a Wi-Fi AP project. We intended to focus on identifying the Wi-Fi chipset the client was using. We thought that if we could just know what that chipset was, we'd be able to implement all kinds of very clever bug workarounds and make Wi-Fi perfect. As it turns out, if bugs can be worked around easily, the client vendors mostly work around them in the client software themselves. Who knew? Instead, where this kind of information is currently used is in the router's UI, where there's a list of connected clients. We can give an indication of what each client is. If the client included a useful hostname in its DHCP request, then that's great. If it didn't, or if it uses something like its serial number as its name, then it's much more helpful to say what we think it is, to help the user identify it. We also use it to correlate with other performance information, breaking that information out by the kind of client device. My colleague Avery Pennarun gave a talk at NetDev 1.1 on this topic. The graph on this page is from that talk, and it shows Wi-Fi throughput getting better and better as the client gets closer to the AP, until it gets really close and then starts dropping again. That's unusual; most devices don't do that. And you can only see that it's happening if you break it out by the type of device and see that some of them do weird things. In the future, we may use the mechanism for more. We might use it for optimizations based on the type of client device. In particular, if we know how well a client handles packet reordering, we could get lower latency on average by allowing the occasional packet to arrive out of order, rather than buffering all of them to keep them in order. Also, wireless intrusion detection systems might be able to use information like this. If they think they know what kind of device it is, then they know what sorts of network activity would be reasonable from that device. Other resources.
So, we published a paper about the mechanism, which goes one level of detail deeper into how it works. And the NetDev talk that I mentioned earlier is linked from the slides, which you'll be able to get after the talk. That talk described the overall environment where this mechanism was used, and how it was used in that environment. So, the current status. The implementation to extract signatures for clients went into hostapd in August 2016, and it's present in hostapd 2.6 and later. The database of known signatures is released as open source under the Apache license on GitHub, and the link is also in the slides. It currently identifies about 60% of Wi-Fi clients across a broad swath of the market. The remaining 40% of devices are mostly laptops and desktops, plus a very long tail of other stuff that we can't identify. So what comes next? There's this thing, which can identify an interesting subset of Wi-Fi client devices. The signature mechanism is in hostapd. The database has been released as open source, but it's only useful if it's integrated into other products and systems: Wi-Fi APs, wireless intrusion detection systems, and anything else interesting that people think of. And so one of the main reasons for this talk is to build awareness that this thing exists and is available for use. Another thing we need to do is develop tools for gathering signatures. It's a pretty manually intensive process right now, which means me. Also, the longer we've been at it, the more we realize that the client responds to things it sees from the AP. We've been using three APs for a long time. We need to start using more APs, and more different types of APs, to make sure we're capturing the different signatures devices can emit. Other things might happen in the future, too. This talk has been all about how APs can identify client devices, but running it in reverse would probably work as well.
A client device could list off the information elements that are present in the beacons it sees from an AP, and maybe in the probe responses it gets from the AP, and use them to identify what type of AP it's talking to. Then any performance or quality measurements the client makes could also be associated with the brand and model of the AP. So, I surveyed coworkers about whether to run a demo. As you can see, the results were quite encouraging. So you might be able to try it. Let me move it back. Okay. You can join the SSID "smell of Wi-Fi talk", and the password is all lowercase "smell of Wi-Fi talk", and the system will try to identify what kind of device it is. To make sure the demo worked, I used the Nexus 4. There you go. Nexus 6P. Any questions? So the question is about voting and polling systems. Yes. I would not use this mechanism for protecting voting or polling places. Other questions? Okay. Thank you.